gaste_test.stratified_table2x2 module

class gaste_test.stratified_table2x2.StratifiedTable2x2(tables: List[Tuple[Tuple[int, int], Tuple[int, int]]], labels: List[str] | None = None, decimal: int | None = 3, alpha: float | None = 0.05, limit_computation_exact: int | None = 10000000, name_rows: Tuple[str, str] | None = None, name_columns: Tuple[str, str] | None = None)

Bases: object

The StratifiedTable2x2 class analyzes stratified 2x2 contingency tables.

StratifiedTable2x2 takes a list of tables, optional labels, and optional parameters, and performs various statistical tests and calculations on the tables.

Parameters:

tables (list) : A list of 2x2 array-likes (e.g. ndarray), one contingency table per stratum.
labels (list, optional) : A list of labels, one per table/stratum. Default is None.
decimal (int) : The number of decimal places to round the results to. Default is 3.
alpha (float) : The significance level for confidence intervals and hypothesis tests. Default is 0.05.
limit_computation_exact (int) : The limit for exact computation of the combined p-value. Default is 10^7.
name_rows (tuple, optional) : A tuple of row names for the tables.
name_columns (tuple, optional) : A tuple of column names for the tables.

Attributes:

nb_combination (float) : The number of combinations for the exact computation; an int cast to float for numerical reasons.
odds_ratio (ndarray) : The odds ratio of each table.
log_odds_ratio (ndarray) : The log odds ratio of each table.
ci_odds_ratio_inf (ndarray) : The lower confidence bound of each odds ratio.
ci_odds_ratio_sup (ndarray) : The upper confidence bound of each odds ratio.
log_ci_odds_ratio_inf (ndarray) : The lower confidence bound of each log odds ratio.
log_ci_odds_ratio_sup (ndarray) : The upper confidence bound of each log odds ratio.
pval_under (ndarray) : The p-value of the test of odds ratio < 1 for each table.
pval_over (ndarray) : The p-value of the test of odds ratio > 1 for each table.
weight (ndarray) : The weight of each table.
odds_ratio_pooled (float) : The pooled odds ratio.
df (DataFrame) : A pandas DataFrame containing the data and the results of the analysis.

Methods:

__init__(tables, labels=None, decimal=3, alpha=0.05, limit_computation_exact=10**7, name_rows=None, name_columns=None):
Initializes a StratifiedTable2x2 object with the given parameters.

gaste(alternative='less', tau=1, limit_computation_exact=10**7, verbose=True, moment=2, jobs=None):
Performs the GASTE test on the tables.

pool_odd_ratio():
Calculates the pooled odds ratio.

pool_ci_odd_ratio():
Calculates the confidence interval for the pooled odds ratio.

CMH_test(correction=False):
Performs the Cochran-Mantel-Haenszel test on the tables.

BD_test(adjust=False):
Performs the Breslow-Day test for homogeneity of odds ratios.

resume():
Prints a summary of the analysis results.

plot(log_scale=True, fontsize=12, thresh_adjust=0.03, y_figsize=None, save=None):
Plots a forest plot of the odds ratios and their confidence intervals, with a summary of the data.

BD_test(adjust=False)

Perform the Breslow-Day test for homogeneity of odds ratios, i.e. test whether the odds ratios of all strata are equal.

Parameters:

adjust (bool, optional) : Whether to apply Tarone's adjustment, so that the statistic follows the asymptotic chi^2 distribution. Default is False.

Returns:

result (Result)

A named tuple containing the attributes:

statistic (float) : The chi^2 test statistic.
p-value (float) : The p-value of the test.

Notes:

The implementation is inspired by that of the statsmodels package.
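For reference, a minimal standard-library sketch of the Breslow-Day statistic follows. It assumes the textbook construction: compute the Mantel-Haenszel pooled odds ratio, find each stratum's expected count in cell a under that common odds ratio by solving a quadratic, and sum the squared deviations divided by their variances; Tarone's adjustment subtracts the squared sum of deviations divided by the summed variances. The p-value uses a chi^2 distribution with (number of strata - 1) degrees of freedom. The helpers `chi2_sf`, `mh_odds_ratio` and `breslow_day` are hypothetical and not part of gaste_test.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi^2 variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: closed form as a finite Poisson sum
        return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(df // 2))
    # Odd df: erfc term plus a finite series
    tail = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return tail


def mh_odds_ratio(tables):
    """Mantel-Haenszel pooled odds ratio: sum(a*d/N) / sum(b*c/N)."""
    num = sum(a * d / (a + b + c + d) for (a, b), (c, d) in tables)
    den = sum(b * c / (a + b + c + d) for (a, b), (c, d) in tables)
    return num / den


def breslow_day(tables, adjust=False):
    """Return (statistic, pvalue) of the Breslow-Day test (hypothetical helper)."""
    psi = mh_odds_ratio(tables)
    stat = sum_diff = sum_var = 0.0
    for (a, b), (c, d) in tables:
        K, n, N = a + b, a + c, a + b + c + d  # row-1 total, column-1 total, grand total
        lo, hi = max(0, n + K - N), min(K, n)
        if psi == 1.0:
            A = K * n / N
        else:
            # Expected a under psi solves (1-psi)A^2 + (N-K-n+psi(K+n))A - psi*K*n = 0;
            # exactly one root lies strictly inside (lo, hi).
            qa = 1.0 - psi
            qb = N - K - n + psi * (K + n)
            qc = -psi * K * n
            disc = math.sqrt(qb * qb - 4.0 * qa * qc)
            q = -0.5 * (qb + math.copysign(disc, qb))  # numerically stable roots
            A = next(r for r in (q / qa, qc / q) if lo < r < hi)
        var = 1.0 / (1 / A + 1 / (K - A) + 1 / (n - A) + 1 / (N - K - n + A))
        stat += (a - A) ** 2 / var
        sum_diff += a - A
        sum_var += var
    if adjust:
        stat -= sum_diff**2 / sum_var  # Tarone's adjustment
    return stat, chi2_sf(stat, len(tables) - 1)
```

Strata that share the same odds ratio give a statistic of (numerically) zero and a p-value close to 1, while strata with opposite associations give a large statistic.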

CMH_test(correction=False) → NamedTuple

Perform the Cochran-Mantel-Haenszel (CMH) test of the overall association between features and outcomes across the strata of the 2x2 stratified table.

Parameters:

correction (bool, optional) : Whether to apply Yates' continuity correction. Default is False.

Returns:

result (Result)

A named tuple containing the attributes:

stat (float) : The CMH test statistic.
pvalue (float) : The p-value associated with the CMH test.

References:

Cochran, W. G. (1954). Some methods for strengthening the common chi-squared tests. Biometrics, 10(4), 417-451.

Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22(4), 719-748.
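The CMH statistic can be sketched with the standard library as shown here. It assumes the usual form: the squared sum over strata of (observed minus expected count in cell a), optionally after subtracting 0.5 from its absolute value as a continuity correction, divided by the sum of the hypergeometric variances, and compared with a chi^2 distribution with one degree of freedom. The function `cmh_test` is hypothetical; the library's CMH_test may differ in details such as the exact form of the correction.

```python
import math


def cmh_test(tables, correction=False):
    """Return (stat, pvalue) of the Cochran-Mantel-Haenszel test on 2x2 strata."""
    diff = 0.0  # sum of (observed a - expected a)
    var = 0.0   # sum of hypergeometric variances of a
    for (a, b), (c, d) in tables:
        K, n, N = a + b, a + c, a + b + c + d  # row-1 total, column-1 total, grand total
        diff += a - K * n / N
        var += K * n * (N - K) * (N - n) / (N**2 * (N - 1))
    num = abs(diff)
    if correction:
        num = max(0.0, num - 0.5)  # continuity correction
    stat = num**2 / var
    # Chi^2 with 1 df: P(X >= stat) = erfc(sqrt(stat / 2))
    return stat, math.erfc(math.sqrt(stat / 2.0))
```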

gaste(alternative: str | None = 'less', tau=1, limit_computation_exact=10000000, verbose=True, moment=2, jobs=None) → NamedTuple

Compute the GASTE (Gamma Approximation of Stratified Truncated Exact) test of the overall association between features and outcome in the 2x2 stratified table.

Parameters:

alternative ({'less', 'greater'}, optional) –
The alternative hypothesis. Default is 'less'.
- 'less': compute the combined p-value of the one-sided 'less' Fisher exact tests of the strata, to test overall under-association.
- 'greater': compute the combined p-value of the one-sided 'greater' Fisher exact tests of the strata, to test overall over-association.
See the Notes for more details.
tau (int or float, optional) – The truncation value used in GASTE. Default is 1.
limit_computation_exact (int, optional) – The limit for exact computation of the combined p-value. Default is 10^7.
verbose (bool, optional) – Whether to print verbose output during computation. Default is True.
moment (int, optional) – The moment to use for the approximation of the combined p-value. Default is 2.
jobs (int or None, optional) – The number of parallel jobs to use for computation. Default is None, in which case all cores are used.

Returns:

result –

A named tuple containing the attributes:

stat (float) : The observed value of the GASTE statistic, i.e. the combination of the observed p-values of all strata.
pvalue (float) : The combined p-value resulting from the GASTE test.

Return type:

Result

Raises:

ValueError – If the alternative parameter is not 'less' or 'greater'.

Notes:

The GASTE statistic is computed from the one-sided p-values \(P_i\) of the \(I\) strata as:

\[Y_{\tau} = -2\sum_{i=1}^{I} \left( \log(P_i) - \log(\tau)\right)\mathbb{I}(P_i\leq\tau)\]
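As a small illustration of the formula, the statistic can be computed from a list of per-stratum p-values. The function `gaste_statistic` is a hypothetical helper, not the library's API. With tau = 1 every p-value is kept and the statistic reduces to Fisher's combination -2 * sum(log P_i).

```python
import math
from typing import Sequence


def gaste_statistic(pvalues: Sequence[float], tau: float = 1.0) -> float:
    """Y_tau = -2 * sum over P_i <= tau of (log P_i - log tau)."""
    return -2.0 * sum(math.log(p) - math.log(tau) for p in pvalues if p <= tau)
```

For example, gaste_statistic([0.5, 0.2]) equals -2 log(0.1), about 4.605, while gaste_statistic([0.5, 0.2], tau=0.3) keeps only 0.2 and gives -2 log(2/3), about 0.811.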

The combined p-value is computed exactly, by enumerating all possible combinations of tables, when the number of combinations is below limit_computation_exact; otherwise a gamma approximation is used.
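The exact branch can be sketched as follows, under stated assumptions: each stratum's count a follows a hypergeometric distribution given its margins, strata are independent, and the combined p-value is the null probability that Y_tau is at least its observed value. The number of combinations is the product of the support sizes, kept as a float in the spirit of nb_combination. Instead of falling back to the gamma approximation, the sketch simply raises an error above the limit. All function names are hypothetical, and this is not the code of combined_pval.get_pval_comb().

```python
import itertools
import math


def hypergeom_pmf(x, N, n, K):
    """P(a = x) given grand total N, column-1 total n and row-1 total K."""
    return math.comb(K, x) * math.comb(N - K, n - x) / math.comb(N, n)


def stratum_distribution(table, alternative):
    """Return (observed index, pmf over the support, one-sided p-value per support value)."""
    (a, b), (c, d) = table
    N, n, K = a + b + c + d, a + c, a + b
    xs = range(max(0, n + K - N), min(n, K) + 1)
    pmf = [hypergeom_pmf(x, N, n, K) for x in xs]
    if alternative == "less":
        pvals = list(itertools.accumulate(pmf))  # P(a <= x)
    elif alternative == "greater":
        pvals = list(itertools.accumulate(reversed(pmf)))[::-1]  # P(a >= x)
    else:
        raise ValueError("alternative must be 'less' or 'greater'")
    return xs.index(a), pmf, pvals


def y_tau(pvalues, tau):
    return -2.0 * sum(math.log(p / tau) for p in pvalues if p <= tau)


def exact_combined_pval(tables, alternative="less", tau=1.0, limit=10**7):
    """Return (observed Y_tau, exact combined p-value) by full enumeration."""
    strata = [stratum_distribution(t, alternative) for t in tables]
    nb_combination = math.prod(float(len(pmf)) for _, pmf, _ in strata)
    if nb_combination > limit:
        raise ValueError("too many combinations for exact enumeration")
    y_obs = y_tau([pvals[i] for i, _, pvals in strata], tau)
    total = 0.0
    # Visit every combination of one support value per stratum.
    for combo in itertools.product(*(range(len(pmf)) for _, pmf, _ in strata)):
        y = y_tau([strata[s][2][j] for s, j in enumerate(combo)], tau)
        if y >= y_obs - 1e-9:  # small tolerance for floating-point ties
            total += math.prod(strata[s][1][j] for s, j in enumerate(combo))
    return y_obs, min(total, 1.0)
```

With a single stratum and tau = 1, the combined p-value coincides with that stratum's one-sided Fisher p-value, which is a convenient sanity check.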

If alternative is 'less', the combined p-value is stored in self.combined_pval_less; if alternative is 'greater', it is stored in self.combined_pval_greater. The result can therefore be reused later, for example by the plot method, without being recomputed.

Internally, this method calls the function combined_pval.get_pval_comb(); see its documentation for more details.

plot(log_scale=True, fontsize=12, thresh_adjust=0.03, y_figsize=None, save: str | None = None)

Plot a forest plot of the odds ratios and their confidence intervals, with a summary of each stratum's data on either side of the confidence-interval plot. The plot is annotated with the CMH test, BD test, and GASTE test results.

Parameters:

log_scale (bool, optional) : Whether to use a logarithmic scale for the x-axis (i.e. whether the odds ratio or the log odds ratio is displayed). Default is True.
fontsize (int, optional) : The font size for the plot. Default is 12.
thresh_adjust (float, optional) : The vertical adjustment used to align the confidence-interval plot with the data shown on each side. With only 2 or 3 strata, a value of 0.1 is advised; with more than 10 strata, a smaller value such as 0.001 is advised. Default is 0.03.
y_figsize (float, optional) : The figure size along the y-axis. Default is None, in which case it is set automatically from the number of strata.
save (str, optional) : The file path used to save the plot; both a PNG and an SVG file are created. Default is None.

Notes:

plt.show() is not called at the end, so you can call it yourself to display the plot. However, because annotations are used to display information on each side of the confidence-interval plot, the x-axis size of the displayed figure may be wrong. Using the save option to save and then view the plot is therefore recommended; running in a Jupyter notebook also avoids this issue.

Returns:

None

pool_ci_odd_ratio()

Calculate the confidence interval for the pooled odds ratio.

This method calculates the confidence interval for the pooled odds ratio using the Generalized Mantel-Haenszel estimators for K 2xJ tables.

Returns:

tuple: A tuple containing the lower and upper bounds of the confidence interval.

References:

Greenland S (1989) Generalized Mantel-Haenszel estimators for K 2xJ tables. Biometrics 45(1):183-191
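A sketch of such an interval follows. It assumes the Robins-Breslow-Greenland variance of the log Mantel-Haenszel odds ratio, taken here as the 2x2 case of Greenland's generalized estimator (an assumption; the documentation does not spell out the formula), together with a normal quantile from statistics.NormalDist. The function `pooled_ci_odds_ratio` is hypothetical.

```python
import math
from statistics import NormalDist


def pooled_ci_odds_ratio(tables, alpha=0.05):
    """Return (lower, upper) CI of the Mantel-Haenszel odds ratio (RBG variance)."""
    sum_r = sum_s = sum_pr = sum_ps_qr = sum_qs = 0.0
    for (a, b), (c, d) in tables:
        N = a + b + c + d
        r, s = a * d / N, b * c / N  # MH numerator and denominator terms
        p, q = (a + d) / N, (b + c) / N
        sum_r += r
        sum_s += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    log_or = math.log(sum_r / sum_s)
    var = (sum_pr / (2 * sum_r**2)
           + sum_ps_qr / (2 * sum_r * sum_s)
           + sum_qs / (2 * sum_s**2))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(var)
    return math.exp(log_or - half), math.exp(log_or + half)
```

With a single stratum this variance reduces to Woolf's 1/a + 1/b + 1/c + 1/d, which gives a quick check of the formula.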

pool_odd_ratio()[source]: Calculate the pooled odds ratio.

Returns:

float: The pooled odds ratio.
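A sketch of the Mantel-Haenszel pooled odds ratio, assuming the usual weights w_i = b_i c_i / N_i (presumably what Table2x2.mh_weight() returns, although this documentation does not give the formula). The pooled estimate is the weighted mean of the stratum odds ratios, which simplifies to sum(a d / N) / sum(b c / N). Both function names are hypothetical.

```python
def mh_weight(table):
    """Mantel-Haenszel weight b*c/N of one 2x2 stratum."""
    (a, b), (c, d) = table
    return b * c / (a + b + c + d)


def pooled_odds_ratio(tables):
    """Mantel-Haenszel pooled odds ratio."""
    # Equal to sum(w_i * OR_i) / sum(w_i), since w_i * OR_i = a_i * d_i / N_i,
    # but this form stays defined when a stratum has b = 0 or c = 0.
    num = sum(a * d / (a + b + c + d) for (a, b), (c, d) in tables)
    return num / sum(mh_weight(t) for t in tables)
```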

resume()[source]: Print the DataFrame containing the data, the odd ration and ci of each stratum, pooled odd ratio with MH method, and the confident interval of the pooled odd ratio.

Returns:

None

class gaste_test.stratified_table2x2.Table2x2(table_data: Tuple[Tuple[int, int], Tuple[int, int]] | Tuple[int, int, int, int])

Bases: object

Represents a 2x2 contingency table and provides various statistical calculations.

Parameters:

table_data : Either a 2D array-like [[a, b], [c, d]] giving the content of the table, where a is the count of events in the first category, b the count of non-events in the first category, c the count of events in the second category, and d the count of non-events in the second category; or a tuple of 4 integers (N, n, K, a), where N is the total count of events and non-events, n is the total count of events across both categories, K is the total count in the first category, and a is the count of events in the first category.
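The two accepted forms of table_data are related through the margins N = a+b+c+d, n = a+c and K = a+b. A sketch of the conversion in both directions, with hypothetical helper names:

```python
def from_margins(N, n, K, a):
    """Rebuild ((a, b), (c, d)) from the tuple (N, n, K, a)."""
    b = K - a      # non-events in the first category
    c = n - a      # events in the second category
    d = N - K - c  # non-events in the second category
    if min(a, b, c, d) < 0:
        raise ValueError("margins are inconsistent with a")
    return ((a, b), (c, d))


def to_margins(table):
    """Return (N, n, K, a) from ((a, b), (c, d))."""
    (a, b), (c, d) = table
    return (a + b + c + d, a + c, a + b, a)
```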

Attributes:

a (int) : The count of events in the first category.
b (int) : The count of non-events in the first category.
c (int) : The count of events in the second category.
d (int) : The count of non-events in the second category.
N (int) : The total count of events and non-events (a+b+c+d).
n (int) : The total count of events across both categories (a+c).
K (int) : The total count in the first category (a+b).

Methods:

odd_ratio(): Calculates the odds ratio of the contingency table.
support(): Returns the range of possible support values.
len_support(): Returns the number of possible support values.
variance_log_odd_ratio(): Calculates the variance of the log odds ratio.
ci_odd_ratio(alpha=0.05): Calculates the confidence interval of the odds ratio.
mh_weight(): Calculates the Mantel-Haenszel weight.
pval_under(): Calculates the p-value for observing the given count or fewer.
pval_over(): Calculates the p-value for observing the given count or more.
support_pval_under(): Calculates the p-values for observing each support value or fewer.
support_pval_over(): Calculates the p-values for observing each support value or more.

ci_odd_ratio(alpha=0.05)

Calculates the confidence interval of the odds ratio.

Parameters:

alpha (float, optional) : The significance level. Default is 0.05.

Returns:

tuple: A tuple containing the lower and upper bounds of the confidence interval.
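A sketch of how such an interval is commonly computed, assuming Woolf's logit method: a normal approximation on the log scale with variance 1/a + 1/b + 1/c + 1/d, the usual form of the variance of the log odds ratio. The documentation does not state which method the library uses, and the sketch does not handle zero cells. The standalone functions mirror the method names but are hypothetical.

```python
import math
from statistics import NormalDist


def odd_ratio(table):
    (a, b), (c, d) = table
    return a * d / (b * c)


def variance_log_odd_ratio(table):
    (a, b), (c, d) = table
    return 1 / a + 1 / b + 1 / c + 1 / d


def ci_odd_ratio(table, alpha=0.05):
    """Woolf (logit) interval: exp(log OR -/+ z * sqrt(var))."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    log_or = math.log(odd_ratio(table))
    half = z * math.sqrt(variance_log_odd_ratio(table))
    return math.exp(log_or - half), math.exp(log_or + half)
```

Because the interval is symmetric on the log scale, the product of its bounds equals the squared odds ratio: 36 for [[10, 5], [4, 12]], whose odds ratio is 6.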

len_support() → float

Returns the number of possible support values.

Returns:

float: The number of possible support values.

Notes:

The integer count is returned as a float to avoid numerical errors when computing the number of combinations in StratifiedTable2x2.

mh_weight()

Calculates the Mantel-Haenszel weight.

Returns:

float: The Mantel-Haenszel weight.

odd_ratio()

Calculates the odds ratio of the contingency table.

Returns:

float: The odds ratio.

pval_over()

Calculates the p-value for observing the given count or more.

Returns:

float: The p-value.

pval_under()

Calculates the p-value for observing the given count or fewer.

Returns:

float: The p-value.
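Both p-values can be read as one-sided Fisher exact p-values: given the margins, a follows a hypergeometric distribution, and the p-value sums its probabilities at or below (pval_under) or at or above (pval_over) the observed a. A standard-library sketch with hypothetical standalone functions:

```python
import math


def _hypergeom_pmf(x, N, n, K):
    # P(a = x) with grand total N, column-1 total n and row-1 total K.
    return math.comb(K, x) * math.comb(N - K, n - x) / math.comb(N, n)


def _margins(table):
    (a, b), (c, d) = table
    return a, a + b + c + d, a + c, a + b  # a, N, n, K


def pval_under(table):
    """P(A <= a): small values point to an odds ratio below 1."""
    a, N, n, K = _margins(table)
    return sum(_hypergeom_pmf(x, N, n, K) for x in range(max(0, n + K - N), a + 1))


def pval_over(table):
    """P(A >= a): small values point to an odds ratio above 1."""
    a, N, n, K = _margins(table)
    return sum(_hypergeom_pmf(x, N, n, K) for x in range(a, min(n, K) + 1))
```

These correspond to the alternative='less' and alternative='greater' options of scipy.stats.fisher_exact.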

support() → range

Returns the range of possible support values.

Returns:

range: The range of possible support values.
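With the margins fixed, a can only range from max(0, n + K - N) to min(n, K). A sketch of support() and len_support() under that assumption, written as hypothetical standalone functions taking the table directly:

```python
def support(table):
    """Possible values of a given the row and column totals."""
    (a, b), (c, d) = table
    N, n, K = a + b + c + d, a + c, a + b
    return range(max(0, n + K - N), min(n, K) + 1)


def len_support(table):
    # Returned as a float, as documented for Table2x2.len_support().
    return float(len(support(table)))
```

Presumably, the product of len_support() over all strata is the number of combinations that the exact GASTE computation has to enumerate.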

support_pval_over()

Calculates the p-values for observing each support value or more.

Returns:

ndarray: An array of p-values.

support_pval_under()

Calculates the p-values for observing each support value or fewer.

Returns:

ndarray: An array of p-values.
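Both arrays can be obtained as cumulative sums of the hypergeometric probabilities over the support: from the left for support_pval_under() and from the right for support_pval_over(). The sketch returns plain lists rather than ndarrays, and the function name `support_pvals` is hypothetical.

```python
import math
from itertools import accumulate


def support_pvals(table):
    """Return (under, over): P(A <= x) and P(A >= x) for every x in the support."""
    (a, b), (c, d) = table
    N, n, K = a + b + c + d, a + c, a + b
    xs = range(max(0, n + K - N), min(n, K) + 1)
    pmf = [math.comb(K, x) * math.comb(N - K, n - x) / math.comb(N, n) for x in xs]
    under = list(accumulate(pmf))                 # running sum from the left
    over = list(accumulate(reversed(pmf)))[::-1]  # running sum from the right
    return under, over
```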

variance_log_odd_ratio()

Calculates the variance of the log odds ratio.

Returns:

float: The variance of the log odds ratio.