gaste_test.stratified_table2x2 module
- class gaste_test.stratified_table2x2.StratifiedTable2x2(tables: List[Tuple[Tuple[int, int], Tuple[int, int]]], labels: List[str] | None = None, decimal: int | None = 3, alpha: str | None = 0.05, limit_computation_exact: int | None = 10000000, name_rows: Tuple[str, str] | None = None, name_columns: Tuple[str, str] | None = None)
Bases: object
This module contains the StratifiedTable2x2 class for analyzing stratified 2x2 contingency tables.
StratifiedTable2x2 takes in a list of tables, labels, and optional parameters to perform various statistical tests and calculations on the tables.
Parameters:
- tables (list) : A list of 2x2 arrays (ndarray) representing the contingency tables, one table per stratum.
- labels (list) : A list of labels, one per table/stratum.
- decimal (int) : The number of decimal places to round the results to. Default is 3.
- alpha (float) : The significance level for confidence intervals and hypothesis tests. Default is 0.05.
- limit_computation_exact (int) : The maximum number of table combinations for which the combined p-value is computed exactly. Default is 10^7.
- name_rows (tuple) : A tuple of row names for the tables. Optional.
- name_columns (tuple) : A tuple of column names for the tables. Optional.
Attributes:
- nb_combination (float) : The number of combinations for the exact calculation; an integer cast to float for numerical reasons.
- odds_ratio (ndarray) : An array of odds ratios, one per table.
- log_odds_ratio (ndarray) : An array of log odds ratios, one per table.
- ci_odds_ratio_inf (ndarray) : An array of lower confidence bounds for the odds ratios.
- ci_odds_ratio_sup (ndarray) : An array of upper confidence bounds for the odds ratios.
- log_ci_odds_ratio_inf (ndarray) : An array of lower confidence bounds for the log odds ratios.
- log_ci_odds_ratio_sup (ndarray) : An array of upper confidence bounds for the log odds ratios.
- pval_under (ndarray) : An array of p-values for the hypothesis test of odds ratio < 1.
- pval_over (ndarray) : An array of p-values for the hypothesis test of odds ratio > 1.
- weight (ndarray) : An array of weights, one per table.
- odds_ratio_pooled (float) : The pooled odds ratio.
- df (DataFrame) : A pandas DataFrame containing the data and the results of the analysis.
Methods:
- __init__(tables, labels, decimal=3, alpha=0.05, name_rows=None, name_columns=None):
Initializes a StratifiedTable2x2 object with the given parameters.
- gaste(alternative='less', tau=1, limit_computation_exact=10**7, verbose=True, moment=2, jobs=None):
Performs the GASTE test on the tables.
- pool_odd_ratio():
Calculates the pooled odds ratio.
- pool_ci_odd_ratio():
Calculates the confidence interval for the pooled odds ratio.
- CMH_test(correction=False):
Performs the Cochran-Mantel-Haenszel test on the tables.
- BD_test(adjust=False):
Performs the Breslow-Day test for homogeneity of odds ratios.
- resume():
Prints a summary of the analysis results.
- plot(log_scale=True, fontsize=12, thresh_adjust=0.03, y_figsize=None, save=None):
Plots a forest plot with odds ratios, confidence intervals, and a summary of the data.
- BD_test(adjust=False)
Perform the Breslow-Day test for homogeneity of odds ratios, i.e. test that all odds ratios are equal.
Parameters:
- adjust : bool, optional
Use the Tarone adjustment to achieve the chi^2 asymptotic distribution. Default is False.
Returns:
- result : Result
- A named tuple containing the attributes:
- statistic : float
The chi^2 test statistic.
- p-value : float
The p-value for the test.
Notes:
The implementation is inspired by the implementation in the statsmodels package.
- CMH_test(correction=False) → NamedTuple
Perform the Cochran-Mantel-Haenszel (CMH) test on the stratified 2x2 tables to test the overall association between feature and outcome across strata.
Parameters:
- correction : bool, optional
Whether to apply the Yates continuity correction. Default is False.
Returns:
- result : Result
- A named tuple containing the attributes:
- stat : float
The CMH test statistic.
- pvalue : float
The p-value associated with the CMH test.
References:
Cochran, W. G. (1954). Some methods for strengthening the common chi-squared tests. Biometrics, 10(4), 417-451.
Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22(4), 719-748.
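As a rough illustration of what the CMH statistic measures, the sketch below recomputes it from raw tables with the standard library only. The function name cmh_test and its tuple return value are hypothetical, and this is not the library's implementation; in particular, the way the continuity correction is applied here is an assumption.

```python
import math


def cmh_test(tables, correction=False):
    """Illustrative CMH chi-square statistic (1 degree of freedom) and p-value.

    `tables` is a list of ((a, b), (c, d)) tuples, one per stratum.
    """
    observed = expected = variance = 0.0
    for (a, b), (c, d) in tables:
        total = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        observed += a
        # Expectation and variance of cell `a` under the hypergeometric null.
        expected += row1 * col1 / total
        variance += row1 * row2 * col1 * col2 / (total**2 * (total - 1))

    diff = abs(observed - expected)
    # Assumed convention: subtract 0.5 only when the difference is at least 0.5.
    yates = 0.5 if (correction and diff >= 0.5) else 0.0
    stat = (diff - yates) ** 2 / variance
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    pvalue = math.erfc(math.sqrt(stat / 2.0))
    return stat, pvalue


example_tables = [((10, 20), (15, 15)), ((8, 12), (14, 6))]
print(cmh_test(example_tables))
print(cmh_test(example_tables, correction=True))
```

The sketch assumes every stratum has at least two observations and non-degenerate margins, so that the summed variance is positive.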
- gaste(alternative: str | None = 'less', tau=1, limit_computation_exact=10000000, verbose=True, moment=2, jobs=None) → NamedTuple
Compute the GASTE (Gamma Approximation of Stratified Truncated Exact) test, which tests the overall association between feature and outcome in stratified 2x2 tables.
- Parameters:
alternative ({'less', 'greater'}, optional) –
The alternative hypothesis. Default is 'less'.
'less': combine the p-values of the one-sided 'less' Fisher exact test of each stratum to test for overall under-association.
'greater': combine the p-values of the one-sided 'greater' Fisher exact test of each stratum to test for overall over-association.
See the Notes for more details.
tau (int or float, optional) – The truncation value used in GASTE. Default is 1.
limit_computation_exact (int, optional) – The limit for exact computation of the combined p-value. Default is 10^7.
verbose (bool, optional) – Whether to print verbose output during computation. Default is True.
moment (int, optional) – The moment to use for the approximation of the combined p-value. Default is 2.
jobs (int or None, optional) – The number of parallel jobs to use for computation. Default is None, in which case all cores are used.
- Returns:
result –
- A named tuple containing the attributes:
- stat : float
The value of the statistic that combines the p-values of the observed data in each stratum.
- pvalue : float
The combined p-value resulting from the GASTE test.
- Return type:
Result
- Raises:
ValueError – If the alternative parameter is not “less” or “greater”.
See also:
combined_pval.get_pval_comb : The function that computes the combined p-value.
Notes:
The GASTE statistic is computed from the one-sided p-value \(P_s\) of each stratum \(s = 1, \dots, S\) in the given data:
\[Y_{\tau} = -2\sum_{s=1}^{S} \left(\log(P_s) - \log(\tau)\right)\mathbb{I}(P_s\leq\tau)\]
The combined p-value is computed exactly by exploring all possible combinations of tables if the number of combinations is under the limit threshold; otherwise the gamma approximation is used.
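The formula above can be evaluated directly from a list of per-stratum p-values. This is a minimal sketch using only the standard library; the function name truncated_statistic is hypothetical and is not part of the package. It assumes all p-values are strictly positive.

```python
import math


def truncated_statistic(pvalues, tau=1.0):
    """Evaluate Y_tau = -2 * sum((log(p) - log(tau)) for p <= tau)."""
    return -2.0 * sum(
        math.log(p) - math.log(tau)
        for p in pvalues
        if p <= tau  # the indicator: p-values above tau contribute nothing
    )


pvals = [0.5, 0.1]
print(truncated_statistic(pvals))           # tau = 1: no truncation
print(truncated_statistic(pvals, tau=0.2))  # only p = 0.1 contributes
```

With tau = 1 no p-value is truncated and the statistic reduces to Fisher's combination statistic, -2 times the sum of the log p-values.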
If alternative is 'less', the combined p-value is stored in self.combined_pval_less; if alternative is 'greater', it is stored in self.combined_pval_greater. A result needed later, for example by the plot method, can then be reused without recomputing it.
Internally, this method calls the function combined_pval.get_pval_comb(); see its documentation for more details.
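To make the exact computation described in the Notes concrete, here is a small brute-force sketch for the 'less' alternative. It enumerates every combination of tables with the observed margins and sums the probability of those whose statistic is at least the observed one. All function names are hypothetical, the code is not the implementation in combined_pval.get_pval_comb, and it is only practical for a small number of combinations.

```python
import math
from itertools import product


def stratum_outcomes(table):
    """Return (observed lower-tail p-value, [(probability, lower-tail p-value), ...])
    over all tables sharing the margins of `table` (hypergeometric null)."""
    (a, b), (c, d) = table
    N, n, K = a + b + c + d, a + c, a + b
    denom = math.comb(N, n)
    outcomes, cumulative, p_observed = [], 0.0, None
    for x in range(max(0, n + K - N), min(n, K) + 1):
        prob = math.comb(K, x) * math.comb(N - K, n - x) / denom
        cumulative += prob
        p_lower = min(cumulative, 1.0)  # guard against floating-point drift
        outcomes.append((prob, p_lower))
        if x == a:
            p_observed = p_lower
    return p_observed, outcomes


def exact_combined_pvalue(tables, tau=1.0):
    """Brute-force combined p-value of the truncated statistic ('less' side)."""

    def statistic(pvalues):
        return -2.0 * sum(math.log(p / tau) for p in pvalues if p <= tau)

    strata = [stratum_outcomes(t) for t in tables]
    y_observed = statistic([p_obs for p_obs, _ in strata])
    total = 0.0
    for combo in product(*(outcomes for _, outcomes in strata)):
        if statistic([p for _, p in combo]) >= y_observed - 1e-12:
            total += math.prod(prob for prob, _ in combo)
    return y_observed, min(total, 1.0)


print(exact_combined_pvalue([((3, 7), (6, 4)), ((8, 12), (14, 6))]))
```

The number of iterations is the product of the support sizes of the strata, which is why a limit such as limit_computation_exact is needed before switching to an approximation.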
- plot(log_scale=True, fontsize=12, thresh_adjust=0.03, y_figsize=None, save: str | None = None)
Plot a forest plot of the odds ratios and their confidence intervals, with a summary of the data on each side of the plot. The plot is annotated with the CMH test, BD test, and GASTE test results.
Parameters:
- log_scale : bool, optional
Whether to use a logarithmic scale for the x-axis, which determines whether odds ratios or log odds ratios are shown. Default is True.
- fontsize : int, optional
The font size for the plot. Default is 12.
- thresh_adjust : float, optional
The adjustment applied along the y-axis to align the confidence intervals with the data shown on each side. With only 2 or 3 strata, a value of 0.1 is advised; with more than 10 strata, a smaller value such as 0.001 is advised. Default is 0.03.
- y_figsize : float, optional
The figure size along the y-axis. Default is None, in which case it is set automatically based on the number of strata.
- save : str, optional
The file path used to save the plot; both a PNG and an SVG file are created. Default is None.
Notes:
The method does not call show at the end, so call plt.show() yourself to display the plot. However, the x-axis of the displayed figure can be sized incorrectly, because annotations are used to show information on each side of the confidence interval plot. Using the save option and viewing the saved file is therefore recommended; displaying the plot in a Jupyter notebook also avoids the issue.
Returns:
None
- pool_ci_odd_ratio()
Calculate the confidence interval for the pooled odds ratio.
This method calculates the confidence interval for the pooled odds ratio using the Generalized Mantel-Haenszel estimators for K 2xJ tables.
Returns:
tuple: A tuple containing the lower and upper bounds of the confidence interval.
References:
Greenland S (1989) Generalized Mantel-Haenszel estimators for K 2xJ tables. Biometrics 45(1):183-191
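For orientation, the sketch below computes a Mantel-Haenszel pooled odds ratio and a confidence interval for stratified 2x2 tables using the Robins-Breslow-Greenland variance of the log pooled odds ratio. Using this variance formula is an assumption made for illustration; the library follows the generalized estimators of the reference above and its results may differ. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def mantel_haenszel_pooled(tables, alpha=0.05):
    """Illustrative pooled odds ratio with a (1 - alpha) confidence interval."""
    R = S = 0.0
    sum_pr = sum_ps_qr = sum_qs = 0.0
    for (a, b), (c, d) in tables:
        total = a + b + c + d
        r, s = a * d / total, b * c / total  # s is the usual MH weight b*c/N
        p, q = (a + d) / total, (b + c) / total
        R += r
        S += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s

    pooled = R / S
    # Robins-Breslow-Greenland variance of log(pooled odds ratio).
    var_log = sum_pr / (2 * R**2) + sum_ps_qr / (2 * R * S) + sum_qs / (2 * S**2)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(var_log)
    return pooled, (pooled * math.exp(-half_width), pooled * math.exp(half_width))


print(mantel_haenszel_pooled([((10, 20), (15, 15)), ((8, 12), (14, 6))]))
```

For a single stratum the variance reduces to 1/a + 1/b + 1/c + 1/d, the usual variance of the log odds ratio of one table.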
- class gaste_test.stratified_table2x2.Table2x2(table_data: Tuple[Tuple[int, int], Tuple[int, int]] | Tuple[int, int, int, int])
Bases: object
Represents a 2x2 contingency table and provides various statistical calculations.
Parameters:
table_data : Either a 2D array-like [[a, b], [c, d]] giving the content of the table, where a is the count of events in the first category, b is the count of non-events in the first category, c is the count of events in the second category, and d is the count of non-events in the second category; or a tuple of 4 integers (N, n, K, a), where N is the total count of events and non-events, n is the count of events in both categories, K is the total count in the first category, and a is the count of events in the first category.
Attributes:
- a : int
The count of events in the first category.
- b : int
The count of non-events in the first category.
- c : int
The count of events in the second category.
- d : int
The count of non-events in the second category.
- N : int
The total count of events and non-events (a+b+c+d).
- n : int
The count of events in both categories (a+c).
- K : int
The total count in the first category (a+b).
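The two accepted input forms carry the same information, as the definitions of N, n, and K above imply. A minimal sketch of the conversion in both directions; the helper names from_margins and to_margins are hypothetical and not part of the library:

```python
def from_margins(N, n, K, a):
    """Convert (N, n, K, a) into the cell form ((a, b), (c, d))."""
    b = K - a          # non-events in the first category
    c = n - a          # events in the second category
    d = N - K - n + a  # non-events in the second category
    if min(a, b, c, d) < 0:
        raise ValueError("margins are inconsistent with a valid 2x2 table")
    return ((a, b), (c, d))


def to_margins(table):
    """Convert ((a, b), (c, d)) into (N, n, K, a)."""
    (a, b), (c, d) = table
    return (a + b + c + d, a + c, a + b, a)


print(from_margins(20, 9, 10, 3))
print(to_margins(((3, 7), (6, 4))))
```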
Methods:
- odd_ratio()
Calculates the odds ratio of the contingency table.
- support()
Returns the range of possible support values.
- len_support()
Returns the number of possible support values.
- variance_log_odd_ratio()
Calculates the variance of the log odds ratio.
- ci_odd_ratio(alpha=0.05)
Calculates the confidence interval of the odds ratio.
- mh_weight()
Calculates the Mantel-Haenszel weight.
- pval_under()
Calculates the p-value for observing the given count or fewer.
- pval_over()
Calculates the p-value for observing the given count or more.
- support_pval_under()
Calculates the p-values for observing each support value or fewer.
- support_pval_over()
Calculates the p-values for observing each support value or more.
- ci_odd_ratio(alpha=0.05)
Calculates the confidence interval of the odds ratio.
Parameters:
- alpha : float, optional
The significance level (default is 0.05).
Returns:
- tuple
A tuple containing the lower and upper bounds of the confidence interval.
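One common way to build such an interval is the Woolf (log) method, which uses the variance of the log odds ratio, 1/a + 1/b + 1/c + 1/d. The sketch below shows that method with the standard library; whether ci_odd_ratio uses exactly this construction is an assumption, and the function name odds_ratio_ci is hypothetical. All four cells must be non-zero.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(table, alpha=0.05):
    """Odds ratio of ((a, b), (c, d)) with a Woolf (1 - alpha) confidence interval."""
    (a, b), (c, d) = table
    odds_ratio = (a * d) / (b * c)
    var_log = 1 / a + 1 / b + 1 / c + 1 / d  # variance of log(odds ratio)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided normal quantile
    half_width = z * math.sqrt(var_log)
    log_or = math.log(odds_ratio)
    return odds_ratio, (math.exp(log_or - half_width), math.exp(log_or + half_width))


print(odds_ratio_ci(((3, 7), (6, 4))))
```

The interval is symmetric on the log scale, so the product of its bounds equals the squared odds ratio.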
- len_support() → float
Returns the number of possible support values.
Returns:
- float
The number of possible support values.
Notes:
The integer count is cast to float to avoid rounding errors when the number of combinations is calculated in StratifiedTable2x2.
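The following sketch shows one way the support and the number of combinations could be derived, assuming the support is the set of values the cell a can take when the margins N, n, and K are fixed. The function names are hypothetical stand-ins and do not reproduce the library's code.

```python
import math


def support(table):
    """Range of values of cell `a` compatible with the margins of the table."""
    (a, b), (c, d) = table
    N, n, K = a + b + c + d, a + c, a + b
    return range(max(0, n + K - N), min(n, K) + 1)


def nb_combination(tables):
    """Product of the support sizes over all strata, kept as a float."""
    return math.prod(float(len(support(t))) for t in tables)


tables = [((3, 7), (6, 4)), ((8, 12), (14, 6))]
print(support(tables[0]), support(tables[1]))
print(nb_combination(tables))
# A threshold such as limit_computation_exact would be compared to this count.
print(nb_combination(tables) < 10**7)
```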
- mh_weight()
Calculates the Mantel-Haenszel weight.
Returns:
- float
The Mantel-Haenszel weight.
- odd_ratio()
Calculates the odds ratio of the contingency table.
Returns:
- float
The odds ratio.
- pval_over()
Calculates the p-value for observing the given count or more.
Returns:
- float
The p-value.
- pval_under()
Calculates the p-value for observing the given count or fewer.
Returns:
- float
The p-value.
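Both one-sided p-values can be read as tail probabilities of the hypergeometric distribution of the cell a with fixed margins, as in the one-sided Fisher exact test. A minimal standard-library sketch under that assumption; the function name fisher_one_sided is hypothetical:

```python
import math


def fisher_one_sided(table):
    """Return (P(X <= a), P(X >= a)) for the cell `a` of ((a, b), (c, d))."""
    (a, b), (c, d) = table
    N, n, K = a + b + c + d, a + c, a + b
    denom = math.comb(N, n)
    # Hypergeometric probability of each possible value x of cell `a`.
    pmf = {
        x: math.comb(K, x) * math.comb(N - K, n - x) / denom
        for x in range(max(0, n + K - N), min(n, K) + 1)
    }
    p_under = sum(p for x, p in pmf.items() if x <= a)
    p_over = sum(p for x, p in pmf.items() if x >= a)
    return p_under, p_over


print(fisher_one_sided(((3, 7), (6, 4))))
```

Because the observed count belongs to both tails, the two p-values sum to more than 1.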
- support() → range
Returns the range of possible support values.
Returns:
- range
The range of possible support values.
- support_pval_over()
Calculates the p-values for observing each support value or more.
Returns:
- ndarray
An array of p-values.