gaste_test.combined_pval module

gaste_test.combined_pval.explicite_combination(list_params: List[List[int]], list_pvals: List[float], type: str, tau=None, jobs=None, verbose=False, distribution=False, all_value=False)[source]

Compute the explicit law of the combination of p-values.

Parameters:
  • list_params (List[List[int]]) – A list of parameter lists, one per stratum (the length of the list is the number of strata). Each parameter list contains three integers [N, K, n]: N is the population size, K is the number of positive outcomes in both categories (with and without the feature), and n is the number of outcomes (positive and negative) with the feature. These three integers are also the parameters of the hypergeometric distribution that describes the probability of observing the data in each stratum.

  • list_pvals (List[float]) – A list of p-values, one per stratum. These p-values are the results of Fisher’s exact test in each stratum.

  • type (str) – The type of combination. Can be either “under” or “over”.

  • tau (float, optional) – The threshold value for combining p-values. Defaults to None.

  • jobs (int, optional) – The number of parallel jobs to run. Defaults to None.

  • verbose (bool, optional) – Whether to print verbose output. Defaults to False.

  • distribution (bool, optional) – Whether to return the distribution of combined p-values. Defaults to False.

  • all_value (bool, optional) – Whether to return all values of combined p-values. Defaults to False.

Returns:

If distribution is True, returns a tuple of arrays describing the exact law of the combination through the values of the random variable and the corresponding survival values (y, P(Y>y)). If all_value is True, the rounding strategy is not used. Otherwise, returns the combined p-value.

Return type:

float or tuple

Notes

This function computes the explicit combination of p-values from the given parameters. The type parameter selects an “under” or “over” combination. The threshold tau truncates the combination of p-values. The jobs parameter sets the number of parallel jobs used to speed up the computation. Setting verbose to True prints a progress bar. Setting distribution to True returns the distribution of the combined p-values. Setting all_value to True returns all values of the combined p-values, without the rounding strategy.
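As an illustration of how the per-stratum inputs relate to each other, the sketch below computes a one-sided Fisher exact p-value from a stratum's [N, K, n] using only the standard library. The helper names, the `kind` argument (standing in for type), and the observed count `k = 1` are assumptions made for this example; they are not part of gaste_test.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k) for a hypergeometric variable with parameters [N, K, n]."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def fisher_one_sided(k, N, K, n, kind="under"):
    """One-sided Fisher exact p-value for an observed count k.

    Hypothetical helper, not a gaste_test function.
    """
    lo, hi = max(0, n + K - N), min(K, n)  # support of the hypergeometric law
    if not lo <= k <= hi:
        raise ValueError("k is outside the support of the distribution")
    if kind == "under":
        counts = range(lo, k + 1)   # P(X <= k)
    elif kind == "over":
        counts = range(k, hi + 1)   # P(X >= k)
    else:
        raise ValueError('kind must be "under" or "over"')
    return sum(hypergeom_pmf(j, N, K, n) for j in counts)


# Stratum [20, 5, 10] with an assumed observed count of 1 positive outcome.
p_under = fisher_one_sided(1, 20, 5, 10, kind="under")
print(p_under)  # about 0.1517
```

With this assumed count, the result agrees with the first entry of list_pvals in the Examples for this function.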

Examples

>>> list_params = [[20, 5, 10], [25, 8, 15], [18, 12, 6]]
>>> list_pvals = [0.15170278637770898, 0.12810484709798212, 0.29427925016160306]
>>> type = "under"
>>> explicite_combination(list_params, list_pvals, type)
0.028607013097391977

gaste_test.combined_pval.get_pval_comb(data_params, data_pvals, type, tau=0.2, threshold_compute_explicite=50000000, moment=4, jobs=None, verbose=False, distribution=False, all_value=False)[source]

Compute the combined p-value.

Parameters:
  • data_pvals (list) – List of p-values.

  • data_params (list) – List of parameters.

  • type (str) – Type of combination method. Can be either “under” or “over”.

  • tau (float, optional) – Threshold value for p-values. Default is 0.2.

  • threshold_compute_explicite (int, optional) – Threshold for using explicit calculation. Default is 50000000.

  • moment (int, optional) – Moment for moment matching estimator. Default is 4.

  • jobs (int, optional) – Number of parallel jobs. Default is None.

  • verbose (bool, optional) – Whether to print verbose output. Default is False.

  • distribution (bool, optional) – Whether to compute the distribution of the combined p-value. Default is False.

  • all_value (bool, optional) – Whether to return all intermediate values. Default is False.

Returns:

The combined p-value.

Return type:

float

Notes

This function computes the combined p-value based on the given p-values and parameters. It uses either explicit calculation or moment matching estimator depending on the support size of the combined p-value. If the support size is below the threshold_compute_explicite, explicit calculation is used. Otherwise, moment matching estimator is used.
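The choice between the two methods can be pictured with the sketch below. How the library actually measures the support size is not documented here, so the product of the per-stratum hypergeometric support sizes is used as an assumed upper bound on the number of distinct outcomes. The helper names are hypothetical.

```python
from math import prod


def stratum_support_size(N, K, n):
    """Number of values a hypergeometric count with parameters [N, K, n] can take."""
    lo, hi = max(0, n + K - N), min(K, n)
    return hi - lo + 1


def choose_method(data_params, threshold_compute_explicite=50_000_000):
    """Pick a method from an assumed upper bound on the support size.

    Hypothetical helper: the real selection logic lives inside get_pval_comb.
    """
    size = prod(stratum_support_size(N, K, n) for N, K, n in data_params)
    method = "explicit" if size < threshold_compute_explicite else "moment_matching"
    return method, size


method, size = choose_method([[20, 5, 10], [25, 8, 15], [18, 12, 6]])
print(method, size)  # explicit 378
```

Under this assumption, three small strata give 6 * 9 * 7 = 378 joint outcomes, far below the default threshold, so the explicit calculation would be selected.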

gaste_test.combined_pval.moment_matching_estimator(params, type, list_pvals=None, comb_pvals=None, tau=None, moment=2, get_params=False, get_moment=False)[source]

Estimate the parameters of a gamma distribution using the method of moments.

Parameters:
  • params (list of tuples) – List of tuples containing the parameters of the hypergeometric distribution for each stratum. Each tuple should contain three values: (N, K, n), where N is the population size, K is the number of successes in the population, and n is the sample size.

  • type (str) –

    Type of estimation to perform. Can be either “under” or “over”.

    • “under”: Estimate the parameters for the lower tail of the distribution.

    • “over”: Estimate the parameters for the upper tail of the distribution.

  • list_pvals (array-like, optional) – Array-like object containing the p-values to combine for each stratum. Either list_pvals or comb_pvals must be provided, but not both.

  • comb_pvals (array-like or float, optional) – Float or array-like object containing the value of combination of p-values. Either list_pvals or comb_pvals must be provided, but not both.

  • tau (float, optional) – Threshold value for the p-values. Defaults to None. If not provided, a default value of 1 will be used.

  • moment (int, optional) – Moment to estimate. Can be 2, 3, or 4. Defaults to 2.

  • get_params (bool, optional) – Flag indicating whether to return the estimated parameters. Defaults to False.

  • get_moment (bool, optional) – Flag indicating whether to return the estimated moment. Defaults to False.

Returns:

  • tuple or float or array-like

  • Depending on the input parameters, the function returns

    • If get_params is True, returns a tuple containing the estimated parameters of the gamma distribution.

    • If get_moment is True, returns a tuple containing the estimated moment.

    • If comb_pvals is an array-like object, returns an array-like object of combined p-value.

    • Else, returns a float value of the combined p-value.

Raises:
  • ValueError

    • If list_pvals and comb_pvals are both None, or both provided, when get_params is False.

    • If moment is not 2, 3, or 4.

    • If tau is provided and is not between 0 (excluded) and 1.

  • Warning

    • If the moment of the distribution is negative, a lower moment will be used instead.

    • If all roots of alpha in MME are complex, a lower moment will be used instead.

    • If all real roots of alpha in MME are negative, a lower moment will be used instead.

    • If the smallest reachable p-values are below the float value closest to zero, a warning will be raised.

Notes

This function estimates the parameters of a gamma distribution using the method of moments. The method of moments matches the moments of the gamma distribution to the sample moments. The estimation can be performed for the lower tail or the upper tail of the distribution. The function supports estimation of moments up to the fourth moment. The estimated parameters can be obtained by setting get_params to True. The estimated moment can be obtained by setting get_moment to True.
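For the two-moment case, the classical method-of-moments fit of a gamma distribution can be sketched as follows. This is a minimal illustration, not the library's code: the helper name is hypothetical, and the input pair is the get_moment=True output from this function's Examples, read here as (mean, variance), which is an assumption.

```python
def gamma_from_mean_var(mean, var):
    """Two-moment gamma fit: shape = mean^2 / var, scale = var / mean.

    Hypothetical helper, not a gaste_test function.
    """
    if mean <= 0 or var <= 0:
        raise ValueError("mean and variance must be positive")
    shape = mean ** 2 / var
    scale = var / mean
    return shape, scale


# Values taken from the get_moment=True example, assumed to be (mean, variance).
shape, scale = gamma_from_mean_var(3.087381868306516, 5.737504083931502)
print(shape, scale)  # about 1.6613 and 1.8584
```

These two numbers agree with the first two entries returned by the get_params=True example; the meaning of the remaining two entries is not covered by this sketch. If this reading is right, the approximate p-value would be the survival function of the fitted gamma distribution at the observed statistic (for instance via scipy.stats.gamma.sf).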

Examples

>>> params = [(100, 50, 10), (200, 100, 20)]
>>> moment_matching_estimator(params, "under", list_pvals=[0.15892000985554622, 0.0485035183722576])
0.01974031791641234
>>> moment_matching_estimator(params, "over", comb_pvals=9.730946451863904)
0.01974031791641234
>>> moment_matching_estimator(params, "over", list_pvals=[0.15892000985554622, 0.0485035183722576], get_moment=True)
(3.087381868306516, 5.737504083931502)
>>> moment_matching_estimator(params, "under", list_pvals=[0.15892000985554622, 0.0485035183722576], get_params=True)
(1.6613368219541702, 1.8583720215596864, 1, 0)
>>> gaste_test.moment_matching_estimator(params, "under", comb_pvals=9.730946451863904, tau=0.2)
0.0018928012017953636
>>> gaste_test.moment_matching_estimator(params, "under", get_params=True, tau=0.2)
(0.9601452647906275, 1.9215151230936054, 0.2590451096515598, 0.45983274717174016)
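
The comb_pvals value used in the examples above appears to be Fisher's combination statistic, -2 times the sum of the log p-values. The sketch below reproduces it with the standard library. The helper name is hypothetical, and the truncation rule (only p-values at or below tau contribute) is an assumption about how tau acts, not something this page states.

```python
from math import log


def truncated_fisher_statistic(pvals, tau=None):
    """-2 * sum(log p) over the p-values with p <= tau (tau=None means tau=1).

    Hypothetical helper; the truncation rule is an assumption.
    """
    tau = 1.0 if tau is None else tau
    if not 0 < tau <= 1:
        raise ValueError("tau must be in (0, 1]")
    return -2.0 * sum(log(p) for p in pvals if p <= tau)


pvals = [0.15892000985554622, 0.0485035183722576]
stat = truncated_fisher_statistic(pvals)               # no truncation
stat_tau = truncated_fisher_statistic(pvals, tau=0.2)  # both p-values are <= 0.2
print(stat, stat_tau)  # both about 9.7309
```

Because both p-values are below 0.2, the statistic is the same with and without truncation, which is consistent with the examples passing the same comb_pvals value for tau=None and tau=0.2.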