monte_carlo_test(sample, rvs, statistic, *, vectorized=None, n_resamples=9999, batch=None, alternative='two-sided', axis=0)
The null hypothesis is that the provided sample was drawn at random from the distribution for which rvs generates random variates. The value of the statistic for the given sample is compared against a Monte Carlo null distribution: the value of the statistic for each of n_resamples samples generated by rvs. This gives the p-value, the probability of observing such an extreme value of the test statistic under the null hypothesis.
An array of observations.
Statistic for which the p-value of the hypothesis test is to be calculated. statistic must be a callable that accepts a sample (e.g. statistic(sample)
) and returns the resulting statistic. If vectorized is set True
, statistic must also accept a keyword argument axis and be vectorized to compute the statistic along the provided axis of the sample array.
If vectorized is set False
, statistic will not be passed keyword argument axis and is expected to calculate the statistic only for 1D samples. If True
, statistic will be passed keyword argument axis and is expected to calculate the statistic along axis when passed an ND sample array. If None
(default), vectorized will be set True
if axis
is a parameter of statistic. Use of a vectorized statistic typically reduces computation time.
Number of random permutations used to approximate the Monte Carlo null distribution.
The number of permutations to process in each call to statistic. Memory usage is O(batch'*''sample.size[axis]'). Default is None
, in which case batch equals n_resamples.
The alternative hypothesis for which the p-value is calculated. For each alternative, the p-value is defined as follows.
The axis of sample over which to calculate the statistic.
The observed test statistic of the sample.
The p-value for the given alternative.
The values of the test statistic generated under the null hypothesis.
Monte Carlo test that a sample is drawn from a given distribution.
import numpy as np
from scipy import stats
def statistic(x, axis):
return stats.skew(x, axis)
rng = np.random.default_rng()
x = stats.skewnorm.rvs(a=1, size=50, random_state=rng)
statistic(x, axis=0)
from scipy.stats import monte_carlo_test
# because our statistic is vectorized, we pass `vectorized=True`
rvs = lambda size: stats.norm.rvs(size=size, random_state=rng)
res = monte_carlo_test(x, rvs, statistic, vectorized=True)
print(res.statistic)
print(res.pvalue)
stats.skewtest(x).pvalue
x = stats.skewnorm.rvs(a=1, size=7, random_state=rng)
# stats.skewtest(x) would produce an error due to small sample
res = monte_carlo_test(x, rvs, statistic, vectorized=True)
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.hist(res.null_distribution, bins=50)
ax.set_title("Monte Carlo distribution of test statistic")
ax.set_xlabel("Value of Statistic")
ax.set_ylabel("Frequency")
plt.show()
Hover to see nodes names; edges to Self not shown, Caped at 50 nodes.
Using a canvas is more power efficient and can get hundred of nodes ; but does not allow hyperlinks; , arrows or text (beyond on hover)
SVG is more flexible but power hungry; and does not scale well to 50 + nodes.
All aboves nodes referred to, (or are referred from) current nodes; Edges from Self to other have been omitted (or all nodes would be connected to the central node "self" which is not useful). Nodes are colored by the library they belong to, and scaled with the number of references pointing them