anderson_ksamp(samples, midrank=True)
The k-sample Anderson-Darling test is a modification of the one-sample Anderson-Darling test. It tests the null hypothesis that k samples are drawn from the same population, without having to specify the distribution function of that population. The critical values depend on the number of samples.
The reference for this test defines three versions of the k-sample Anderson-Darling test: one for continuous distributions and two for discrete distributions, in which ties between samples may occur. By default, this routine computes the version based on the midrank empirical distribution function, which is applicable to both continuous and discrete data. If midrank is set to False, the right-side empirical distribution function is used instead, giving a test for discrete data. According to that reference, the two discrete test statistics differ only slightly when the test that is not adjusted for ties encounters a few collisions caused by round-off errors.
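The effect of the midrank switch can be checked on data that contain ties. The following is a minimal sketch, not a definitive recipe: the seed, the sample sizes, and rounding to one decimal place are arbitrary choices made here to force ties, and the variable names are illustrative.

```python
import warnings

import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)  # arbitrary fixed seed for repeatability

# Rounding to one decimal place forces ties within and between the samples.
a = np.round(rng.normal(size=40), 1)
b = np.round(rng.normal(size=60), 1)

with warnings.catch_warnings():
    # The routine may warn when the p-value reaches its floor or cap.
    warnings.simplefilter("ignore")
    res_midrank = stats.anderson_ksamp([a, b], midrank=True)
    res_right = stats.anderson_ksamp([a, b], midrank=False)

print("midrank statistic:   ", res_midrank.statistic)
print("right-side statistic:", res_right.statistic)
```

With tied data like this, the two statistics may differ; how much depends on the number of ties.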
The critical values corresponding to the significance levels from 0.01 to 0.25 are taken from the reference for this test. The p-values are floored at 0.1% and capped at 25%. Since the range of critical values might be extended in future releases, it is recommended not to test p == 0.25, but rather p >= 0.25 (analogously for the lower bound).
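The recommendation about bounds can be sketched as follows. The constants UPPER_CAP and LOWER_FLOOR are names introduced here for illustration; their values are the current limits stated above and may change in future releases.

```python
import warnings

import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)  # arbitrary fixed seed
samples = [rng.normal(size=50), rng.normal(size=50)]

with warnings.catch_warnings():
    # The routine may warn when the p-value reaches its floor or cap.
    warnings.simplefilter("ignore")
    res = stats.anderson_ksamp(samples)

# Illustrative constants (not part of SciPy): the current cap and floor.
UPPER_CAP = 0.25
LOWER_FLOOR = 0.001

# Use inequalities rather than equality so the check keeps working
# if the tabulated range is extended later.
if res.pvalue >= UPPER_CAP:
    verdict = "p-value is at or above the cap"
elif res.pvalue <= LOWER_FLOOR:
    verdict = "p-value is at or below the floor"
else:
    verdict = "p-value lies inside the tabulated range"

print(verdict, res.pvalue)
```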
samples: Sequence of 1-D arrays, one array of data per sample.
midrank: Type of Anderson-Darling test to compute. The default (True) is the midrank test, applicable to continuous and discrete populations. If False, the right-side empirical distribution function is used.
Raises ValueError if fewer than 2 samples are provided, a sample is empty, or the samples contain no distinct observations.
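A caller that cannot guarantee valid input may want to catch this error. The sketch below covers only the too-few-samples case; safe_anderson_ksamp is a hypothetical helper written for this example, not part of SciPy.

```python
import numpy as np
from scipy import stats


def safe_anderson_ksamp(samples):
    """Hypothetical wrapper: return the test result, or None on invalid input."""
    try:
        return stats.anderson_ksamp(samples)
    except ValueError as exc:
        print("invalid input:", exc)
        return None


# Only one sample is supplied, so the test cannot be computed.
only_one = safe_anderson_ksamp([np.array([1.0, 2.0, 3.0])])
print(only_one)
```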
An object containing attributes:
statistic
critical_values
pvalue
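One way to use these attributes together is to compare the statistic with a single critical value. This is a sketch under a stated assumption: that critical_values is ordered from the 25% significance level downwards, so that index 2 corresponds to 5%. Verify the ordering against the installed SciPy version before relying on it.

```python
import warnings

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)  # arbitrary fixed seed
# Two samples whose means differ by two standard deviations.
samples = [rng.normal(size=50), rng.normal(loc=2.0, size=50)]

with warnings.catch_warnings():
    # The routine may warn when the p-value reaches its floor or cap.
    warnings.simplefilter("ignore")
    res = stats.anderson_ksamp(samples)

# Assumption: index 2 holds the critical value for the 5% level.
crit_5pct = res.critical_values[2]
reject_at_5pct = res.statistic > crit_5pct

print("statistic:", res.statistic)
print("5% critical value (assumed index 2):", crit_5pct)
print("reject the null hypothesis at 5%:", reject_at_5pct)
```

A statistic above the critical value argues against the hypothesis that the samples share one population at that significance level.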
The Anderson-Darling test for k-samples.
anderson
ks_2samp
import numpy as np
from scipy import stats

rng = np.random.default_rng()

# Two samples from normal distributions with different means.
res = stats.anderson_ksamp([rng.normal(size=50),
                            rng.normal(loc=0.5, size=30)])
print(res.statistic, res.pvalue)
print(res.critical_values)

# Three samples drawn from the same normal distribution.
res = stats.anderson_ksamp([rng.normal(size=50),
                            rng.normal(size=30),
                            rng.normal(size=20)])
print(res.statistic, res.pvalue)
print(res.critical_values)