Consider a box containing :math:`M` balls: :math:`n` red and :math:`M-n` blue. We randomly sample balls from the box, one at a time and without replacement, until we have picked :math:`r` blue balls. `nhypergeom` is the distribution of the number of red balls :math:`k` we have picked.
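The sampling process can be simulated directly as a sanity check. The sketch below relies on one observation: drawing every ball without replacement is the same as taking a random permutation of the box, so the number of red balls is the position of the :math:`r`-th blue ball minus the :math:`r-1` blue balls before it. The helper `red_before_rth_blue`, the seed, and the trial count are illustrative assumptions, not part of SciPy, and the helper assumes :math:`1 \le r \le M-n`. The parameters :math:`M, n, r = 20, 7, 12` are borrowed from the plotting example.

```python
import numpy as np
from scipy.stats import nhypergeom


def red_before_rth_blue(M, n, r, rng):
    """Hypothetical helper: draw balls without replacement until r blue
    balls have been picked and return the number of red balls drawn.

    Assumes 1 <= r <= M - n.
    """
    # 1 marks a red ball, 0 a blue ball. A random permutation of the box
    # is equivalent to drawing every ball one at a time without replacement.
    box = np.array([1] * n + [0] * (M - n))
    order = rng.permutation(box)
    # 0-based position of the r-th blue ball in the drawing order.
    stop = np.flatnonzero(order == 0)[r - 1]
    # The `stop` balls drawn before it are the first r - 1 blue balls
    # plus all the red balls drawn so far.
    return stop - (r - 1)


M, n, r = 20, 7, 12
rng = np.random.default_rng(seed=0)
trials = 20_000
samples = np.array([red_before_rth_blue(M, n, r, rng) for _ in range(trials)])

# Empirical frequencies of k = 0..n next to the nhypergeom PMF.
empirical = np.bincount(samples, minlength=n + 1) / trials
expected = nhypergeom.pmf(np.arange(n + 1), M, n, r)
print(np.round(empirical, 3))
print(np.round(expected, 3))
```

With this many trials, the two printed rows should agree to roughly two decimal places.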
%(before_notes)s
The symbols used to denote the shape parameters (M, n, and r) are not universally accepted. See the Examples for a clarification of the definitions used here.
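The probability mass function is defined as

.. math:: f(k; M, n, r) = \frac{\binom{k+r-1}{k} \binom{M-r-k}{n-k}}{\binom{M}{n}}

for :math:`k \in [0, n]`, :math:`n \in [0, M]`, :math:`r \in [0, M-n]`, and the binomial coefficient is

.. math:: \binom{n}{k} \equiv \frac{n!}{k! (n-k)!}.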
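The PMF can be evaluated straight from binomial coefficients as :math:`\binom{k+r-1}{k}\binom{M-r-k}{n-k} / \binom{M}{n}` and compared with SciPy. The sketch below does this. The helper name `nhypergeom_pmf` is hypothetical. It uses Python's `math.comb` and assumes integer arguments with :math:`r \ge 1`. Values of :math:`k` outside :math:`[0, n]` get probability zero.

```python
from math import comb

import numpy as np
from scipy.stats import nhypergeom


def nhypergeom_pmf(k, M, n, r):
    """Hypothetical helper: negative hypergeometric PMF evaluated directly
    from binomial coefficients. Assumes integer arguments with r >= 1."""
    if not 0 <= k <= n:
        return 0.0
    # C(k+r-1, k) orderings of the k red balls among the first k+r-1 draws,
    # times C(M-r-k, n-k) placements of the remaining red balls after the
    # r-th blue ball, over C(M, n) placements of all red balls.
    return comb(k + r - 1, k) * comb(M - r - k, n - k) / comb(M, n)


M, n, r = 20, 7, 12
direct = np.array([nhypergeom_pmf(k, M, n, r) for k in range(n + 1)])
reference = nhypergeom.pmf(np.arange(n + 1), M, n, r)

print(np.allclose(direct, reference))  # agreement with SciPy
print(direct.sum())                    # total probability over k = 0..n
```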
The event of picking :math:`k` red balls is equivalent to observing :math:`k` successes in :math:`k+r-1` samples, with the :math:`(k+r)`-th sample being a failure. The former can be modelled as a hypergeometric distribution. The probability of the latter is simply the number of failures remaining, :math:`M-n-(r-1)`, divided by the size of the remaining population, :math:`M-(k+r-1)`. This relationship can be shown as:

.. math:: NHG(k; M, n, r) = HG(k; M, n, k+r-1) \frac{M-n-(r-1)}{M-(k+r-1)}

where :math:`NHG` is the probability mass function (PMF) of the negative hypergeometric distribution and :math:`HG` is the PMF of the hypergeometric distribution.
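The identity :math:`NHG(k; M, n, r) = HG(k; M, n, k+r-1) \cdot (M-n-(r-1)) / (M-(k+r-1))` can be checked over the whole support at once rather than for a single :math:`k`. The sketch below relies on `scipy.stats.hypergeom.pmf(k, M, n, N)` broadcasting over an array of draw counts :math:`N = k+r-1`. The parameter values are an arbitrary choice.

```python
import numpy as np
from scipy.stats import hypergeom, nhypergeom

M, n, r = 20, 7, 12
k = np.arange(n + 1)

# Left-hand side: negative hypergeometric PMF.
nhg = nhypergeom.pmf(k, M, n, r)

# Right-hand side: k successes among the first k + r - 1 draws
# (hypergeometric), times the probability that the next draw is one of
# the M - n - (r - 1) failures left among the M - (k + r - 1) remaining balls.
hg = hypergeom.pmf(k, M, n, k + r - 1)
next_is_failure = (M - n - (r - 1)) / (M - (k + r - 1))
rhs = hg * next_is_failure

print(np.allclose(nhg, rhs))
```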
%(after_notes)s
A negative hypergeometric discrete random variable.
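```python
import numpy as np
from scipy.stats import nhypergeom
import matplotlib.pyplot as plt

# 20 animals in total, 7 of them dogs (successes); sampling continues
# until exactly 12 animals that are not dogs (failures) have been drawn.
M, n, r = [20, 7, 12]
rv = nhypergeom(M, n, r)  # frozen distribution
x = np.arange(0, n+2)
pmf_dogs = rv.pmf(x)

# Plot the probability mass function.
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, pmf_dogs, 'bo')
ax.vlines(x, 0, pmf_dogs, lw=2)
ax.set_xlabel('# of dogs in our group with given 12 failures')
ax.set_ylabel('nhypergeom PMF')
plt.show()

# The same PMF, calling nhypergeom directly instead of freezing it.
prb = nhypergeom.pmf(x, M, n, r)

# Ten random variates.
R = nhypergeom.rvs(M, n, r, size=10)
```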
```python
from scipy.stats import hypergeom, nhypergeom

# Verify the relationship between hypergeom and nhypergeom for one k.
M, n, r = 45, 13, 8
k = 6
print(nhypergeom.pmf(k, M, n, r))
print(hypergeom.pmf(k, M, n, k+r-1) * (M - n - (r-1)) / (M - (k+r-1)))
```