The hypergeometric distribution gives the probability of obtaining a specific number of "successes" in a sample drawn without replacement from a finite population.
The exact formula for the hypergeometric probability distribution is:
$$f(x) = \frac{(k \text{ choose } x)((N-k) \text{ choose } (n-x))}{(N \text{ choose } n)}$$
More formally, using combination notation:
$$f(x) = \frac{\binom{k}{x} \binom{N-k}{n-x}}{\binom{N}{n}}$$
Where:
- $\binom{A}{B}$ (read as "A choose B") represents the number of ways to choose B items from a set of A items, calculated as $\frac{A!}{B!(A-B)!}$.
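As a minimal sketch of the "A choose B" definition above, the factorial formula can be written out directly and compared against Python's built-in `math.comb`. The helper name `choose` is an illustrative choice, not part of any library.

```python
from math import comb, factorial


def choose(a: int, b: int) -> int:
    """Number of ways to choose b items from a, using A! / (B! (A-B)!)."""
    # Choosing more items than exist (or a negative count) is impossible.
    if b < 0 or b > a:
        return 0
    return factorial(a) // (factorial(b) * factorial(a - b))


print(choose(5, 2))  # 10
print(comb(5, 2))    # 10, the standard-library equivalent
```

In practice `math.comb` (Python 3.8+) is the more convenient choice, since it avoids computing the full factorials.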
Understanding the Hypergeometric Distribution Formula
The hypergeometric distribution is crucial for scenarios where sampling is done without replacement, meaning that once an item is selected from the population, it is not returned. This impacts the probability of subsequent selections. The formula calculates the probability of drawing exactly x successes in a sample of size n, given a population of size N with k total successes.
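To illustrate how drawing without replacement changes the odds of later draws, the sketch below tracks the chance of a success on the first and second draw. The population values (10 items, 3 successes) and the helper name `success_chance` are assumptions chosen only for illustration.

```python
from fractions import Fraction


def success_chance(successes_left: int, items_left: int) -> Fraction:
    """Chance that the next draw is a success, given what remains."""
    return Fraction(successes_left, items_left)


N, k = 10, 3  # illustrative population size and number of successes

first = success_chance(k, N)                         # nothing removed yet
second_after_success = success_chance(k - 1, N - 1)  # one success removed
second_after_failure = success_chance(k, N - 1)      # one failure removed

print(first)                 # 3/10
print(second_after_success)  # 2/9
print(second_after_failure)  # 1/3
```

With replacement, all three values would stay at 3/10; the shift after each draw is what the hypergeometric formula accounts for.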
Key Components of the Formula
Each variable in the hypergeometric distribution formula represents a specific aspect of the population and the sample:
Variable | Description |
---|---|
N | The total size of the population being sampled |
n | The size of the sample being drawn |
k | The total number of "successes" in the population |
x | The number of "successes" found in the sample |
How the Formula Works
The formula breaks down into three main parts, reflecting the combinations of drawing successes and non-successes:
- Numerator (Part 1): $\binom{k}{x}$ represents the number of ways to choose x successes from the k available "successes" in the population.
- Numerator (Part 2): $\binom{N-k}{n-x}$ represents the number of ways to choose the remaining n-x items (which must be "failures") from the N-k items in the population that are not successes. This counts the ways to get the required number of "failures" in your sample.
- Denominator: $\binom{N}{n}$ represents the total number of distinct ways to choose n items from the entire population of N items, without regard to whether they are successes or failures.
By dividing the number of favorable outcomes (combinations of successes and failures) by the total possible outcomes, the formula yields the probability of obtaining exactly x successes in the sample.
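The three parts described above can be sketched as a short function. This is a minimal illustration, assuming the failures are drawn from the N-k non-successes in the population. The function name `hypergeom_pmf` and its argument order are hypothetical choices.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, k: int, n: int) -> float:
    """Probability of exactly x successes in a sample of n drawn without
    replacement from a population of N that contains k successes."""
    if not (0 <= k <= N and 0 <= n <= N):
        raise ValueError("require 0 <= k <= N and 0 <= n <= N")
    # Outside this range the outcome is impossible: x cannot exceed the
    # sample size or the number of successes, and the n - x failures
    # cannot exceed the N - k failures available.
    if x < max(0, n - (N - k)) or x > min(n, k):
        return 0.0
    ways_successes = comb(k, x)         # numerator, part 1
    ways_failures = comb(N - k, n - x)  # numerator, part 2
    ways_total = comb(N, n)             # denominator
    return ways_successes * ways_failures / ways_total


# The probabilities over all possible x should sum to 1.
total = sum(hypergeom_pmf(x, N=10, k=3, n=4) for x in range(0, 5))
print(round(total, 12))  # 1.0
```

Checking that the probabilities sum to 1 is a quick sanity test that the numerator and denominator are counting from the same population.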
Practical Application
Consider a practical scenario:
Imagine you have a bag with 10 marbles (N=10), 3 of which are red (k=3, the successes). You draw a sample of 4 marbles without putting any back (n=4) and want the probability that exactly 2 of them are red (x=2). The hypergeometric formula gives this probability directly from the composition of the bag and the sample size.
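Plugging the marble values into the formula, with N-k = 7 non-red marbles and n-x = 2 non-red draws, the calculation works out as follows:

$$f(2) = \frac{\binom{3}{2} \binom{7}{2}}{\binom{10}{4}} = \frac{3 \times 21}{210} = \frac{63}{210} = 0.3$$

So there is a 30% chance of drawing exactly 2 red marbles in a sample of 4.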