χ² ≈ 18.18, p < 0.001 — variables are not independent
With cells (40, 60, 70, 30), the chi-squared statistic is approximately 18.18 with df = 1, yielding p < 0.001. The variables show a significant association with Cramér's V ≈ 0.30.
Testing Whether Two Things Are Related
The chi-squared test of independence, introduced by Karl Pearson in 1900, is one of the oldest and most widely used statistical tests. It answers a simple but powerful question: are two categorical variables independent, or is there an association between them? Doctors use it to test whether a treatment is associated with recovery. Marketers use it to test whether demographics predict purchase behavior. Social scientists use it everywhere.
The Contingency Table
The test begins with a contingency table — a grid showing observed counts for each combination of categories. For a 2×2 table, you have four cells. The test compares these observed counts to the expected counts under the assumption of independence. Expected counts are calculated from the row and column totals: if variables are truly independent, the proportion in each cell should be the product of its row and column proportions.
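The expected-count rule above can be written directly: the expected count for cell (i, j) is (row i total × column j total) / grand total. A minimal Python sketch, using illustrative counts:

```python
# Expected counts under independence for a 2x2 contingency table.
# Cells are laid out as [[a, b], [c, d]]; the values are illustrative.
observed = [[40, 60], [70, 30]]

row_totals = [sum(row) for row in observed]          # [100, 100]
col_totals = [sum(col) for col in zip(*observed)]    # [110, 90]
n = sum(row_totals)                                  # 200

# E[i][j] = (row i total) * (column j total) / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]
print(expected)  # [[55.0, 45.0], [55.0, 45.0]]
```

Note that every expected row is proportional to the column totals; that is exactly what independence means.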
Computing the Statistic
The chi-squared statistic sums the squared differences between observed and expected counts, each divided by the expected count. This normalization ensures that a deviation of 10 in a cell expecting 100 counts for less than a deviation of 10 in a cell expecting 20. Under independence, the statistic approximately follows a chi-squared distribution with (rows − 1) × (columns − 1) degrees of freedom, from which we obtain the p-value.
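Putting this together for the example table (40, 60, 70, 30), a pure-Python sketch; it uses the fact that for df = 1 the chi-squared tail probability reduces to erfc(√(x/2)), so no stats library is needed:

```python
import math

# Pearson chi-squared statistic for a 2x2 table (illustrative counts).
observed = [[40, 60], [70, 30]]
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Sum of (O - E)^2 / E over all cells
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

# For df = 1, P(X > x) = erfc(sqrt(x / 2)); larger df needs the
# regularized incomplete gamma function instead.
p_value = math.erfc(math.sqrt(chi2 / 2))
print(round(chi2, 2), p_value)  # chi-squared ≈ 18.18, p well below 0.001
```

With df = 1 and χ² ≈ 18.18, the p-value is on the order of 10⁻⁵, comfortably below 0.001.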
Beyond Significance: Effect Size
A significant chi-squared test tells you the variables are not independent, but it doesn't tell you how strongly they're associated. Cramér's V fills this gap — it normalizes the chi-squared statistic to a 0-to-1 scale independent of sample size. This simulator computes both, because in practice, a tiny association in a huge dataset can be statistically significant yet practically meaningless.
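The normalization is simple: V = √(χ² / (n · min(r − 1, c − 1))), where r and c are the table dimensions. A small sketch, plugging in the example's χ² and sample size:

```python
import math

# Cramér's V: chi-squared rescaled to a 0-to-1 effect size.
def cramers_v(chi2: float, n: int, n_rows: int, n_cols: int) -> float:
    # Dividing by n * min(r-1, c-1) removes the sample-size dependence
    # and caps the result at 1 for a perfect association.
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))

v = cramers_v(18.18, n=200, n_rows=2, n_cols=2)
print(round(v, 2))  # 0.3
```

Doubling every cell count doubles χ² but leaves V unchanged, which is why V is the better number for comparing studies.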
FAQ
When should I use a chi-squared test?
Use the chi-squared test of independence when you have two categorical variables and want to determine whether they are associated. Common examples: testing whether gender is associated with voting preference, or whether treatment group is associated with recovery outcome. Both variables must be categorical, and expected cell counts should generally be at least 5.
What does the chi-squared statistic measure?
The chi-squared statistic measures the total discrepancy between observed and expected cell counts in a contingency table. Expected counts are calculated assuming the variables are independent. A larger χ² indicates a greater departure from independence. The statistic is always non-negative and is zero only when observed counts exactly match expected counts.
What is Cramér's V and how do I interpret it?
Cramér's V is an effect size measure that ranges from 0 (no association) to 1 (perfect association). Unlike χ², it doesn't depend on sample size, making it useful for comparing association strength across studies. Conventional benchmarks: V < 0.1 is negligible, 0.1-0.3 is small-to-medium, > 0.3 is medium-to-large.
What are the assumptions of the chi-squared test?
The main assumptions are: observations are independent, data are counts (not proportions), and expected cell counts are sufficiently large (typically ≥ 5). When expected counts are too small, use Fisher's exact test instead. The test is also sensitive to sample size — very large samples may yield significant results for trivially small associations.
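For small tables, Fisher's exact test avoids the large-count approximation by enumerating 2×2 tables directly. A pure-Python sketch of the two-sided version, which sums the hypergeometric probabilities of every table no more probable than the observed one (a common convention, matching scipy's default):

```python
from math import comb

# Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].
# Intended for small counts, where chi-squared's approximation breaks down.
def fisher_exact(a: int, b: int, c: int, d: int) -> float:
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def p_table(x: int) -> float:
        # P(top-left cell = x) with all margins fixed (hypergeometric)
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = p_table(a)
    # Sum over all feasible tables at least as extreme as the observed one;
    # the small tolerance guards against float round-off in the comparison.
    return sum(p_table(x)
               for x in range(max(0, c1 - r2), min(r1, c1) + 1)
               if p_table(x) <= p_obs + 1e-12)

print(fisher_exact(3, 1, 1, 3))  # ≈ 0.486
```

With counts this small the chi-squared approximation would be unreliable, but the exact test remains valid at any sample size.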