Chi-Square Test of Independence in R: chisq.test() Guide
A chi-square test of independence in R tests whether two categorical variables are related. Pass a contingency table to chisq.test() and read the p-value to decide whether the observed pattern of counts is consistent with independence.
```r
chisq.test(table(df$x, df$y))                    # from a data frame
chisq.test(matrix(c(30, 10, 20, 40), nrow = 2))  # from a matrix
chisq.test(tbl)$p.value                          # extract p-value
chisq.test(tbl)$expected                         # expected counts under H0
chisq.test(tbl, simulate.p.value = TRUE)         # Monte Carlo p-value (small N)
fisher.test(tbl)                                 # alternative for small N
chisq.test(tbl)$residuals                        # Pearson residuals per cell
```
Need explanation? Read on for examples and pitfalls.
What a chi-square test of independence does in one sentence
It compares observed cell counts in a contingency table to the counts EXPECTED if the two variables were independent. The test statistic measures how far observed counts deviate from expected; the p-value tells you the probability of seeing that much deviation by chance under independence.
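The comparison can be reproduced by hand and checked against chisq.test(); a minimal sketch with a made-up 2x2 table of counts:

```r
# made-up 2x2 table (rows: group A/B, columns: outcome yes/no)
obs <- matrix(c(30, 10, 20, 40), nrow = 2)

# expected count per cell under independence: row total * column total / n
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# chi-square statistic: total squared deviation, scaled by expected counts
stat <- sum((obs - expected)^2 / expected)

# correct = FALSE disables the Yates continuity correction R applies to 2x2
res <- chisq.test(obs, correct = FALSE)
all.equal(stat, unname(res$statistic))  # TRUE
```

For 2x2 tables chisq.test() applies a continuity correction by default, which is why correct = FALSE is needed for the hand computation to match exactly.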
The null hypothesis: the two variables are independent (no association). The alternative: they are dependent (some association exists). Used heavily in survey analysis, A/B testing categorical outcomes, and any case where you have a 2-way table of counts.
Syntax
chisq.test() accepts a 2-way contingency table as a matrix or a table() object.
The result is an htest object containing the chi-square statistic ($statistic), degrees of freedom ($parameter), p-value ($p.value), expected counts ($expected), and Pearson residuals ($residuals).
If any expected count is below 5, the chi-square approximation becomes unreliable; switch to fisher.test() (Fisher's exact test) or chisq.test(tbl, simulate.p.value = TRUE) for a Monte Carlo p-value. R warns you ("Chi-squared approximation may be incorrect") when the assumption is violated.
Five common patterns
1. Basic test from a contingency table
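A minimal sketch, assuming the built-in mtcars data (cylinder count vs transmission type am):

```r
tbl <- table(mtcars$cyl, mtcars$am)  # 3x2 table: cylinders vs transmission
res <- chisq.test(tbl)               # warns: two expected counts are below 5
res$p.value                          # ~0.013
```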
A small p-value (here 0.013) suggests cyl and transmission type are NOT independent.
2. Inspect expected counts
Under independence, the expected count in each cell is (row_total * col_total) / grand_total. Comparing to the observed table tells you WHICH cells deviate most from independence.
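Continuing the mtcars example (cyl vs am), $expected returns this table, and the same numbers fall out of the row and column totals:

```r
tbl <- table(mtcars$cyl, mtcars$am)
res <- suppressWarnings(chisq.test(tbl))

res$expected
#          0       1
#   4 6.53125 4.46875
#   6 4.15625 2.84375
#   8 8.31250 5.68750

outer(rowSums(tbl), colSums(tbl)) / sum(tbl)  # identical, computed by hand
```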
3. Standardized residuals
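Assuming the same mtcars table, both residual flavors are available on the result:

```r
tbl <- table(mtcars$cyl, mtcars$am)
res <- suppressWarnings(chisq.test(tbl))
res$residuals  # Pearson residuals: (observed - expected) / sqrt(expected)
res$stdres     # standardized residuals, adjusted to unit variance
```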
$residuals holds Pearson residuals, (observed - expected) / sqrt(expected); $stdres holds standardized residuals adjusted to unit variance. Values beyond +/-2 mark cells contributing strongly to the test statistic. Here, 4-cyl manual cars (Pearson residual about 1.67) occur notably more often than expected.
4. Goodness-of-fit (one variable vs proportions)
When a vector and p are given, chisq.test() performs a goodness-of-fit test instead of independence.
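A sketch with made-up die-roll counts, testing the fair-die hypothesis (equal probabilities for all six faces):

```r
rolls <- c(18, 21, 24, 17, 26, 14)         # made-up counts for faces 1 to 6
gof <- chisq.test(rolls, p = rep(1/6, 6))  # H0: each face has probability 1/6
gof
```

Omitting p tests against equal probabilities by default, so chisq.test(rolls) is equivalent here.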
5. Fisher's exact for small samples
Fisher's exact computes the EXACT p-value rather than relying on the chi-square approximation. Use it when any expected count < 5 in a 2x2 table.
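A sketch with a made-up 2x2 table whose counts are too small for the chi-square approximation:

```r
small <- matrix(c(3, 1, 1, 7), nrow = 2)  # made-up counts, all cells small
fisher.test(small)$p.value                # exact p-value, no approximation
```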
Chi-square test assumptions
Chi-square works under specific conditions; violating them invalidates the test. Below are the standard assumptions, how to verify them, and what to do when they fail.
| Assumption | How to check | Fix if violated |
|---|---|---|
| Independent observations | Study design | Use mcnemar.test() for paired data |
| Random sampling | Study design | Acknowledge the limitation in results |
| Expected counts >= 5 in all cells | Check result$expected | fisher.test() or simulate.p.value = TRUE |
| Mutually exclusive categories | Confirm in the data | Recode to ensure exclusivity |
For larger tables (3x4 or more), the rule is "at most 20% of cells with expected count < 5, and no cell with expected count < 1".
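The rule is easy to check mechanically from $expected; a sketch using mtcars (cyl vs gear, a 3x3 table):

```r
res <- suppressWarnings(chisq.test(table(mtcars$cyl, mtcars$gear)))
mean(res$expected < 5)  # fraction of cells below 5 (here 6/9: rule violated)
any(res$expected < 1)   # FALSE: no cell falls below 1
```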
A practical workflow for chi-square in real data
Real-world chi-square use is more than just running the test. The full workflow includes data prep, assumption checking, the test itself, and post-hoc interpretation.
When applying chi-square in practice, follow this sequence. First, decide whether your data are paired (same units measured twice) or independent (two distinct samples). Paired data needs mcnemar.test(), not chi-square. Second, build the contingency table with table() and inspect it visually: do any cells look very small? Third, run chisq.test() and read the warning carefully if it appears. Fourth, examine $expected to verify the assumption of expected counts at least 5. If violated, switch to fisher.test() or use simulate.p.value = TRUE.
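The steps above, as a compact sketch (assuming unpaired observations, using mtcars):

```r
tbl <- table(mtcars$cyl, mtcars$gear)     # step 2: build the table
tbl                                       # ... and eyeball the small cells

res <- suppressWarnings(chisq.test(tbl))  # step 3: run the test
res$expected                              # step 4: verify expected counts

if (any(res$expected < 5)) {
  fisher.test(tbl)$p.value                # fallback: exact test
}
```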
After the test, do not stop at the p-value. Look at $residuals to see WHICH cells deviate. Compute Cramer's V (effect size) via vcd::assocstats() to gauge association strength. Plot the table as a mosaic plot for visual interpretation. A statistically significant test that nobody can interpret is rarely useful; the residuals and effect size make the result actionable.
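vcd::assocstats() is one option; if you would rather avoid the dependency, Cramer's V follows directly from the statistic as V = sqrt(X2 / (n * (min(rows, cols) - 1))). A sketch:

```r
tbl <- table(mtcars$cyl, mtcars$am)
res <- suppressWarnings(chisq.test(tbl))

n <- sum(tbl)
k <- min(dim(tbl)) - 1                         # min(rows, cols) - 1
cramers_v <- sqrt(unname(res$statistic) / (n * k))
cramers_v                                      # ~0.52: a moderately strong association
```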
For reporting, always include: the chi-square statistic, degrees of freedom, p-value, effect size (Cramer's V), and the contingency table itself. Readers need all of these to evaluate your claim.
Common pitfalls
Pitfall 1: passing a 1-D vector when independence test is intended. chisq.test(c(10, 20, 30)) runs a goodness-of-fit test against equal probabilities, NOT an independence test. For independence, pass a 2-way table.
Pitfall 2: ignoring the "approximation may be incorrect" warning. It means you have small expected counts. Switch to Fisher's exact or use Monte Carlo p-values: chisq.test(tbl, simulate.p.value = TRUE, B = 10000).
Pitfall 3: using chi-square for paired categorical data. If subjects are measured twice (pre/post categorical), use mcnemar.test(), not chi-square. Chi-square assumes independent observations.
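A sketch with made-up paired responses (pre and post are hypothetical vectors for the same 10 subjects):

```r
# hypothetical: the same 10 subjects answer yes/no before and after
pre  <- c("yes","yes","no","no","yes","no","yes","no","no","yes")
post <- c("yes","no","no","yes","yes","no","yes","yes","no","no")

mcnemar.test(table(pre, post))  # tests whether yes/no shifts between measurements
```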
Try it yourself
Try it: Test whether cyl and gear are independent in mtcars. Extract the p-value and check expected counts. Save to ex_test.
Solution
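One solution, consistent with the explanation that follows (ex_test is the name the exercise asks for):

```r
ex_test <- chisq.test(table(mtcars$cyl, mtcars$gear))
# Warning: Chi-squared approximation may be incorrect

ex_test$p.value   # ~0.0012
ex_test$expected  # several cells fall below 5
```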
Explanation: Some expected counts are below 5 (e.g., 1.094, 1.719), so R warns. The p-value of 0.0012 suggests dependence between cyl and gear, but the warning means the chi-square approximation may be inaccurate. Switch to fisher.test() for an exact answer, or rerun with simulate.p.value = TRUE.
Related tests
After mastering chi-square independence, look at:
- fisher.test(): exact test for small expected counts (especially 2x2 tables)
- mcnemar.test(): paired-categorical alternative
- prop.test(): compare two or more proportions
- binom.test(): exact test for a single proportion
- kruskal.test(): when one variable is ordinal
- cor.test(method = "spearman"): when both variables are ordinal
For effect size: vcd::assocstats(tbl) returns Cramer's V and phi alongside the chi-square test.
FAQ
How do I do a chi-square test of independence in R?
Build a contingency table with table(df$x, df$y), then pass to chisq.test(). The result includes the chi-square statistic, degrees of freedom, and p-value. A small p-value indicates the variables are not independent.
What is the difference between chi-square test and Fisher's exact test?
Chi-square uses an approximation that requires expected counts of at least 5 in each cell. Fisher's exact computes the precise probability and works with any cell counts. Use chi-square for large counts; Fisher's exact for small counts (especially 2x2 tables).
How do I check chi-square test assumptions in R?
Run chisq.test(tbl)$expected to see expected counts. If any are below 5, the chi-square approximation may be inaccurate. R prints a warning ("Chi-squared approximation may be incorrect") in that case.
How do I extract the p-value from chi-square in R?
Save the result and use $p.value: result <- chisq.test(tbl); result$p.value. Other fields: $statistic (X-squared), $parameter (df), $expected, $residuals.
What does a significant chi-square test mean?
It means the data are inconsistent with the null hypothesis of independence. Some association exists. The test does NOT tell you the strength or direction of the association; check $residuals to identify which cells deviate most, and compute Cramer's V for an effect-size measure.