
### 12.10 — ANOVA

Phase: Statistics
Prerequisites: 12-08-common-tests, 12-02-sampling-sampling-distributions, 08-08-eigenvalues-eigenvectors

Learning Objectives

By the end of this subject, you will be able to:

State the null and alternative hypotheses for one-way ANOVA
Decompose total variation into between-group and within-group components
Compute the F-statistic and conduct an ANOVA test
Understand when and why post-hoc tests are needed
Describe the conceptual framework of two-way ANOVA

Core Content

⚠️ CRITICAL: Why ANOVA Instead of Multiple t-tests?

If you have 3 groups and run all 3 pairwise t-tests at $\alpha = 0.05$ each, then, treating the tests as independent, the probability of at least one Type I error is $1 - (0.95)^3 \approx 0.143$, nearly triple the nominal rate. With 5 groups (10 tests), it is about $1 - (0.95)^{10} \approx 0.40$.
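The inflation can be tabulated for any number of groups. This is a minimal sketch that assumes the $m = \binom{k}{2}$ pairwise tests are independent; they are not exactly independent (pairs share groups), so treat the output as an approximation. The function name `familywise_error` is ours, chosen for illustration.

```python
from math import comb


def familywise_error(k: int, alpha: float = 0.05) -> tuple[int, float]:
    """Return (number of pairwise tests, P(at least one Type I error)).

    Assumes the m = C(k, 2) tests are independent, which is only an
    approximation because pairwise comparisons share groups.
    """
    m = comb(k, 2)                      # number of pairwise t-tests
    return m, 1 - (1 - alpha) ** m      # complement of "no false positives at all"


for k in (3, 5, 10):
    m, fwer = familywise_error(k)
    print(f"k={k:2d}  tests={m:2d}  P(at least one false positive) = {fwer:.3f}")
```

With 10 groups there are 45 tests and, under the same independence approximation, the chance of at least one false positive is about 0.90.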

ANOVA provides a single omnibus test: "Do ANY of the group means differ?" — controlling the family-wise error rate.

One-Way ANOVA: The Model

$Y_{ij} = \mu + \alpha_j + \epsilon_{ij}$, where $\epsilon_{ij} \sim N(0, \sigma^2)$ independently.

$\mu$: grand mean (overall average)
$\alpha_j$: effect of group $j$ ($\sum \alpha_j = 0$)
$Y_{ij}$: observation $i$ in group $j$

Hypotheses: $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$ (all group means equal) vs $H_a$: at least one $\alpha_j \neq 0$.

The F-Statistic

ANOVA partitions total variation into between-group and within-group:

$$\text{SST} = \text{SSB} + \text{SSW}$$

SST: Total sum of squares = $\sum\sum(Y_{ij} - \bar{Y}_{\cdot\cdot})^2$ (df = $N-1$)
SSB: Between-group sum of squares = $\sum n_j(\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2$ (df = $k-1$)
SSW: Within-group sum of squares = $\sum\sum(Y_{ij} - \bar{Y}_{\cdot j})^2$ (df = $N-k$)

Mean squares:
$\text{MSB} = \text{SSB} / (k-1)$
$\text{MSW} = \text{SSW} / (N-k)$

$$F = \frac{\text{MSB}}{\text{MSW}} \sim F_{k-1, N-k} \quad \text{under } H_0$$

Intuition: If $H_0$ is true, MSB and MSW both estimate the same $\sigma^2$, so $F \approx 1$. If $H_0$ is false, MSB also reflects the spread of the group means, so its expected value exceeds $\sigma^2$ and $F$ tends to be larger than 1.

Decision: Reject $H_0$ if $F > F_{\alpha, k-1, N-k}$ or if p-value $< \alpha$.
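The intuition can be checked by simulation. This sketch is an illustration only, not part of the formal test: it draws normal data ($\sigma = 1$) for $k = 3$ groups of $n = 20$, computes $F$ many times, and compares the average $F$ when all means are equal with the average when one group's mean is shifted by one standard deviation. Under $H_0$ the average should sit near 1 (the exact mean of an $F_{\nu_1, \nu_2}$ variable is $\nu_2/(\nu_2 - 2)$, here $57/55 \approx 1.04$). The helper names `f_statistic` and `mean_f`, the group size, and the seed are our choices.

```python
import random
from statistics import fmean


def f_statistic(groups: list[list[float]]) -> float:
    """One-way ANOVA F = MSB / MSW for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = fmean([y for g in groups for y in g])           # grand mean
    means = [fmean(g) for g in groups]                      # group means
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ssb / (k - 1)) / (ssw / (n_total - k))


def mean_f(shifts: list[float], n: int = 20, reps: int = 3000, seed: int = 0) -> float:
    """Average F over `reps` simulated datasets; group j ~ N(shifts[j], 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        groups = [[rng.gauss(mu, 1.0) for _ in range(n)] for mu in shifts]
        total += f_statistic(groups)
    return total / reps


print("H0 true  (means 0, 0, 0):", round(mean_f([0.0, 0.0, 0.0]), 2))
print("H0 false (means 0, 0, 1):", round(mean_f([0.0, 0.0, 1.0]), 2))
```

The second average is far above 1 because, with group effects $\alpha_j$ measured from the grand mean, $E[\text{MSB}] = \sigma^2 + \sum n_j \alpha_j^2/(k-1)$, while MSW still estimates $\sigma^2$.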

ANOVA Table

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Between | $k-1$ | SSB | MSB = SSB/($k-1$) | MSB/MSW |
| Within | $N-k$ | SSW | MSW = SSW/($N-k$) | |
| Total | $N-1$ | SST | | |
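The table can be filled in mechanically from raw data. The sketch below is a plain-Python implementation of the formulas above (the name `anova_table` is ours, not a library API), and it checks that SST = SSB + SSW up to rounding. If SciPy is available, `scipy.stats.f_oneway` returns the same F-statistic together with a p-value.

```python
from statistics import fmean


def anova_table(groups: list[list[float]]) -> dict[str, dict[str, float]]:
    """Compute the one-way ANOVA table (df, SS, MS, F) for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = fmean([y for g in groups for y in g])          # grand mean over all N
    means = [fmean(g) for g in groups]                     # group means

    sst = sum((y - grand) ** 2 for g in groups for y in g)
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    assert abs(sst - (ssb + ssw)) < 1e-9 * max(1.0, sst)  # SST = SSB + SSW (up to rounding)

    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    return {
        "Between": {"df": k - 1, "SS": ssb, "MS": msb, "F": msb / msw},
        "Within": {"df": n_total - k, "SS": ssw, "MS": msw},
        "Total": {"df": n_total - 1, "SS": sst},
    }


# Data from Example 1 below (teaching methods A, B, C)
table = anova_table([[78, 82, 80, 79], [72, 75, 74, 73], [68, 70, 69, 67]])
for source, row in table.items():
    print(source, {key: round(val, 3) for key, val in row.items()})
```

For the Example 1 data this gives SSB ≈ 254.17, SSW = 18.75, SST ≈ 272.92 and F = 61.0.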

⚠️ CRITICAL: ANOVA Assumptions

Independence: observations are independent, both within and between groups
Normality: residuals within each group are approximately normal
Homogeneity of variance: $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$ (equal variances across groups)

ANOVA is fairly robust to moderate violations of normality (for balanced designs), but sensitive to variance inequality when group sizes differ.
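One common way to check the equal-variance assumption is the Brown-Forsythe variant of Levene's test: replace each observation by its absolute deviation from its group median, run a one-way ANOVA on those deviations, and compare the resulting statistic $W$ with $F_{k-1, N-k}$. The sketch below assumes that variant (other versions centre on the group mean instead); the name `brown_forsythe` is ours. If SciPy is available, `scipy.stats.levene(..., center='median')` computes the same statistic.

```python
from statistics import fmean, median


def brown_forsythe(groups: list[list[float]]) -> float:
    """Brown-Forsythe statistic: one-way ANOVA F on |y - group median|."""
    # Replace each observation by its absolute deviation from the group median
    devs = []
    for g in groups:
        med = median(g)
        devs.append([abs(y - med) for y in g])

    k = len(devs)
    n_total = sum(len(d) for d in devs)
    grand = fmean([z for d in devs for z in d])
    means = [fmean(d) for d in devs]
    ssb = sum(len(d) * (m - grand) ** 2 for d, m in zip(devs, means))
    ssw = sum((z - m) ** 2 for d, m in zip(devs, means) for z in d)
    return (ssb / (k - 1)) / (ssw / (n_total - k))


w = brown_forsythe([[78, 82, 80, 79], [72, 75, 74, 73], [68, 70, 69, 67]])
print(f"W = {w:.3f}")   # compare with F(2, 9); a large W suggests unequal variances
```

For the Example 1 data $W \approx 0.16$, far below $F_{0.05, 2, 9} \approx 4.26$, so there is no sign of unequal variances, although with only four observations per group this check has little power.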

Post-Hoc Tests

If ANOVA rejects $H_0$, we know the group means are not all equal, but not WHICH ones differ. Post-hoc tests answer this while controlling the family-wise error rate.

Tukey's HSD (Honestly Significant Difference): Compares all pairwise differences. A difference is significant if:

$$|\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\ell}| > q_{\alpha, k, N-k} \cdot \sqrt{\frac{\text{MSW}}{2}\left(\frac{1}{n_j} + \frac{1}{n_{\ell}}\right)}$$

where $q_{\alpha, k, N-k}$ is the upper-$\alpha$ critical value of the studentised range distribution for $k$ groups and $N-k$ degrees of freedom.
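The criterion is easy to apply once you have a critical value of $q$. In this sketch the value $q_{0.05, 3, 9} \approx 3.95$ is taken from a standard studentised-range table and passed in by hand (treat it as an assumption to verify; recent SciPy versions provide `scipy.stats.studentized_range` for computing it). The group summaries are those of Example 1, and the helper name `tukey_hsd` is ours. With unequal $n_j$ this form of the criterion is usually called the Tukey-Kramer procedure.

```python
from itertools import combinations
from math import sqrt


def tukey_hsd(means: dict[str, float], sizes: dict[str, int],
              msw: float, q_crit: float) -> list[tuple[str, str, float, float, bool]]:
    """Pairwise Tukey(-Kramer) comparisons.

    Returns (group1, group2, |mean difference|, HSD threshold, significant?).
    q_crit must be the upper-alpha studentised range value for k groups, N - k df.
    """
    results = []
    for a, b in combinations(means, 2):
        diff = abs(means[a] - means[b])
        hsd = q_crit * sqrt(msw / 2 * (1 / sizes[a] + 1 / sizes[b]))
        results.append((a, b, diff, hsd, diff > hsd))
    return results


# Example 1 summaries; q_{0.05, 3, 9} ~ 3.95 from a studentised range table (assumed)
means = {"A": 79.75, "B": 73.5, "C": 68.5}
sizes = {"A": 4, "B": 4, "C": 4}
for a, b, diff, hsd, sig in tukey_hsd(means, sizes, msw=18.75 / 9, q_crit=3.95):
    print(f"{a} vs {b}: |diff| = {diff:.2f}, HSD = {hsd:.2f}, significant = {sig}")
```

All three differences (6.25, 11.25 and 5.00) exceed HSD ≈ 2.85, which agrees with the Bonferroni conclusion in Example 2.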

Bonferroni correction: Simplest method — divide $\alpha$ by the number of comparisons. Conservative but valid.

Two-Way ANOVA (Conceptual)

When there are TWO categorical factors (e.g., drug type AND dosage):

$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}$

Main effects: $\alpha_i$ (factor A), $\beta_j$ (factor B)
Interaction effect: $(\alpha\beta)_{ij}$ — does the effect of one factor depend on the level of the other?

The F-test for interaction tests whether the effect of one factor is consistent across levels of the other. If interaction is significant, main effects must be interpreted with caution.

Key Terms

ANOVA
F-statistic
Post-hoc tests
Two-way ANOVA

Worked Examples

Example 1: One-way ANOVA computation

Three teaching methods, each tested on four students (scores shown):

| Method A | Method B | Method C |
|---|---|---|
| 78 | 72 | 68 |
| 82 | 75 | 70 |
| 80 | 74 | 69 |
| 79 | 73 | 67 |

$n_A = n_B = n_C = 4$, $N = 12$, $k = 3$

Group means: $\bar{Y}_A = 79.75$, $\bar{Y}_B = 73.5$, $\bar{Y}_C = 68.5$

Grand mean (the groups are equal-sized, so this equals the average of the group means): $\bar{Y}_{\cdot\cdot} = (79.75 + 73.5 + 68.5)/3 = 221.75/3 = 73.917$

SSB: $4[(79.75-73.917)^2 + (73.5-73.917)^2 + (68.5-73.917)^2]$ $= 4[34.028 + 0.174 + 29.340] = 4 \cdot 63.542 = 254.17$

SSW: Method A: $(78-79.75)^2 + (82-79.75)^2 + (80-79.75)^2 + (79-79.75)^2 = 3.0625+5.0625+0.0625+0.5625=8.75$

Method B: $(72-73.5)^2 + (75-73.5)^2 + (74-73.5)^2 + (73-73.5)^2 = 2.25+2.25+0.25+0.25=5.0$

Method C: $(68-68.5)^2 + (70-68.5)^2 + (69-68.5)^2 + (67-68.5)^2 = 0.25+2.25+0.25+2.25=5.0$

SSW = 8.75 + 5.0 + 5.0 = 18.75

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Between | 2 | 254.17 | 127.08 | $127.08/2.083 = 61.0$ |
| Within | 9 | 18.75 | 2.083 | |
| Total | 11 | 272.92 | | |

$F_{0.05, 2, 9} \approx 4.26$. Since $61.0 > 4.26$, we reject $H_0$: the teaching methods differ significantly.
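With $k = 3$ groups the numerator degrees of freedom are 2, and the upper tail of $F_{2,\nu}$ has a closed form, $P(F > x) = (1 + 2x/\nu)^{-\nu/2}$, because an $F_{2,\nu}$ variable is an Exp(1) variable divided by an independent $\chi^2_\nu/\nu$. That lets us check the table value and compute a p-value without software. This shortcut only applies when the numerator df is 2; for general $k$ a library routine such as `scipy.stats.f.sf` is the usual tool. The helper names are ours.

```python
def f2_upper_tail(x: float, nu: float) -> float:
    """P(F > x) for F ~ F(2, nu); exact closed form, valid only for numerator df = 2."""
    return (1 + 2 * x / nu) ** (-nu / 2)


def f2_critical(alpha: float, nu: float) -> float:
    """Upper-alpha critical value of F(2, nu), obtained by inverting f2_upper_tail."""
    return nu / 2 * (alpha ** (-2 / nu) - 1)


nu = 9                      # within-group df, N - k = 12 - 3
print(f"F_0.05,2,9 = {f2_critical(0.05, nu):.3f}")
print(f"p-value for F = 61.0: {f2_upper_tail(61.0, nu):.2e}")
```

The critical value rounds to 4.26, matching the text, and the p-value (about $6 \times 10^{-6}$) is far below 0.05.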

Example 2: Post-hoc (Bonferroni)

With $k=3$ groups, there are $\binom{3}{2} = 3$ comparisons. Bonferroni-adjusted $\alpha = 0.05/3 = 0.0167$.

Compare A vs B: $|\bar{Y}_A - \bar{Y}_B| = 6.25$, with $\text{SE} = \sqrt{\text{MSW}(1/4+1/4)} = \sqrt{2.083 \cdot 0.5} = \sqrt{1.0415} = 1.021$

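$t = 6.25/1.021 = 6.12$ with $df = 9$. Each comparison is two-sided at level $0.0167$, so the Bonferroni critical value is $t_{0.0083, 9} \approx 2.93$ (upper-tail area $0.0167/2$). Since $6.12 > 2.93$, the difference is significant.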

Similarly, A vs C gives $t = 11.25/1.021 \approx 11.02$ and B vs C gives $t = 5.00/1.021 \approx 4.90$. Both exceed 2.93, so each method differs from every other.
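The same calculation for all three pairs can be scripted. This sketch assumes, as in the hand calculation above, that the pooled MSW from the ANOVA is used in every standard error, and that the Bonferroni critical value $t_{0.0083, 9} \approx 2.93$ is looked up separately and passed in. The helper name `bonferroni_pairs` is ours.

```python
from itertools import combinations
from math import sqrt


def bonferroni_pairs(means: dict[str, float], sizes: dict[str, int],
                     msw: float, t_crit: float) -> dict[tuple[str, str], tuple[float, bool]]:
    """Pooled-variance t statistic for each pair, flagged against a Bonferroni critical value."""
    out = {}
    for a, b in combinations(means, 2):
        se = sqrt(msw * (1 / sizes[a] + 1 / sizes[b]))   # MSW carries N - k df
        t = abs(means[a] - means[b]) / se
        out[(a, b)] = (t, t > t_crit)
    return out


means = {"A": 79.75, "B": 73.5, "C": 68.5}
sizes = {"A": 4, "B": 4, "C": 4}
for pair, (t, sig) in bonferroni_pairs(means, sizes, msw=18.75 / 9, t_crit=2.93).items():
    print(pair, f"t = {t:.2f}", "significant" if sig else "not significant")
```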

Example 3: Interaction interpretation

Two-way ANOVA on crop yield: Factor A = fertiliser (yes/no), Factor B = water (low/high).

| | Low Water | High Water |
|---|---|---|
| No Fertiliser | 10 | 14 |
| Fertiliser | 12 | 22 |

Main effect of fertiliser (averaged over water): $(12+22)/2 - (10+14)/2 = 17 - 12 = 5$

Main effect of water (averaged over fertiliser): $(14+22)/2 - (10+12)/2 = 18 - 11 = 7$

Interaction: does the fertiliser effect depend on water?
Low water: $12 - 10 = +2$
High water: $22 - 14 = +8$

The effect of fertiliser is much larger under high water — this is an interaction. The main effects alone don't tell the full story.
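With a 2 × 2 table of cell means, the main effects and the interaction are simple contrasts. This sketch reproduces the numbers above; the interaction contrast is the difference between the two simple effects of fertiliser, $(22 - 14) - (12 - 10) = 6$, and it would be zero if the lines in an interaction plot were parallel. A formal F-test for the interaction needs the individual observations in each cell, not just these means. The helper name `two_by_two_effects` and the level labels are ours.

```python
def two_by_two_effects(cells: dict[tuple[str, str], float]) -> dict[str, float]:
    """Main effects and interaction contrast from a 2x2 table of cell means.

    Keys are (fertiliser, water) with levels ("no", "yes") x ("low", "high").
    """
    no_low, no_high = cells[("no", "low")], cells[("no", "high")]
    yes_low, yes_high = cells[("yes", "low")], cells[("yes", "high")]
    return {
        # averaged over water levels: fertilised minus unfertilised
        "fertiliser": (yes_low + yes_high) / 2 - (no_low + no_high) / 2,
        # averaged over fertiliser levels: high water minus low water
        "water": (no_high + yes_high) / 2 - (no_low + yes_low) / 2,
        # fertiliser effect at high water minus fertiliser effect at low water
        "interaction": (yes_high - no_high) - (yes_low - no_low),
    }


yields = {("no", "low"): 10, ("no", "high"): 14, ("yes", "low"): 12, ("yes", "high"): 22}
print(two_by_two_effects(yields))
```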

Quiz

Q1: What question does a one-way ANOVA answer?

A) Which specific pair of group means differs B) Whether the group variances are all equal C) Whether any of the $k$ group means differ D) Whether the residuals are normally distributed

Correct: C)

If you chose A: This is incorrect. ANOVA is an omnibus test; identifying which pairs differ is the job of post-hoc tests such as Tukey's HSD.
If you chose B: This is incorrect. Equal variances is an assumption of ANOVA, not the hypothesis it tests.
If you chose C: ANOVA tests $H_0: \alpha_1 = \cdots = \alpha_k = 0$ (all group means equal) against the alternative that at least one differs. Correct!
If you chose D: This is incorrect. Normality of residuals is an assumption to check, not the hypothesis ANOVA tests.

Q2: Which formula defines the one-way ANOVA test statistic?

A) $F = \text{MSB}/\text{MSW}$ B) $F = \text{MSW}/\text{MSB}$ C) $F = \text{SSB}/\text{SSW}$ D) $F = \text{SST}/\text{SSW}$

Correct: A)

If you chose A: The F-statistic is the ratio of the between-group mean square to the within-group mean square. Correct!
If you chose B: This is incorrect. This is the reciprocal; with it, large group differences would produce small values.
If you chose C: This is incorrect. The sums of squares must first be divided by their degrees of freedom ($k-1$ and $N-k$) to give mean squares.
If you chose D: This is incorrect. SST contains both between- and within-group variation, so it is not the numerator of the test.

Q3: What is the primary purpose of the F-statistic in one-way ANOVA?

A) To estimate the grand mean $\mu$ B) To compare between-group variation with within-group variation C) To test whether the group variances are equal D) To identify which pairs of groups differ

Correct: B)

If you chose A: This is incorrect. The grand mean is estimated by $\bar{Y}_{\cdot\cdot}$; F is a ratio of two variance estimates.
If you chose B: $F = \text{MSB}/\text{MSW}$ compares how far the group means spread apart with the natural scatter inside the groups. Correct!
If you chose C: This is incorrect. Equal variances is an assumption of ANOVA, checked separately.
If you chose D: This is incorrect. F is an omnibus statistic; post-hoc tests identify the specific pairs.

Q4: Which statement about post-hoc tests is TRUE?

A) They identify which pairs of means differ while controlling the family-wise error rate B) They should be run whether or not the ANOVA F-test is significant C) They remove the need for the omnibus F-test D) They test the homogeneity-of-variance assumption

Correct: A)

If you chose A: Tukey's HSD and the Bonferroni correction compare pairs of means while keeping the family-wise error rate at $\alpha$. Correct!
If you chose B: This is incorrect. In this subject's approach, post-hoc comparisons follow only a significant omnibus F-test.
If you chose C: This is incorrect. The omnibus F-test acts as the gatekeeper; post-hoc tests follow it.
If you chose D: This is incorrect. Homogeneity of variance is an ANOVA assumption that is checked separately, not by post-hoc tests.

Q5: In Example 1 (three teaching methods), what are the F-statistic and the decision at $\alpha = 0.05$?

A) $F \approx 1.0$; fail to reject $H_0$ B) $F \approx 4.26$; borderline C) $F \approx 0.016$; fail to reject $H_0$ D) $F \approx 61.0$; reject $H_0$

Correct: D)

If you chose A: This is incorrect. $F \approx 1$ is what we expect under $H_0$, but here MSB is far larger than MSW.
If you chose B: This is incorrect. 4.26 is the critical value $F_{0.05, 2, 9}$, not the observed statistic.
If you chose C: This is incorrect. This inverts the ratio (MSW/MSB $= 2.083/127.08$).
If you chose D: $F = 127.08/2.083 \approx 61.0$, far above $F_{0.05, 2, 9} \approx 4.26$, so we reject $H_0$. Correct!

Q6: How are post-hoc tests and two-way ANOVA related?

A) Post-hoc tests are the inverse of two-way ANOVA B) Post-hoc tests are defined only for two-way designs C) Two-way ANOVA is a special case of a post-hoc test D) Both build on one-way ANOVA: post-hoc tests follow up a significant F-test, while two-way ANOVA adds a second factor and an interaction term

Correct: D)

If you chose A: This is incorrect. Neither procedure undoes the other; they answer different questions.
If you chose B: This is incorrect. Post-hoc tests are used after one-way ANOVA, as in Example 2.
If you chose C: This is incorrect. Two-way ANOVA is a model with two factors, not a pairwise comparison procedure.
If you chose D: Post-hoc tests ask which groups differ; two-way ANOVA asks how two factors (and their interaction) affect the response. Correct!

Q7: What is the main pitfall of running all pairwise t-tests at $\alpha = 0.05$ instead of ANOVA when there are $k \ge 3$ groups?

A) The family-wise Type I error rate rises well above 0.05 B) t-tests cannot compare two means C) The t-tests always have too few degrees of freedom D) Pairwise t-tests always give the same conclusion as ANOVA

Correct: A)

If you chose A: With 5 groups (10 tests), the chance of at least one false positive is about $1 - 0.95^{10} \approx 0.40$. Correct!
If you chose B: This is incorrect. Comparing two means is exactly what a t-test does; the problem is running many of them.
If you chose C: This is incorrect. Degrees of freedom are not the issue; the number of tests is.
If you chose D: This is incorrect. The two approaches control error differently and can reach different conclusions.

Q8: When is the one-way ANOVA model $Y_{ij} = \mu + \alpha_j + \epsilon_{ij}$ appropriate?

A) Only when there are exactly two groups B) When comparing the means of $k$ groups defined by one categorical factor, with independent, approximately normal errors of equal variance C) Only when the group variances are clearly unequal D) When there are two categorical factors and you need their interaction

Correct: B)

If you chose A: This is incorrect. The model applies to any number of groups; with $k \ge 3$ it avoids the inflated error rate of multiple t-tests.
If you chose B: This matches the model and its assumptions (independence, normality, equal variances). Correct!
If you chose C: This is incorrect. Equal variances is one of the model's assumptions.
If you chose D: This is incorrect. Two factors with an interaction call for two-way ANOVA.

Practice Problems

For $k=4$ groups with $n=10$ each, what are the ANOVA degrees of freedom?

Answer: Between groups: $k-1 = 3$. Within groups: $N-k = 40-4 = 36$, since $N = 4 \times 10 = 40$. Total: $N-1 = 39$. Under $H_0$ the F-statistic follows $F_{3, 36}$.

ANOVA gives $F = 0.87$ with $df = (3, 28)$, $p = 0.47$. Interpret.

Answer: $F \approx 1$ suggests MSB and MSW are similar, so there is no evidence that the group means differ. $p = 0.47$ is far above any conventional $\alpha$, so we fail to reject $H_0$. Do NOT run post-hoc tests: the omnibus test found no evidence of any difference to follow up.

Why do we need post-hoc tests after a significant ANOVA?

Answer: ANOVA only tells you that the group means are not all equal; it doesn't tell you WHICH ones differ. Post-hoc tests identify the specific pairs that differ while controlling the family-wise error rate. Running multiple t-tests instead would inflate the Type I error rate.

You run ANOVA and reject $H_0$. You then run Tukey HSD and find NO significant pairwise differences. Is this possible? How?

Answer: Yes, this can happen. The omnibus F-test responds to any pattern of differences among the means, including complex contrasts such as the average of two groups versus a third, whereas Tukey's HSD examines only pairwise differences, each against a multiplicity-adjusted threshold. The means can therefore be detectably unequal without any single pair reaching significance. It's uncommon but possible, especially with many groups.

In a two-way ANOVA, the interaction term is significant ($p = 0.003$). How should you interpret the main effects?

Answer: When interaction is significant, interpret main effects with caution, because the effect of one factor depends on the level of the other. Rather than reporting "Factor A increases Y by X units," report the simple effects (effect of A at each level of B separately). Plotting the interaction helps visualise the dependence.

Summary

Key takeaways:

ANOVA tests whether any group means differ, controlling family-wise error rate
F-statistic = MSB/MSW; $F \approx 1$ under $H_0$, and $F$ tends to exceed 1 when groups differ
SSB + SSW = SST: total variation = between-group + within-group
Assumptions: independence, normality of residuals, equal variances
Post-hoc tests (Tukey HSD, Bonferroni) identify WHICH pairs differ
Two-way ANOVA adds interaction effects — if significant, interpret simple effects

Pitfalls

Running post-hoc tests when ANOVA is not significant: Post-hoc pairwise comparisons are only warranted after a significant omnibus F-test. If ANOVA fails to reject H₀, the data do not support the claim that any group means differ. Fishing for significant pairwise differences anyway inflates the false positive rate — the omnibus test is the gatekeeper.
Assuming ANOVA is robust to all assumption violations: While ANOVA is fairly robust to moderate non-normality (especially with balanced designs), it is sensitive to unequal variances when group sizes differ substantially. For example, with n₁ = 5, n₂ = 30 and σ₁²/σ₂² = 4 (the smaller group has the larger variance), the actual Type I error rate can be well above 0.05. Check homogeneity of variance, not just normality.
Interpreting main effects when interaction is significant: In two-way ANOVA, a significant interaction means the effect of one factor depends on the level of the other. Reporting "Factor A increases Y by X units" is misleading when the effect differs across levels of Factor B. Plot the interaction and report simple effects (effect of A at each level of B) instead.
Using multiple t-tests instead of ANOVA for k ≥ 3 groups: With k = 5 groups, there are 10 pairwise t-tests. Even if all null hypotheses are true, the probability of at least one false positive is 1 − 0.95¹⁰ ≈ 0.40. ANOVA's F-test controls the family-wise error rate at α, and post-hoc tests maintain this control while identifying specific differences.
Believing a significant F-test identifies which groups differ: ANOVA answers only the question "are any group means different?" It does not tell you which ones. The F-statistic can be significant because of a complex contrast (e.g., the average of groups A and B differs from group C) even when no single pairwise comparison is significant. Post-hoc tests are essential for pinpointing differences.

Next Steps

Next up: 13-01-entropy.md — Information Theory begins!
