### 12.10: ANOVA
Phase: Statistics
Prerequisites: 12-08-common-tests, 12-02-sampling-sampling-distributions, 08-08-eigenvalues-eigenvectors
Learning Objectives
By the end of this subject, you will be able to:
- State the null and alternative hypotheses for one-way ANOVA
- Decompose total variation into between-group and within-group components
- Compute the F-statistic and conduct an ANOVA test
- Understand when and why post-hoc tests are needed
- Describe the conceptual framework of two-way ANOVA
Core Content
⚠️ CRITICAL: Why ANOVA Instead of Multiple t-tests?
If you have 3 groups and run 3 pairwise t-tests at $\alpha = 0.05$ each, the probability of at least one Type I error is roughly $1 - (0.95)^3 \approx 0.143$ (treating the tests as independent), nearly triple the nominal rate. With 5 groups (10 tests), it is about $1 - (0.95)^{10} \approx 0.40$.
ANOVA provides a single omnibus test, "Do ANY of the group means differ?", which controls the family-wise error rate.
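The inflation quoted above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `familywise_error_rate` is illustrative, and the formula treats the pairwise tests as independent, which is only an approximation because the tests share groups.

```python
from math import comb


def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """Approximate chance of at least one false positive across all
    pairwise tests of k groups, assuming the tests are independent."""
    m = comb(k, 2)  # number of pairwise comparisons
    return 1 - (1 - alpha) ** m


for k in (3, 5):
    print(f"k={k}: {comb(k, 2)} tests, FWER = {familywise_error_rate(k):.3f}")
```

The rate climbs quickly because the number of comparisons grows as $k(k-1)/2$.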
One-Way ANOVA: The Model
$Y_{ij} = \mu + \alpha_j + \epsilon_{ij}$, where $\epsilon_{ij} \sim N(0, \sigma^2)$ independently.
- $\mu$: grand mean (overall average)
- $\alpha_j$: effect of group $j$ ($\sum \alpha_j = 0$)
- $Y_{ij}$: observation $i$ in group $j$
Hypotheses: $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$ (all group means equal) vs $H_a$: at least one $\alpha_j \neq 0$.
The F-Statistic
ANOVA partitions total variation into between-group and within-group:
$$\text{SST} = \text{SSB} + \text{SSW}$$
- SST: Total sum of squares = $\sum\sum(Y_{ij} - \bar{Y}_{\cdot\cdot})^2$ (df = $N-1$)
- SSB: Between-group sum of squares = $\sum n_j(\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2$ (df = $k-1$)
- SSW: Within-group sum of squares = $\sum\sum(Y_{ij} - \bar{Y}_{\cdot j})^2$ (df = $N-k$)
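For readers who want to see why the partition holds, here is a short derivation in the notation above: add and subtract the group mean $\bar{Y}_{\cdot j}$ inside each squared deviation and expand.

$$
\begin{aligned}
\sum_{j}\sum_{i}(Y_{ij}-\bar{Y}_{\cdot\cdot})^2
&= \sum_{j}\sum_{i}\big[(Y_{ij}-\bar{Y}_{\cdot j}) + (\bar{Y}_{\cdot j}-\bar{Y}_{\cdot\cdot})\big]^2 \\
&= \underbrace{\sum_{j}\sum_{i}(Y_{ij}-\bar{Y}_{\cdot j})^2}_{\text{SSW}}
 + \underbrace{\sum_{j} n_j(\bar{Y}_{\cdot j}-\bar{Y}_{\cdot\cdot})^2}_{\text{SSB}}
 + 2\sum_{j}(\bar{Y}_{\cdot j}-\bar{Y}_{\cdot\cdot})\sum_{i}(Y_{ij}-\bar{Y}_{\cdot j})
\end{aligned}
$$

The cross term vanishes because deviations from a group's own mean sum to zero, $\sum_i (Y_{ij}-\bar{Y}_{\cdot j}) = 0$ for every $j$, which leaves $\text{SST} = \text{SSW} + \text{SSB}$.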
Mean squares:
- $\text{MSB} = \text{SSB} / (k-1)$
- $\text{MSW} = \text{SSW} / (N-k)$
$$F = \frac{\text{MSB}}{\text{MSW}} \sim F_{k-1, N-k} \quad \text{under } H_0$$
Intuition: If $H_0$ is true, both MSB and MSW estimate the same $\sigma^2$, so $F \approx 1$. If $H_0$ is false, MSB tends to overestimate $\sigma^2$ (it also reflects the group effects), so $F$ tends to exceed 1.
Decision: Reject $H_0$ if $F > F_{\alpha, k-1, N-k}$ or if p-value $< \alpha$.
ANOVA Table
| Source | df | SS | MS | F |
|---|---|---|---|---|
| Between | $k-1$ | SSB | MSB = SSB/($k-1$) | MSB/MSW |
| Within | $N-k$ | SSW | MSW = SSW/($N-k$) | |
| Total | $N-1$ | SST | | |
⚠️ CRITICAL: ANOVA Assumptions
- Independence: observations within and between groups
- Normality: residuals within each group are approximately normal
- Homogeneity of variance: $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$ (equal variances across groups)
ANOVA is fairly robust to moderate violations of normality (for balanced designs), but sensitive to variance inequality when group sizes differ.
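As a quick descriptive look at the equal-variance assumption, one can compare the sample variances of the groups. The sketch below is not a formal test. The helper name `variance_ratio` and the data are hypothetical, chosen only for illustration.

```python
from statistics import variance


def variance_ratio(groups):
    """Ratio of the largest to the smallest sample variance across groups."""
    variances = [variance(g) for g in groups]  # sample variance (n - 1 denominator)
    return max(variances) / min(variances)


# Hypothetical measurements for three groups (illustration only).
groups = [
    [4.1, 5.0, 4.7, 5.3, 4.9],
    [6.2, 5.8, 6.9, 6.1, 6.5],
    [5.5, 7.9, 4.2, 6.8, 5.1],
]
print(f"largest / smallest variance: {variance_ratio(groups):.2f}")
```

A ratio far from 1 combined with unequal group sizes is the situation the paragraph above warns about.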
Post-Hoc Tests
If ANOVA rejects $H_0$, we conclude that the group means are not all equal, but WHICH one(s) differ? Post-hoc tests answer this while controlling the family-wise error rate.
Tukey's HSD (Honestly Significant Difference): Compares all pairwise differences. A difference is significant if:
$$|\bar{Y}_{\cdot j} - \bar{Y}_{\cdot \ell}| > q_{\alpha, k, N-k} \cdot \sqrt{\frac{\text{MSW}}{2}\left(\frac{1}{n_j} + \frac{1}{n_{\ell}}\right)}$$
where $q_{\alpha, k, N-k}$ is the upper-$\alpha$ critical value of the studentised range distribution for $k$ groups and $N-k$ error degrees of freedom.
Bonferroni correction: The simplest method. Divide $\alpha$ by the number of comparisons. Conservative but valid.
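Both rules reduce to one-line computations once the critical value is known. The sketch below uses hypothetical helper names (`tukey_threshold`, `bonferroni_alpha`) and takes the studentised range critical value as an input. The value 3.95 is an approximate table value for $q_{0.05, 3, 9}$ and should be treated as an assumption to check against a table or software. The MSW and group sizes are taken from Worked Example 1.

```python
from math import sqrt


def tukey_threshold(q_crit, msw, n_j, n_l):
    """Smallest absolute mean difference that Tukey's HSD flags as significant."""
    return q_crit * sqrt(msw / 2 * (1 / n_j + 1 / n_l))


def bonferroni_alpha(alpha, k):
    """Per-comparison alpha when all pairs of k groups are compared."""
    n_comparisons = k * (k - 1) // 2
    return alpha / n_comparisons


q_crit = 3.95      # assumed approximate table value of q_{0.05, 3, 9}
msw = 18.75 / 9    # MSW from Worked Example 1
print(f"HSD threshold: {tukey_threshold(q_crit, msw, 4, 4):.2f}")
print(f"Bonferroni alpha: {bonferroni_alpha(0.05, 3):.4f}")
```

Passing the critical value in as a parameter keeps the sketch free of any particular statistics library.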
Two-Way ANOVA (Conceptual)
When there are TWO categorical factors (e.g., drug type AND dosage):
$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}$
- Main effects: $\alpha_i$ (factor A), $\beta_j$ (factor B)
- Interaction effect: $(\alpha\beta)_{ij}$. Does the effect of one factor depend on the level of the other?
The F-test for interaction tests whether the effect of one factor is consistent across levels of the other. If interaction is significant, main effects must be interpreted with caution.
Key Terms
- ANOVA
- F-statistic
- Post-hoc tests
- Two-way ANOVA
Worked Examples
Example 1: One-way ANOVA computation
Three teaching methods tested on students:
| Method A | Method B | Method C |
|---|---|---|
| 78 | 72 | 68 |
| 82 | 75 | 70 |
| 80 | 74 | 69 |
| 79 | 73 | 67 |
$n_A = n_B = n_C = 4$, $N = 12$, $k = 3$
Group means: $\bar{Y}_A = 79.75$, $\bar{Y}_B = 73.5$, $\bar{Y}_C = 68.5$
Grand mean: $\bar{Y}_{\cdot\cdot} = (79.75 + 73.5 + 68.5)/3 = 221.75/3 = 73.917$ (the group sizes are equal, so the grand mean is the average of the group means)
SSB: $4[(79.75-73.917)^2 + (73.5-73.917)^2 + (68.5-73.917)^2] = 4[34.028 + 0.174 + 29.340] = 4 \cdot 63.542 = 254.17$ (squares computed with the unrounded grand mean)
SSW:
- Method A: $(78-79.75)^2 + (82-79.75)^2 + (80-79.75)^2 + (79-79.75)^2 = 3.0625+5.0625+0.0625+0.5625=8.75$
- Method B: $(72-73.5)^2 + (75-73.5)^2 + (74-73.5)^2 + (73-73.5)^2 = 2.25+2.25+0.25+0.25=5.0$
- Method C: $(68-68.5)^2 + (70-68.5)^2 + (69-68.5)^2 + (67-68.5)^2 = 0.25+2.25+0.25+2.25=5.0$
SSW = 8.75 + 5.0 + 5.0 = 18.75
| Source | df | SS | MS | F |
|---|---|---|---|---|
| Between | 2 | 254.17 | 127.08 | $127.08/2.083 \approx 61.0$ |
| Within | 9 | 18.75 | 2.083 | |
| Total | 11 | 272.92 | | |
$F_{0.05, 2, 9} \approx 4.26$. Since $61.0 > 4.26$, we reject $H_0$: the teaching methods differ significantly.
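The hand computation can be checked with a short script. This is a minimal sketch using only the standard library. The function name `one_way_anova` is illustrative, and the critical value 4.26 is the one quoted in the text rather than computed.

```python
from statistics import mean


def one_way_anova(groups):
    """Return sums of squares, mean squares, and F for a one-way layout."""
    all_obs = [y for g in groups for y in g]
    grand_mean = mean(all_obs)
    k, n_total = len(groups), len(all_obs)

    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ssw = sum((y - mean(g)) ** 2 for g in groups for y in g)
    sst = sum((y - grand_mean) ** 2 for y in all_obs)
    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    return {"SSB": ssb, "SSW": ssw, "SST": sst,
            "MSB": msb, "MSW": msw, "F": msb / msw}


methods = {
    "A": [78, 82, 80, 79],
    "B": [72, 75, 74, 73],
    "C": [68, 70, 69, 67],
}
table = one_way_anova(list(methods.values()))
for name, value in table.items():
    print(f"{name}: {value:.2f}")

F_CRIT = 4.26  # F_{0.05, 2, 9}, as quoted in the text
print("reject H0" if table["F"] > F_CRIT else "fail to reject H0")
```

Computing SST directly, rather than as SSB + SSW, doubles as a check on the partition. The script agrees with the hand-computed table up to rounding, with $F$ at about 61.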
Example 2: Post-hoc (Bonferroni)
With $k=3$ groups, there are $\binom{3}{2} = 3$ comparisons. Bonferroni-adjusted $\alpha = 0.05/3 = 0.0167$.
Compare A vs B: $|\bar{Y}_A - \bar{Y}_B| = 6.25$, with $\text{SE} = \sqrt{2.083(1/4+1/4)} = \sqrt{1.0415} = 1.021$
$t = 6.25/1.021 = 6.12$ with $df = 9$. The Bonferroni critical value is $t_{0.0083, 9} \approx 2.93$ (two-sided, so $0.0167/2$ in each tail), so the difference is significant.
Similarly, A vs C ($t \approx 11.0$) and B vs C ($t \approx 4.9$) are significant: each method differs from every other.
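All three comparisons can be run in one loop. The sketch below reuses the group means and MSW from Example 1 and the critical value 2.93 quoted above. The helper name `pairwise_t` is illustrative, and equal group sizes are assumed.

```python
from itertools import combinations
from math import sqrt


def pairwise_t(means, n_per_group, msw):
    """t-statistic for every pair of groups, using the pooled MSW from the ANOVA."""
    se = sqrt(msw * (1 / n_per_group + 1 / n_per_group))
    return {(a, b): abs(means[a] - means[b]) / se
            for a, b in combinations(means, 2)}


means = {"A": 79.75, "B": 73.5, "C": 68.5}  # group means from Example 1
msw = 18.75 / 9                             # SSW / (N - k) from Example 1
t_crit = 2.93                               # Bonferroni critical value quoted above

t_stats = pairwise_t(means, 4, msw)
for pair, t in t_stats.items():
    verdict = "significant" if t > t_crit else "not significant"
    print(pair, round(t, 2), verdict)
```

Using the pooled MSW (with its 9 degrees of freedom) rather than a separate two-sample variance for each pair is what ties these comparisons to the ANOVA table.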
Example 3: Interaction interpretation
Two-way ANOVA on crop yield: Factor A = fertiliser (yes/no), Factor B = water (low/high).
| | Low Water | High Water |
|---|---|---|
| No Fertiliser | 10 | 14 |
| Fertiliser | 12 | 22 |
Main effect of fertiliser (averaged over water): $(12+22)/2 - (10+14)/2 = 17 - 12 = 5$
Main effect of water (averaged over fertiliser): $(14+22)/2 - (10+12)/2 = 18 - 11 = 7$
Interaction: Does the fertiliser effect depend on water?
- Low water: $12 - 10 = +2$
- High water: $22 - 14 = +8$
The effect of fertiliser is much larger under high water, which is the signature of an interaction. The main effects alone don't tell the full story.
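The same contrasts can be computed from the cell means in a few lines. This sketch is purely descriptive. Testing the interaction formally would need the within-cell variability, which the table does not give. The function names are illustrative.

```python
# Cell means from the table above, indexed as cell[fertiliser][water].
cell = {
    "no":  {"low": 10, "high": 14},
    "yes": {"low": 12, "high": 22},
}


def simple_effect_of_fertiliser(cell, water):
    """Effect of adding fertiliser at one fixed water level."""
    return cell["yes"][water] - cell["no"][water]


def main_effect_of_fertiliser(cell):
    """Fertiliser effect averaged over the two water levels."""
    return (simple_effect_of_fertiliser(cell, "low")
            + simple_effect_of_fertiliser(cell, "high")) / 2


def main_effect_of_water(cell):
    """Water effect (high minus low) averaged over the two fertiliser levels."""
    return ((cell["no"]["high"] - cell["no"]["low"])
            + (cell["yes"]["high"] - cell["yes"]["low"])) / 2


def interaction_contrast(cell):
    """Difference between the two simple effects; zero means no interaction."""
    return (simple_effect_of_fertiliser(cell, "high")
            - simple_effect_of_fertiliser(cell, "low"))


print("fertiliser main effect:", main_effect_of_fertiliser(cell))
print("water main effect:", main_effect_of_water(cell))
print("interaction contrast:", interaction_contrast(cell))
```

A non-zero interaction contrast means the lines in an interaction plot would not be parallel.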
Quiz
Q1: What does one-way ANOVA test?
A) Whether the group variances are equal B) Whether each observation is normally distributed C) Whether at least one group mean differs from the others D) Which specific pair of group means differs
Correct: C)
- If you chose A: This is incorrect. Equal variances are an assumption of ANOVA, not the hypothesis it tests.
- If you chose B: This is incorrect. Approximate normality of the residuals is an assumption, not the hypothesis being tested.
- If you chose C: $H_0$ states that all group means are equal; $H_a$ states that at least one differs. Correct!
- If you chose D: This is incorrect. ANOVA is an omnibus test; identifying specific pairs requires post-hoc tests.
Q2: Which formula defines the F-statistic in one-way ANOVA?
A) $F = \text{MSB}/\text{MSW}$ B) $F = \text{SSB}/\text{SSW}$ C) $F = \text{MSW}/\text{MSB}$ D) $F = \text{SST}/(N-1)$
Correct: A)
- If you chose A: The F-statistic is the ratio of the between-group mean square to the within-group mean square. Correct!
- If you chose B: This is incorrect. The sums of squares must first be divided by their degrees of freedom ($k-1$ and $N-k$).
- If you chose C: This is incorrect. This is the reciprocal of the F-statistic.
- If you chose D: This is incorrect. This is the overall sample variance, not a ratio of mean squares.
Q3: What is the primary purpose of the F-statistic?
A) To estimate the grand mean B) To compare between-group variation with within-group variation C) To identify which pair of groups differs D) To check whether the residuals are normal
Correct: B)
- If you chose A: This is incorrect. The grand mean is $\bar{Y}_{\cdot\cdot}$; the F-statistic is a ratio of mean squares.
- If you chose B: If $H_0$ is true, MSB and MSW estimate the same $\sigma^2$ and $F \approx 1$; a large $F$ signals that group means differ. Correct!
- If you chose C: This is incorrect. Locating specific differences is the job of post-hoc tests.
- If you chose D: This is incorrect. Normality is an assumption of the test, not what the F-statistic measures.
Q4: Which statement about post-hoc tests is TRUE?
A) They identify which group means differ while controlling the family-wise error rate B) They are run when ANOVA fails to reject $H_0$ C) They make the equal-variance assumption unnecessary D) They use an unadjusted $\alpha = 0.05$ for every comparison
Correct: A)
- If you chose A: Tukey's HSD and the Bonferroni correction both pinpoint specific differences while keeping the family-wise error rate under control. Correct!
- If you chose B: This is incorrect. Post-hoc tests follow a significant omnibus F-test.
- If you chose C: This is incorrect. The ANOVA assumptions still apply; Tukey's HSD, for example, uses MSW.
- If you chose D: This is incorrect. Adjusting for multiple comparisons is the whole point; Bonferroni divides $\alpha$ by the number of comparisons.
Q5: In Worked Example 1 (three teaching methods), what is the result of the ANOVA?
A) $F \approx 13.6$; reject $H_0$ B) $F \approx 1$; fail to reject $H_0$ C) $F \approx 0.016$; fail to reject $H_0$ D) $F \approx 61$; reject $H_0$
Correct: D)
- If you chose A: This is incorrect. This is SSB/SSW; the sums of squares must be divided by their degrees of freedom first.
- If you chose B: This is incorrect. $F \approx 1$ is what we expect under $H_0$, but here the between-group variation is far larger than the within-group variation.
- If you chose C: This is incorrect. This is MSW/MSB, the reciprocal of the F-statistic.
- If you chose D: $F = \text{MSB}/\text{MSW} \approx 61$, which far exceeds $F_{0.05, 2, 9} \approx 4.26$. Correct!
Q6: How are post-hoc tests and two-way ANOVA related?
A) Post-hoc tests are the two-factor version of ANOVA B) They are two names for the same procedure C) Two-way ANOVA is a post-hoc test for one-way ANOVA D) They address different questions: post-hoc tests locate which means differ, while two-way ANOVA extends the model to two factors and their interaction
Correct: D)
- If you chose A: This is incorrect. Post-hoc tests are pairwise follow-ups, not a model with two factors.
- If you chose B: This is incorrect. They are distinct tools with different purposes.
- If you chose C: This is incorrect. Two-way ANOVA is a different model, used when there are two categorical factors.
- If you chose D: Post-hoc tests follow a significant F-test; two-way ANOVA adds a second factor and an interaction term to the model. Correct!
Q7: What is the main problem with running multiple pairwise t-tests instead of ANOVA?
A) The probability of at least one Type I error rises well above the nominal $\alpha$ B) t-tests cannot compare group means C) t-tests require equal group sizes D) t-tests always have lower power than the F-test
Correct: A)
- If you chose A: With 3 tests at $\alpha = 0.05$ the family-wise error rate is about 0.143; with 10 tests it is about 0.40. Correct!
- If you chose B: This is incorrect. A t-test does compare two means; the issue is how errors accumulate across many tests.
- If you chose C: This is incorrect. Group sizes are not the issue; the inflated family-wise error rate is.
- If you chose D: This is incorrect. The concern raised in this subject is false positives, not power.
Q8: When should you apply one-way ANOVA?
A) When comparing two proportions B) When comparing the means of three or more groups defined by a single categorical factor C) When two categorical factors and their interaction are of interest D) When the group variances are known to differ greatly and group sizes are unequal
Correct: B)
- If you chose A: This is incorrect. ANOVA compares means of a numerical response across groups.
- If you chose B: One-way ANOVA tests $H_0$: all group means are equal, for $k$ groups defined by one factor. Correct!
- If you chose C: This is incorrect. Two factors and an interaction call for two-way ANOVA.
- If you chose D: This is incorrect. ANOVA is sensitive to unequal variances when group sizes differ.
Practice Problems
- For $k=4$ groups with $n=10$ each, what are the ANOVA degrees of freedom?
  Answer: Between groups: $k-1 = 3$. Within groups: $N-k = 40-4 = 36$. Total: $N-1 = 39$. The F-statistic follows $F_{3, 36}$ under $H_0$.
- ANOVA gives $F = 0.87$ with $df = (3, 28)$, $p = 0.47$. Interpret.
  Answer: $F$ is close to 1, which suggests MSB and MSW are similar: there is no evidence that group means differ. $p = 0.47$ is far above any conventional $\alpha$. Fail to reject $H_0$. Do NOT run post-hoc tests, because the omnibus test found no evidence of differences.
- Why do we need post-hoc tests after a significant ANOVA?
  Answer: ANOVA only tells you that the group means are not all equal; it doesn't tell you WHICH one(s) differ. Post-hoc tests identify the specific pairs that differ while controlling the family-wise error rate. Running multiple unadjusted t-tests instead would inflate the Type I error rate.
- You run ANOVA and reject $H_0$. You then run Tukey HSD and find NO significant pairwise differences. Is this possible? How?
  Answer: Yes, this can happen. The omnibus F-test is sensitive to any departure from equal means, including a complex contrast (e.g., the average of two groups versus a third), whereas Tukey's HSD examines only pairwise differences and uses a critical value that covers all pairs at once. A borderline F can therefore be significant without any single pairwise comparison reaching significance. It's uncommon but possible, especially with many groups.
- In a two-way ANOVA, the interaction term is significant ($p = 0.003$). How should you interpret the main effects?
  Answer: When interaction is significant, main effects should be interpreted with caution, because the effect of one factor depends on the level of the other. Rather than reporting "Factor A increases Y by X units," report the simple effects (effect of A at each level of B separately). Plotting the interaction helps visualise the dependence.
Summary
Key takeaways:
- ANOVA tests whether any group means differ, controlling family-wise error rate
- F-statistic = MSB/MSW; $F \approx 1$ under $H_0$, and $F$ tends to exceed 1 when group means differ
- SSB + SSW = SST: total variation = between-group + within-group
- Assumptions: independence, normality of residuals, equal variances
- Post-hoc tests (Tukey HSD, Bonferroni) identify WHICH pairs differ
- Two-way ANOVA adds interaction effects; if significant, interpret simple effects
Pitfalls
- Running post-hoc tests when ANOVA is not significant: Post-hoc pairwise comparisons are only warranted after a significant omnibus F-test. If ANOVA fails to reject H₀, the data do not support the claim that any group means differ. Fishing for significant pairwise differences anyway inflates the false positive rate; the omnibus test is the gatekeeper.
- Assuming ANOVA is robust to all assumption violations: While ANOVA is fairly robust to moderate non-normality (especially with balanced designs), it is sensitive to unequal variances when group sizes differ substantially. The classic example: with n₁ = 5, n₂ = 30 and σ₁²/σ₂² = 4, the actual Type I error rate can be well above 0.05. Check homogeneity of variance, not just normality.
- Interpreting main effects when interaction is significant: In two-way ANOVA, a significant interaction means the effect of one factor depends on the level of the other. Reporting "Factor A increases Y by X units" is misleading when the effect differs across levels of Factor B. Plot the interaction and report simple effects (effect of A at each level of B) instead.
- Using multiple t-tests instead of ANOVA for k ≥ 3 groups: With k = 5 groups, there are 10 pairwise t-tests. Even if all null hypotheses are true, the probability of at least one false positive is 1 − 0.95¹⁰ ≈ 0.40. ANOVA's F-test controls the family-wise error rate at α, and post-hoc tests maintain this control while identifying specific differences.
- Believing a significant F-test identifies which groups differ: ANOVA answers only the question "are any group means different?" It does not tell you which ones. The F-statistic can be significant due to a complex contrast (e.g., the average of groups A and B differs from C) even when no single pairwise comparison is significant. Post-hoc tests are essential for pinpointing differences.
Next Steps
Next up: 13-01-entropy.md (Information Theory begins!)