### 12.8 — Common Tests
Phase: Statistics
Prerequisites: 12-07-hypothesis-testing-basics, 12-06-confidence-intervals
Learning Objectives
By the end of this subject, you will be able to:
- Select the appropriate test for a given research question
- Perform and interpret one-sample and two-sample t-tests
- Conduct paired t-tests for matched data
- Apply chi-squared tests for categorical data
- Use the F-test to compare variances
Core Content
⚠️ CRITICAL: Choosing the Right Test
The test you use depends on: (a) the type of data, (b) the number of groups, and (c) whether samples are independent or paired.
| Research Question | Test |
|---|---|
| One sample mean vs known value | One-sample t-test |
| Two independent group means | Two-sample (independent) t-test |
| Paired/matched measurements | Paired t-test |
| Proportions in categories | Chi-squared test |
| Compare two variances | F-test for variances |
One-Sample t-test
Tests whether a population mean equals a specified value $\mu_0$.
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \quad df = n - 1$$
Assumptions: Data are approximately normal (robust for $n \geq 30$ via CLT), observations independent.
Example: Test if the mean IQ of a class differs from 100. $\bar{x} = 107$, $s = 14$, $n = 25$.
$t = \frac{107 - 100}{14/5} = \frac{7}{2.8} = 2.50$, $df = 24$, $p \approx 0.020$ (two-sided)
At $\alpha = 0.05$, reject $H_0$: the data indicate that the class mean differs from 100.
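The arithmetic in this example can be checked with a short sketch. It uses only the Python standard library, so the two-sided p-value is obtained by numerically integrating the t density (Simpson's rule), a stand-in for the t-distribution function a statistics library would normally provide. The function names `t_pdf`, `two_sided_p`, and `one_sample_t` are illustrative, not taken from any library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * (integral of the density from 0 to |t|).

    The integral is approximated with Simpson's rule; steps must be even.
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3.0)

def one_sample_t(mean, sd, n, mu0):
    """t statistic and degrees of freedom from summary statistics."""
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1

# Values from the IQ example: x-bar = 107, s = 14, n = 25, mu_0 = 100.
t_stat, df = one_sample_t(mean=107, sd=14, n=25, mu0=100)
p_value = two_sided_p(t_stat, df)
print(f"t = {t_stat:.2f}, df = {df}, p = {p_value:.3f}")
```

The same `two_sided_p` helper accepts non-integer degrees of freedom, so it can also be used with Welch's approximation.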
Two-Sample (Independent) t-test
Tests whether two population means differ. Two versions:
Equal variances (pooled): $$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \quad df = n_1 + n_2 - 2$$
Where $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}$
Unequal variances (Welch's t-test): Preferred default — does not assume equal variances. More robust.
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
with Satterthwaite approximation for df:
$$df \approx \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$$
Rule of thumb: Use Welch's t-test by default. Only use pooled when you have strong reason to believe variances are equal AND sample sizes are similar.
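To see how the two versions differ in practice, the sketch below computes both from summary statistics. It borrows the Group A and Group B numbers from Worked Example 1 in this section; the function names `pooled_t` and `welch_t` are illustrative.

```python
import math

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Equal-variance (pooled) t statistic and its degrees of freedom."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, n1 + n2 - 2

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch t statistic with the Satterthwaite approximation for df."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # per-group variance of the sample mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Group A: n = 15, mean 78, sd 8. Group B: n = 12, mean 69, sd 12.
t_pooled, df_pooled = pooled_t(78, 8, 15, 69, 12, 12)
t_welch, df_welch = welch_t(78, 8, 15, 69, 12, 12)
print(f"pooled: t = {t_pooled:.3f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch:.1f}")
```

With these inputs the pooled version reports a larger t on more degrees of freedom than Welch's, which is the direction in which the pooled test can overstate the evidence when the smaller group has the larger variance.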
Paired t-test
Used when observations come in natural pairs (before/after, matched subjects, left/right).
Compute differences $d_i = x_{i1} - x_{i2}$, then do a one-sample t-test on the differences:
$$t = \frac{\bar{d}}{s_d / \sqrt{n}}, \quad df = n - 1$$
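🚩 Common Pitfall: Using an independent t-test on paired data. The independent test assumes the two samples are unrelated, which paired data violate. When the paired measurements are positively correlated (the usual case for before/after and matched designs), the paired test is MORE powerful because it removes between-subject variability. Ignoring the pairing then wastes statistical power.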
Chi-Squared Test
Tests association between categorical variables or goodness-of-fit to a distribution.
Test of independence (contingency table):
$$\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad df = (r-1)(c-1)$$
Where $O_{ij}$ is the observed count and $E_{ij}$ is the expected count under independence: $E_{ij} = \frac{(\text{row } i \text{ total}) \cdot (\text{column } j \text{ total})}{N}$, with $N$ the grand total.
Goodness of fit:
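$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}, \quad df = k - 1$$
Where $k$ is the number of categories, $O_i$ are the observed counts, and $E_i$ are the expected counts under the hypothesized distribution. The $df = k - 1$ rule applies when no distribution parameters are estimated from the data.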
Assumption: Expected counts should be $\geq 5$ in each cell for the chi-squared approximation to be valid.
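A goodness-of-fit calculation can be sketched as follows. The data are hypothetical (60 rolls of a die, invented for illustration and tested against a fair-die model), and the function name `chi2_goodness_of_fit` is illustrative. The function also reports whether every expected count meets the $\geq 5$ guideline.

```python
def chi2_goodness_of_fit(observed, expected_probs):
    """Chi-squared goodness-of-fit statistic, df, and an expected-count check."""
    n = sum(observed)
    expected = [n * p for p in expected_probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # k - 1, no parameters estimated from the data
    counts_ok = min(expected) >= 5  # guideline for the chi-squared approximation
    return stat, df, counts_ok

# Hypothetical data: counts of faces 1..6 in 60 rolls (made up for this sketch).
observed = [8, 12, 9, 11, 6, 14]
stat, df, counts_ok = chi2_goodness_of_fit(observed, [1 / 6] * 6)
print(f"chi2 = {stat:.2f}, df = {df}, expected counts >= 5: {counts_ok}")
```

Here every expected count is 10, so the approximation guideline is met and the statistic can be compared against a chi-squared distribution with 5 degrees of freedom.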
F-test for Variances
$$F = \frac{s_1^2}{s_2^2} \sim F_{n_1-1, n_2-1}$$
Tests whether two normal populations have equal variances. Sensitive to non-normality — Levene's test is a more robust alternative.
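As a minimal sketch, the statistic and its degrees of freedom follow directly from the two sample standard deviations. The inputs reuse the Group A and Group B summary statistics from Worked Example 1, and the function name `f_ratio` is illustrative.

```python
def f_ratio(s1, n1, s2, n2):
    """Return F = s1^2 / s2^2 and its (numerator, denominator) degrees of freedom."""
    return s1**2 / s2**2, (n1 - 1, n2 - 1)

# Group A: s = 8, n = 15. Group B: s = 12, n = 12.
f_stat, (df_num, df_den) = f_ratio(s1=8, n1=15, s2=12, n2=12)
print(f"F = {f_stat:.3f} on ({df_num}, {df_den}) degrees of freedom")
```

An F below 1 only means the second sample has the larger variance; a two-sided test considers both tails of the F distribution.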
Key Terms
- Chi-squared test
- F-test
- One-sample t-test
- Paired t-test
- Two-sample t-test
Worked Examples
Example 1: Two-sample t-test (Welch's)
Group A: $n_1 = 15$, $\bar{x}_1 = 78$, $s_1 = 8$
Group B: $n_2 = 12$, $\bar{x}_2 = 69$, $s_2 = 12$
$H_0: \mu_1 = \mu_2$ vs $H_a: \mu_1 \neq \mu_2$
$t = \frac{78 - 69}{\sqrt{\frac{64}{15} + \frac{144}{12}}} = \frac{9}{\sqrt{4.267 + 12}} = \frac{9}{\sqrt{16.267}} = \frac{9}{4.033} = 2.23$
$df \approx \frac{16.267^2}{\frac{4.267^2}{14} + \frac{12^2}{11}} = \frac{264.6}{1.301 + 13.091} \approx 18.4$
$p \approx 0.039$ → Reject $H_0$ at $\alpha = 0.05$.
Example 2: Chi-squared test of independence
| | Survived | Died | Total |
|---|---|---|---|
| Treatment | 38 | 12 | 50 |
| Placebo | 26 | 24 | 50 |
| Total | 64 | 36 | 100 |
Under independence: $E_{11} = 64 \cdot 50 / 100 = 32$, $E_{12} = 36 \cdot 50 / 100 = 18$
$\chi^2 = \frac{(38-32)^2}{32} + \frac{(12-18)^2}{18} + \frac{(26-32)^2}{32} + \frac{(24-18)^2}{18}$
$= \frac{36}{32} + \frac{36}{18} + \frac{36}{32} + \frac{36}{18} = 1.125 + 2 + 1.125 + 2 = 6.25$
$df = (2-1)(2-1) = 1$, $p \approx 0.012$ → Significant association.
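The calculation above can be reproduced with a short sketch. The function name `chi2_independence` is illustrative. The p-value line relies on a closed form that holds only for $df = 1$, where the upper-tail probability equals $\operatorname{erfc}(\sqrt{\chi^2 / 2})$; other degrees of freedom need a general chi-squared distribution function.

```python
import math

def chi2_independence(table):
    """Pearson chi-squared statistic, df, and expected counts for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected

# Rows: Treatment, Placebo. Columns: Survived, Died.
table = [[38, 12], [26, 24]]
stat, df, expected = chi2_independence(table)
p_value = math.erfc(math.sqrt(stat / 2))  # valid for df = 1 only
print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.3f}")
```

No continuity correction is applied here, matching the hand calculation. Some statistics libraries apply Yates' correction to 2×2 tables by default, which would give a smaller statistic.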
Example 3: Paired t-test
Five subjects tested before and after training:
| Subject | Before | After | Diff ($d$) |
|---|---|---|---|
| 1 | 45 | 52 | +7 |
| 2 | 51 | 55 | +4 |
| 3 | 48 | 56 | +8 |
| 4 | 52 | 54 | +2 |
| 5 | 47 | 58 | +11 |
$\bar{d} = 32/5 = 6.4$, $s_d = 3.51$
$t = \frac{6.4}{3.51/\sqrt{5}} = \frac{6.4}{1.57} = 4.08$, $df = 4$
$p \approx 0.015$ (two-sided) → Significant improvement.
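The table values can be run through the paired formula with the standard library. This sketch takes the difference as After minus Before, matching the Diff column; the variable names are illustrative.

```python
import math
import statistics

before = [45, 51, 48, 52, 47]
after = [52, 55, 56, 54, 58]

# Within-subject differences (After - Before), as in the Diff column.
diffs = [a - b for a, b in zip(after, before)]

d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
n = len(diffs)

# One-sample t-test on the differences against a mean difference of 0.
t_stat = d_bar / (s_d / math.sqrt(n))
df = n - 1
print(f"d_bar = {d_bar:.1f}, s_d = {s_d:.2f}, t = {t_stat:.2f}, df = {df}")
```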
Quiz
Q1: What is the chi-squared test used for in this subject?
A) Comparing the means of two independent groups B) Comparing the variances of two normal populations C) Testing association between categorical variables or goodness-of-fit to a distribution D) Comparing before/after measurements on the same subjects
Correct: C)
- If you chose A: This is incorrect. Comparing two independent group means calls for a two-sample t-test. The chi-squared test works on counts in categories.
- If you chose B: This is incorrect. Comparing two variances is the job of the F-test. The chi-squared test works on counts in categories.
- If you chose C: Correct! The chi-squared test compares observed counts with expected counts, either under independence (contingency table) or under a hypothesized distribution (goodness of fit).
- If you chose D: This is incorrect. Matched before/after measurements call for a paired t-test. The chi-squared test works on counts in categories.
Q2: Which formula gives the one-sample t statistic for testing whether a population mean equals $\mu_0$?
A) $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$ B) $t = \frac{\bar{x} - \mu_0}{s}$ C) $t = \frac{\bar{x} - \mu_0}{s \sqrt{n}}$ D) $F = \frac{s_1^2}{s_2^2}$
Correct: A)
- If you chose A: Correct! The difference $\bar{x} - \mu_0$ is divided by the standard error $s / \sqrt{n}$, with $df = n - 1$.
- If you chose B: This is incorrect. Dividing by $s$ alone ignores the sample size. The denominator must be the standard error $s / \sqrt{n}$.
- If you chose C: This is incorrect. The standard error is $s / \sqrt{n}$, not $s \sqrt{n}$. The standard error shrinks as $n$ grows.
- If you chose D: This is incorrect. That is the F statistic for comparing two variances, not a test of a mean.
Q3: What is the primary purpose of the F-test for variances?
A) To compare the means of two independent groups B) To test whether two normal populations have equal variances C) To test association between categorical variables D) To compare matched pairs of measurements
Correct: B)
- If you chose A: This is incorrect. Comparing two independent means calls for a two-sample t-test. The F-test here compares variances.
- If you chose B: Correct! The statistic $F = s_1^2 / s_2^2$ tests whether two normal populations have equal variances. It is sensitive to non-normality, so Levene's test is a more robust alternative.
- If you chose C: This is incorrect. Association between categorical variables is tested with the chi-squared test. The F-test here compares variances.
- If you chose D: This is incorrect. Matched pairs call for a paired t-test. The F-test here compares variances.
Q4: Which statement about the one-sample t-test is TRUE?
A) It requires two independent groups B) It tests whether a population mean equals a specified value $\mu_0$, using $df = n - 1$ C) It applies only to categorical data D) It uses $df = n_1 + n_2 - 2$
Correct: B)
- If you chose A: This is incorrect. The one-sample t-test uses a single sample and compares its mean with a specified value $\mu_0$.
- If you chose B: Correct! The statistic is $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$ with $df = n - 1$.
- If you chose C: This is incorrect. The one-sample t-test applies to a numeric measurement. Categorical counts call for a chi-squared test.
- If you chose D: This is incorrect. $df = n_1 + n_2 - 2$ belongs to the pooled two-sample t-test. The one-sample t-test uses $df = n - 1$.
Q5: In Worked Example 2 (chi-squared test of independence), what is the value of the test statistic?
A) 1.125 B) 6.25 C) 4.08 D) 2.23
Correct: B)
- If you chose A: This is incorrect. 1.125 is the contribution of a single cell, $36/32$. The statistic sums all four cells: $1.125 + 2 + 1.125 + 2 = 6.25$.
- If you chose B: Correct! $\chi^2 = 1.125 + 2 + 1.125 + 2 = 6.25$ with $df = 1$, giving $p \approx 0.012$.
- If you chose C: This is incorrect. 4.08 is the t statistic from the paired t-test in Worked Example 3.
- If you chose D: This is incorrect. 2.23 is the t statistic from Welch's t-test in Worked Example 1.
Q6: How are the one-sample t-test and the paired t-test related?
A) A paired t-test is a one-sample t-test applied to the within-pair differences $d_i$ B) They are completely unrelated tests C) A paired t-test is an independent two-sample t-test with equal sample sizes D) A one-sample t-test applies only to categorical data
Correct: A)
- If you chose A: Correct! Compute $d_i = x_{i1} - x_{i2}$, then run a one-sample t-test on the differences with $df = n - 1$.
- If you chose B: This is incorrect. The paired t-test reduces to a one-sample t-test on the differences $d_i$.
- If you chose C: This is incorrect. Equal sample sizes do not make data paired. The paired t-test works on within-pair differences, not on two independent groups.
- If you chose D: This is incorrect. Both tests apply to numeric measurements. The paired t-test is a one-sample t-test on the differences $d_i$.
Q7: What is a common pitfall when analyzing paired data?
A) Computing the differences $d_i$ before testing B) Using $df = n - 1$ for the test on the differences C) Using an independent two-sample t-test, which ignores the pairing D) Treating before/after measurements on the same subject as a pair
Correct: C)
- If you chose A: This is incorrect. Computing the differences is the first step of the paired t-test. The pitfall is ignoring the pairing.
- If you chose B: This is incorrect. $df = n - 1$ is the correct value for the paired t-test. The pitfall is ignoring the pairing.
- If you chose C: Correct! An independent t-test on paired data discards the pairing and wastes statistical power.
- If you chose D: This is incorrect. Before/after measurements on the same subject are a natural pair. The pitfall is ignoring the pairing.
Q8: When should you apply a two-sample (independent) t-test?
A) When comparing one sample mean with a known value B) When the same subjects are measured before and after a treatment C) When comparing the means of two independent groups D) When comparing counts across categories
Correct: C)
- If you chose A: This is incorrect. One mean against a known value calls for a one-sample t-test.
- If you chose B: This is incorrect. Before/after measurements on the same subjects are paired and call for a paired t-test.
- If you chose C: Correct! The two-sample t-test compares the means of two independent groups. Welch's version is the preferred default.
- If you chose D: This is incorrect. Counts across categories call for a chi-squared test.
Practice Problems
- When would you use a paired t-test instead of an independent two-sample t-test?
  Answer: Use a paired t-test when each observation in one group is naturally matched to an observation in the other group: before/after measurements on the same subjects, matched pairs by age/gender, left/right comparisons on the same individual. Pairing controls for between-subject variability.
- A chi-squared test on a 3×4 contingency table gives $\chi^2 = 18.5$. What are the degrees of freedom?
  Answer: $df = (r-1)(c-1) = (3-1)(4-1) = 2 \times 3 = 6$
- Why is Welch's t-test generally preferred over the pooled t-test?
  Answer: Welch's test does not assume equal population variances. It performs nearly as well as the pooled test when variances ARE equal, but much better when they're not. The pooled test can have inflated Type I error rates when variances differ, especially with unequal sample sizes.
- For a chi-squared test, why is the "expected count ≥ 5" rule important?
  Answer: The chi-squared distribution is a continuous approximation to the discrete multinomial distribution of cell counts. With small expected counts, the approximation breaks down and p-values become unreliable. For 2×2 tables with small expected counts, use Fisher's exact test instead.
- You run an F-test for equal variances and get $F = 2.8$ with $df_1 = 10$, $df_2 = 10$. The upper-tail critical value at $\alpha = 0.05$ is approximately 2.98. What do you conclude?
  Answer: $F_{\text{obs}} = 2.8 < F_{\text{crit}} = 2.98$ → Fail to reject $H_0$ (equal variances). The data do not provide sufficient evidence that the variances differ. Failing to reject does not show that the variances are equal, so this is only weak support for the pooled t-test, and Welch's is still safer.
Summary
Key takeaways:
- One-sample t-test: compare one mean to a known value
- Two-sample t-test: compare two independent group means; prefer Welch's (unequal variances)
- Paired t-test: compare matched pairs; more powerful than independent test for paired designs
- Chi-squared test: test association in categorical data or goodness-of-fit; requires expected counts $\geq$ 5
- F-test: compare variances of two normal populations
- Always check test assumptions before applying
Pitfalls
- Using an independent t-test on paired data: When observations come in natural pairs (before/after, matched subjects, twins), a paired t-test controls for between-subject variability. Using an independent test on paired data discards this pairing, inflates the error variance, and reduces statistical power — potentially masking real effects.
- Applying chi-squared tests with small expected counts: The chi-squared distribution approximates the discrete distribution of cell counts only when expected frequencies are ≥ 5 per cell. With smaller expected counts, p-values become unreliable. For 2×2 tables with small counts, use Fisher's exact test instead.
- Defaulting to the pooled t-test without checking variance equality: The pooled t-test assumes σ₁² = σ₂². When this assumption is violated, especially with unequal sample sizes, the actual Type I error rate can be substantially above or below the nominal level. Welch's t-test does not assume equal variances and should be the default choice.
- Treating the F-test for variances as robust: The standard F-test for comparing two variances is extremely sensitive to non-normality — even modest departures from normality can produce wildly inaccurate p-values. Levene's test or the Brown-Forsythe test are much more robust alternatives.
- Running multiple pairwise t-tests instead of ANOVA: With k = 4 groups, there are 6 pairwise comparisons. Testing each at α = 0.05 gives a family-wise error rate of 1 − 0.95⁶ ≈ 0.26. The omnibus F-test in ANOVA controls this rate, and post-hoc tests (Tukey HSD, Bonferroni) maintain control while identifying which specific pairs differ.
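The family-wise error figure in the last pitfall can be checked with a few lines. The formula $1 - (1 - \alpha)^m$ treats the $m$ tests as independent, which pairwise comparisons that share groups are not exactly, so the result is an approximation. The function name `familywise_error_rate` is illustrative.

```python
import math

def familywise_error_rate(k_groups, alpha=0.05):
    """Number of pairwise comparisons among k groups and the approximate
    chance of at least one false positive, treating the tests as independent."""
    m = math.comb(k_groups, 2)
    return m, 1 - (1 - alpha) ** m

m, fwer = familywise_error_rate(4)
bonferroni_alpha = 0.05 / m  # per-comparison level under a Bonferroni correction
print(f"comparisons = {m}, FWER = {fwer:.3f}, Bonferroni alpha = {bonferroni_alpha:.4f}")
```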
Next Steps
Next up: 12-09-regression-linear.md