### 12.8 — Common Tests

Phase: Statistics

Prerequisites: 12-07-hypothesis-testing-basics, 12-06-confidence-intervals

Learning Objectives

By the end of this subject, you will be able to:

Select the appropriate test for a given research question
Perform and interpret one-sample and two-sample t-tests
Conduct paired t-tests for matched data
Apply chi-squared tests for categorical data
Use the F-test to compare variances

Core Content

⚠️ CRITICAL: Choosing the Right Test

The test you use depends on: (a) the type of data, (b) the number of groups, and (c) whether samples are independent or paired.

| Research Question | Test |
|---|---|
| One sample mean vs known value | One-sample t-test |
| Two independent group means | Two-sample (independent) t-test |
| Paired/matched measurements | Paired t-test |
| Proportions in categories | Chi-squared test |
| Compare two variances | F-test for variances |
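The selection table can be read as a small decision procedure. The sketch below is only an illustration: the function name `choose_test` and the way the three criteria are encoded as arguments are assumptions made for this example, not part of any library.

```python
def choose_test(data_type: str, groups: int, paired: bool = False,
                compare: str = "mean") -> str:
    """Return the test suggested by the selection table.

    data_type: "numeric" or "categorical" (hypothetical encoding)
    groups:    number of groups being compared
    paired:    True when measurements are matched
    compare:   "mean" or "variance"
    """
    if data_type == "categorical":
        return "Chi-squared test"
    if compare == "variance" and groups == 2:
        return "F-test for variances"
    if groups == 1:
        return "One-sample t-test"
    if groups == 2 and paired:
        return "Paired t-test"
    if groups == 2:
        return "Two-sample (independent) t-test"
    raise ValueError("combination not covered in this section")


print(choose_test("numeric", groups=2, paired=True))  # Paired t-test
```

Cases outside the table (for example, three or more group means) deliberately raise an error rather than guess.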

One-Sample t-test

Tests whether a population mean equals a specified value $\mu_0$.

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \quad df = n - 1$$

Assumptions: Data are approximately normal (robust for $n \geq 30$ via CLT), observations independent.

Example: Test if the mean IQ of a class differs from 100. $\bar{x} = 107$, $s = 14$, $n = 25$.

$t = \frac{107 - 100}{14/5} = \frac{7}{2.8} = 2.50$, $df = 24$, $p \approx 0.020$ (two-sided)

Since $p < 0.05$, reject $H_0$ at $\alpha = 0.05$: the data suggest the class mean differs from 100.
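The arithmetic in the IQ example can be reproduced from the summary statistics alone. This is a minimal sketch; the helper name `one_sample_t` is made up for illustration, and the p-value is left out because the Python standard library has no t-distribution CDF.

```python
import math


def one_sample_t(xbar, mu0, s, n):
    """t statistic and degrees of freedom for H0: mu = mu0."""
    standard_error = s / math.sqrt(n)
    return (xbar - mu0) / standard_error, n - 1


t_stat, df = one_sample_t(xbar=107, mu0=100, s=14, n=25)
print(f"t = {t_stat:.2f}, df = {df}")  # t = 2.50, df = 24
```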

Two-Sample (Independent) t-test

Tests whether two population means differ. Two versions:

Equal variances (pooled): $$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \quad df = n_1 + n_2 - 2$$

Where $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}$
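As a sketch of the pooled formulas, the snippet below computes $s_p^2$, the t statistic, and the degrees of freedom from summary statistics. The function name `pooled_t` and the input numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math


def pooled_t(x1, s1, n1, x2, s2, n2):
    """Pooled two-sample t statistic, df, and pooled variance."""
    df = n1 + n2 - 2
    # Weighted average of the two sample variances
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    return (x1 - x2) / se, df, sp2


# Hypothetical summary statistics for two groups of 10
t_stat, df, sp2 = pooled_t(x1=52, s1=5, n1=10, x2=47, s2=7, n2=10)
print(f"s_p^2 = {sp2:.1f}, t = {t_stat:.2f}, df = {df}")
```

With equal group sizes the pooled variance is simply the average of the two sample variances, here $(25 + 49)/2 = 37$.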

Unequal variances (Welch's t-test): Preferred default — does not assume equal variances. More robust.

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

with Satterthwaite approximation for df:

$$df \approx \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$$

Rule of thumb: Use Welch's t-test by default. Only use pooled when you have strong reason to believe variances are equal AND sample sizes are similar.

Paired t-test

Used when observations come in natural pairs (before/after, matched subjects, left/right).

Compute differences $d_i = x_{i1} - x_{i2}$, then do a one-sample t-test on the differences:

$$t = \frac{\bar{d}}{s_d / \sqrt{n}}, \quad df = n - 1$$

🚩 Common Pitfall: Using an independent t-test on paired data. Paired tests are MORE powerful because they control for between-subject variability. Ignoring pairing wastes statistical power.
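A small numerical sketch can make the loss of power concrete. The data below are hypothetical: five subjects who differ a lot from each other but each improve by a small, consistent amount. The same numbers are analysed once as pairs and once (incorrectly) as two independent groups using the Welch statistic.

```python
import math
from statistics import mean, stdev, variance

# Hypothetical before/after scores for the same five subjects
before = [10, 20, 30, 40, 50]
after = [12, 21, 33, 42, 52]
n = len(before)

# Paired analysis: one-sample t on the within-subject differences
diffs = [a - b for a, b in zip(after, before)]
t_paired = mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Independent (Welch) analysis: between-subject spread enters the error term
se_indep = math.sqrt(variance(after) / n + variance(before) / n)
t_indep = (mean(after) - mean(before)) / se_indep

print(f"paired t = {t_paired:.2f}, independent t = {t_indep:.2f}")
# paired t = 6.32, independent t = 0.20
```

The mean difference is identical in both analyses; only the standard error changes, because the paired version removes the large subject-to-subject variation.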

Chi-Squared Test

Tests association between categorical variables or goodness-of-fit to a distribution.

Test of independence (contingency table):

$$\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad df = (r-1)(c-1)$$

Where $O_{ij}$ = observed count, $E_{ij}$ = expected count under independence ($E_{ij} = \frac{\text{row}_i \cdot \text{col}_j}{N}$)

Goodness of fit:

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}, \quad df = k - 1$$

Assumption: Expected counts should be $\geq 5$ in each cell for the chi-squared approximation to be valid.
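The goodness-of-fit statistic and the expected-count check can be sketched together. The function name `chi2_gof` and the die-roll counts are assumptions made for illustration; the function takes expected counts directly and returns a flag rather than enforcing the rule of thumb.

```python
def chi2_gof(observed, expected):
    """Chi-squared goodness-of-fit statistic, df, and a rule-of-thumb flag."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    approx_ok = min(expected) >= 5  # expected counts, not observed counts
    return stat, df, approx_ok


# Hypothetical data: 60 rolls of a die, fair-die expectation of 10 per face
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6
stat, df, approx_ok = chi2_gof(observed, expected)
print(f"chi2 = {stat:.2f}, df = {df}, expected counts OK: {approx_ok}")
```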

F-test for Variances

$$F = \frac{s_1^2}{s_2^2} \sim F_{n_1-1, n_2-1}$$

Tests whether two normal populations have equal variances. Sensitive to non-normality — Levene's test is a more robust alternative.
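A minimal sketch of the F statistic from raw samples follows. The helper name `f_statistic` and both samples are hypothetical; the snippet only computes the ratio and its degrees of freedom, and does not address the sensitivity to non-normality noted above.

```python
from statistics import variance


def f_statistic(sample1, sample2):
    """Ratio of sample variances with numerator and denominator df."""
    f = variance(sample1) / variance(sample2)
    return f, len(sample1) - 1, len(sample2) - 1


# Hypothetical samples with the same mean but different spread
sample_a = [4, 7, 6, 9, 4]   # sample variance 4.5
sample_b = [5, 6, 7, 6, 6]   # sample variance 0.5
f, df1, df2 = f_statistic(sample_a, sample_b)
print(f"F = {f:.1f}, df = ({df1}, {df2})")  # F = 9.0, df = (4, 4)
```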

Key Terms

Chi-squared test
F-test
One-sample t-test
Paired t-test
Two-sample t-test

Worked Examples

Example 1: Two-sample t-test (Welch's)

Group A: $n_1 = 15$, $\bar{x}_1 = 78$, $s_1 = 8$

Group B: $n_2 = 12$, $\bar{x}_2 = 69$, $s_2 = 12$

$H_0: \mu_1 = \mu_2$ vs $H_a: \mu_1 \neq \mu_2$

$t = \frac{78 - 69}{\sqrt{\frac{64}{15} + \frac{144}{12}}} = \frac{9}{\sqrt{4.267 + 12}} = \frac{9}{\sqrt{16.267}} = \frac{9}{4.033} = 2.23$

$df \approx \frac{16.267^2}{\frac{4.267^2}{14} + \frac{12^2}{11}} = \frac{264.6}{1.301 + 13.091} \approx 18.4$

$p \approx 0.039$ → Reject $H_0$ at $\alpha = 0.05$.
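The Welch statistic and the Satterthwaite degrees of freedom in Example 1 can be checked with a few lines of Python. The function name `welch_t` is illustrative only; the inputs are the summary statistics from the example.

```python
import math


def welch_t(x1, s1, n1, x2, s2, n2):
    """Welch t statistic and Satterthwaite degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # per-group variance of the mean
    t = (x1 - x2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


t_stat, df = welch_t(x1=78, s1=8, n1=15, x2=69, s2=12, n2=12)
print(f"t = {t_stat:.2f}, df = {df:.1f}")  # t = 2.23, df = 18.4
```

Note that the degrees of freedom are not an integer; statistical software uses the fractional value directly when computing the p-value.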

Example 2: Chi-squared test of independence

| | Survived | Died | Total |
|---|---|---|---|
| Treatment | 38 | 12 | 50 |
| Placebo | 26 | 24 | 50 |
| Total | 64 | 36 | 100 |

Under independence: $E_{11} = 64 \cdot 50 / 100 = 32$, $E_{12} = 36 \cdot 50 / 100 = 18$. Both rows have the same total (50), so $E_{21} = 32$ and $E_{22} = 18$ as well.

$\chi^2 = \frac{(38-32)^2}{32} + \frac{(12-18)^2}{18} + \frac{(26-32)^2}{32} + \frac{(24-18)^2}{18}$

$= \frac{36}{32} + \frac{36}{18} + \frac{36}{32} + \frac{36}{18} = 1.125 + 2 + 1.125 + 2 = 6.25$

$df = (2-1)(2-1) = 1$, $p \approx 0.012$ → Significant association.
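The calculation in Example 2 can be reproduced from the observed table. This is a sketch using only the standard library; the p-value line relies on the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$, which holds only for $df = 1$, so it would not carry over to larger tables.

```python
import math

# Observed counts: rows = Treatment, Placebo; columns = Survived, Died
table = [[38, 12],
         [26, 24]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected counts under independence: row total * column total / N
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(table, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(table) - 1) * (len(table[0]) - 1)

# Valid for df = 1 only: a chi-squared(1) variable is a squared standard normal
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.3f}")
# chi2 = 6.25, df = 1, p = 0.012
```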

Example 3: Paired t-test

Five subjects tested before and after training:

| Subject | Before | After | Diff ($d$) |
|---|---|---|---|
| 1 | 45 | 52 | +7 |
| 2 | 51 | 55 | +4 |
| 3 | 48 | 56 | +8 |
| 4 | 52 | 54 | +2 |
| 5 | 47 | 58 | +11 |

$\bar{d} = 32/5 = 6.4$, $s_d = 3.51$

$t = \frac{6.4}{3.51/\sqrt{5}} = \frac{6.4}{1.57} = 4.08$, $df = 4$

$p \approx 0.015$ (two-sided) → Significant improvement.
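The numbers in Example 3 can be recomputed from the raw scores. This is a short sketch using the standard library; differences are taken as after minus before, matching the table.

```python
import math
from statistics import mean, stdev

before = [45, 51, 48, 52, 47]
after = [52, 55, 56, 54, 58]

# Within-subject differences (after minus before)
diffs = [a - b for a, b in zip(after, before)]
d_bar = mean(diffs)
s_d = stdev(diffs)  # sample standard deviation (n - 1 denominator)
t_stat = d_bar / (s_d / math.sqrt(len(diffs)))
df = len(diffs) - 1

print(f"d_bar = {d_bar:.1f}, s_d = {s_d:.2f}, t = {t_stat:.2f}, df = {df}")
# d_bar = 6.4, s_d = 3.51, t = 4.08, df = 4
```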

Quiz

Q1: What is the chi-squared test used for in this subject?

A) Comparing the means of two independent groups
B) Comparing the variances of two normal populations
C) Testing association between categorical variables, or goodness-of-fit to a distribution
D) Comparing before/after measurements on the same subjects

Correct: C)

If you chose A: This is incorrect. Comparing two independent group means calls for a two-sample t-test. The chi-squared test works on counts in categories.
If you chose B: This is incorrect. Comparing two variances is the job of the F-test. The chi-squared test works on counts in categories.
If you chose C: The chi-squared test compares observed counts with expected counts, either under independence (contingency table) or under a hypothesised distribution (goodness of fit). Correct!
If you chose D: This is incorrect. Matched measurements call for a paired t-test. The chi-squared test works on counts in categories.

Q2: Which formula gives the test statistic for a one-sample t-test of a mean against $\mu_0$?

A) $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
B) $t = \frac{\bar{d}}{s_d / \sqrt{n}}$
C) $F = \frac{s_1^2}{s_2^2}$
D) $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$

Correct: A)

If you chose A: The one-sample statistic divides the gap between the sample mean and $\mu_0$ by the standard error $s/\sqrt{n}$, with $df = n - 1$. Correct!
If you chose B: This is incorrect. That is the paired t statistic, computed on within-pair differences. The one-sample statistic is $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$.
If you chose C: This is incorrect. That is the F statistic for comparing two variances. The one-sample statistic is $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$.
If you chose D: This is incorrect. That is the chi-squared goodness-of-fit statistic. The one-sample statistic is $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$.

Q3: What is the primary purpose of the F-test covered in this subject?

A) To compare the means of two independent groups
B) To test whether two normal populations have equal variances
C) To test association in a contingency table
D) To compare matched before/after measurements

Correct: B)

If you chose A: This is incorrect. Comparing two independent means is done with a two-sample t-test. The F-test here compares two variances.
If you chose B: The statistic $F = s_1^2 / s_2^2$ is compared with an $F_{n_1-1, n_2-1}$ distribution to test equality of variances. Correct!
If you chose C: This is incorrect. Association in a contingency table is tested with the chi-squared test. The F-test here compares two variances.
If you chose D: This is incorrect. Matched measurements call for a paired t-test. The F-test here compares two variances.

Q4: Which statement about the one-sample t-test is TRUE?

A) It requires two independent groups
B) It tests whether a population mean equals a specified value $\mu_0$, with $df = n - 1$
C) Its degrees of freedom are $n_1 + n_2 - 2$
D) It is applied to counts in categories

Correct: B)

If you chose A: This is incorrect. The one-sample t-test uses a single sample and compares its mean with a specified value $\mu_0$.
If you chose B: The one-sample t-test compares one sample mean with $\mu_0$ using $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$ and $df = n - 1$. Correct!
If you chose C: This is incorrect. $df = n_1 + n_2 - 2$ belongs to the pooled two-sample t-test. The one-sample t-test has $df = n - 1$.
If you chose D: This is incorrect. Counts in categories are analysed with the chi-squared test. The one-sample t-test is for a single mean.

Q5: In Worked Example 3 (paired t-test), what is the test statistic and its degrees of freedom?

A) $t = 2.23$, $df \approx 18.4$
B) $t = 4.08$, $df = 4$
C) $\chi^2 = 6.25$, $df = 1$
D) $t = 2.50$, $df = 24$

Correct: B)

If you chose A: This is incorrect. Those values come from Worked Example 1 (Welch's t-test). The paired example gives $t = 4.08$ with $df = 4$.
If you chose B: With $\bar{d} = 6.4$, $s_d = 3.51$ and $n = 5$, $t = \frac{6.4}{3.51/\sqrt{5}} = 4.08$ and $df = n - 1 = 4$. Correct!
If you chose C: This is incorrect. Those values come from Worked Example 2 (chi-squared test of independence). The paired example gives $t = 4.08$ with $df = 4$.
If you chose D: This is incorrect. Those values come from the one-sample IQ example. The paired example gives $t = 4.08$ with $df = 4$.

Q6: How are the one-sample t-test and the paired t-test related?

A) The paired t-test is a one-sample t-test applied to the within-pair differences $d_i$
B) They are completely unrelated procedures
C) The paired t-test pools the variances of the two sets of measurements
D) The paired t-test has $df = n_1 + n_2 - 2$

Correct: A)

If you chose A: The paired t-test computes $d_i = x_{i1} - x_{i2}$ and then runs a one-sample t-test on the differences, with $df = n - 1$. Correct!
If you chose B: This is incorrect. The paired t-test is a one-sample t-test on the within-pair differences.
If you chose C: This is incorrect. Pooling variances belongs to the equal-variance two-sample t-test. The paired t-test is a one-sample t-test on the within-pair differences.
If you chose D: This is incorrect. That is the pooled two-sample df. The paired t-test has $df = n - 1$, where $n$ is the number of pairs.

Q7: What is a common pitfall when analysing paired data?

A) Computing the differences $d_i$ before testing
B) Using $df = n - 1$ for the test
C) Using an independent two-sample t-test, which ignores the pairing
D) Checking that each observation has a natural match

Correct: C)

If you chose A: This is incorrect. Computing the differences is the correct first step. The pitfall is running an independent t-test on paired data.
If you chose B: This is incorrect. $df = n - 1$ is correct for a paired t-test. The pitfall is running an independent t-test on paired data.
If you chose C: Paired tests control for between-subject variability, so ignoring the pairing wastes statistical power. Correct!
If you chose D: This is incorrect. Checking for natural pairs is good practice. The pitfall is running an independent t-test on paired data.

Q8: When should you apply a two-sample (independent) t-test?

A) When the same subjects are measured before and after a treatment
B) When comparing one sample mean with a known value
C) When comparing the means of two independent groups
D) When comparing counts across categories

Correct: C)

If you chose A: This is incorrect. Before/after measurements on the same subjects are paired, so a paired t-test applies. The two-sample t-test compares two independent group means.
If you chose B: This is incorrect. One mean against a known value is a one-sample t-test. The two-sample t-test compares two independent group means.
If you chose C: The two-sample t-test compares the means of two independent groups; Welch's version is the preferred default. Correct!
If you chose D: This is incorrect. Counts across categories are analysed with the chi-squared test. The two-sample t-test compares two independent group means.

Practice Problems

When would you use a paired t-test instead of an independent two-sample t-test?

Answer: Use a paired t-test when each observation in one group is naturally matched to an observation in the other group: before/after measurements on the same subjects, matched pairs by age/gender, left/right comparisons on the same individual. Pairing controls for between-subject variability.

A chi-squared test on a 3×4 contingency table gives $\chi^2 = 18.5$. What are the degrees of freedom?

Answer: $df = (r-1)(c-1) = (3-1)(4-1) = 2 \times 3 = 6$

Why is Welch's t-test generally preferred over the pooled t-test?

Answer: Welch's test does not assume equal population variances. It performs nearly as well as the pooled test when variances ARE equal, but much better when they're not. The pooled test can have inflated Type I error rates when variances differ, especially with unequal sample sizes.

For a chi-squared test, why is the "expected count ≥ 5" rule important?

Answer: The chi-squared distribution is a continuous approximation to the discrete multinomial distribution of cell counts. With small expected counts, the approximation breaks down and p-values become unreliable. For 2×2 tables with small expected counts, use Fisher's exact test instead.

You run an F-test for equal variances and get $F = 2.8$ with $df_1 = 10$, $df_2 = 10$. The critical value at $\alpha = 0.05$ is approximately 2.98. What do you conclude?

Answer: $F_{\text{obs}} = 2.8 < F_{\text{crit}} = 2.98$ → Fail to reject $H_0$ (equal variances). The data do not provide sufficient evidence that the variances differ. This would support using the pooled t-test, though Welch's is still safer.

Summary

Key takeaways:

One-sample t-test: compare one mean to a known value
Two-sample t-test: compare two independent group means; prefer Welch's (unequal variances)
Paired t-test: compare matched pairs; more powerful than independent test for paired designs
Chi-squared test: test association in categorical data or goodness-of-fit; requires expected counts $\geq$ 5
F-test: compare variances of two normal populations
Always check test assumptions before applying

Pitfalls

Using an independent t-test on paired data: When observations come in natural pairs (before/after, matched subjects, twins), a paired t-test controls for between-subject variability. Using an independent test on paired data discards this pairing, inflates the error variance, and reduces statistical power — potentially masking real effects.
Applying chi-squared tests with small expected counts: The chi-squared distribution approximates the discrete distribution of cell counts only when expected frequencies are ≥ 5 per cell. With smaller expected counts, p-values become unreliable. For 2×2 tables with small counts, use Fisher's exact test instead.
Defaulting to the pooled t-test without checking variance equality: The pooled t-test assumes σ₁² = σ₂². When this assumption is violated, especially with unequal sample sizes, the actual Type I error rate can be substantially above or below the nominal level. Welch's t-test does not assume equal variances and should be the default choice.
Treating the F-test for variances as robust: The standard F-test for comparing two variances is extremely sensitive to non-normality — even modest departures from normality can produce wildly inaccurate p-values. Levene's test or the Brown-Forsythe test are much more robust alternatives.
Running multiple pairwise t-tests instead of ANOVA: With k = 4 groups, there are 6 pairwise comparisons. If the tests were independent, testing each at α = 0.05 would give a family-wise error rate of 1 − 0.95⁶ ≈ 0.26. The omnibus F-test in ANOVA controls this rate, and post-hoc tests (Tukey HSD, Bonferroni) maintain control while identifying which specific pairs differ.
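The family-wise arithmetic in the last pitfall can be checked, and extended to other group counts, with a short sketch. The function name `familywise_error` is hypothetical, and the formula treats the pairwise tests as independent, which is only an approximation because the comparisons share groups.

```python
from math import comb


def familywise_error(k, alpha=0.05):
    """Number of pairwise comparisons among k groups and the chance of
    at least one false positive, treating the tests as independent."""
    m = comb(k, 2)
    return m, 1 - (1 - alpha) ** m


for k in (3, 4, 5):
    m, fwer = familywise_error(k)
    print(f"k = {k}: {m} comparisons, family-wise error = {fwer:.2f}")
# k = 3: 3 comparisons, family-wise error = 0.14
# k = 4: 6 comparisons, family-wise error = 0.26
# k = 5: 10 comparisons, family-wise error = 0.40
```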

Next Steps

Next up: 12-09-regression-linear.md
