
### 12.8 — Common Tests

Phase: Statistics

Prerequisites: 12-07-hypothesis-testing-basics, 12-06-confidence-intervals

Learning Objectives

By the end of this subject, you will be able to:

  1. Select the appropriate test for a given research question
  2. Perform and interpret one-sample and two-sample t-tests
  3. Conduct paired t-tests for matched data
  4. Apply chi-squared tests for categorical data
  5. Use the F-test to compare variances

Core Content

⚠️ CRITICAL: Choosing the Right Test

The test you use depends on: (a) the type of data, (b) the number of groups, and (c) whether samples are independent or paired.

| Research Question | Test |
|---|---|
| One sample mean vs. known value | One-sample t-test |
| Two independent group means | Two-sample (independent) t-test |
| Paired/matched measurements | Paired t-test |
| Proportions in categories | Chi-squared test |
| Compare two variances | F-test for variances |
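
The selection logic in the table can be sketched as a small lookup function. This is only an illustration: the name `choose_test` and its parameters are hypothetical, and real study designs usually need more judgment than a few flags can capture.

```python
def choose_test(data_type: str, groups: int = 1, paired: bool = False,
                comparing: str = "means") -> str:
    """Map a research question onto one of the tests listed in this lesson.

    data_type: "numeric" or "categorical" (hypothetical labels)
    groups:    number of groups or measurement occasions being compared
    paired:    True when observations are matched across the two groups
    comparing: "means" or "variances"
    """
    if data_type == "categorical":
        return "Chi-squared test"
    if comparing == "variances":
        return "F-test for variances"
    if groups == 1:
        return "One-sample t-test"
    if groups == 2:
        return "Paired t-test" if paired else "Two-sample (independent) t-test"
    raise ValueError("this lesson covers only one- and two-group comparisons")


print(choose_test("numeric", groups=2, paired=True))
print(choose_test("categorical"))
```

The order of the checks mirrors the three questions above: data type first, then what is being compared, then the number of groups and whether they are paired.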

One-Sample t-test

Tests whether a population mean equals a specified value $\mu_0$.

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \quad df = n - 1$$

Assumptions: Data are approximately normal (robust for $n \geq 30$ via CLT), observations independent.

Example: Test if the mean IQ of a class differs from 100. $\bar{x} = 107$, $s = 14$, $n = 25$.

$t = \frac{107 - 100}{14/5} = \frac{7}{2.8} = 2.50$, $df = 24$, $p \approx 0.020$ (two-sided)

Since $p < 0.05$, reject $H_0$ at $\alpha = 0.05$: the data suggest the class mean differs from 100.
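
The IQ example can be checked with a short script. The sketch below uses only the standard library and obtains the two-sided p-value by integrating the t density numerically with Simpson's rule; the helper names (`t_pdf`, `two_sided_p`, `one_sample_t`) are made up for this illustration, and in practice a statistics library would normally supply the p-value.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return (math.exp(log_norm) / math.sqrt(df * math.pi)
            * (1 + x * x / df) ** (-(df + 1) / 2))


def two_sided_p(t: float, df: float, steps: int = 2000) -> float:
    """Two-sided p-value: 1 - 2 * (area under the density from 0 to |t|).

    The area is approximated with Simpson's rule; steps must be even.
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)


def one_sample_t(xbar: float, mu0: float, s: float, n: int):
    """Return (t, df) computed from summary statistics."""
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1


t, df = one_sample_t(xbar=107, mu0=100, s=14, n=25)
p = two_sided_p(t, df)
print(f"t = {t:.2f}, df = {df}, p = {p:.3f}")
```

The printed values should agree with the hand calculation above to the displayed precision. `two_sided_p` also accepts non-integer degrees of freedom, so it can be reused with the Welch approximation.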

Two-Sample (Independent) t-test

Tests whether two population means differ. Two versions:

Equal variances (pooled): $$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \quad df = n_1 + n_2 - 2$$

Where $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}$

Unequal variances (Welch's t-test): The preferred default, because it does not assume equal variances and is therefore more robust.

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

with Satterthwaite approximation for df:

$$df \approx \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$$

Rule of thumb: Use Welch's t-test by default. Only use pooled when you have strong reason to believe variances are equal AND sample sizes are similar.
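
To see how the two versions behave, the formulas above can be coded directly from summary statistics. This is a minimal sketch: the function names and the two sets of summary statistics are hypothetical, chosen only to contrast a balanced design with an unbalanced one.

```python
import math


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Pooled (equal-variance) t statistic and df = n1 + n2 - 2."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch t statistic with the Satterthwaite df approximation."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Hypothetical summary statistics (mean, SD, n for each group).
balanced = (5.0, 1.0, 10, 4.0, 1.0, 10)    # equal sizes, equal SDs
unbalanced = (5.0, 1.0, 30, 4.0, 3.0, 8)   # small group has the larger SD

for label, args in [("balanced", balanced), ("unbalanced", unbalanced)]:
    tp, dfp = pooled_t(*args)
    tw, dfw = welch_t(*args)
    print(f"{label}: pooled t = {tp:.2f} (df = {dfp}), "
          f"Welch t = {tw:.2f} (df = {dfw:.1f})")
```

With equal group sizes and equal standard deviations the two statistics coincide. In the unbalanced case, where the smaller group is the noisier one, the pooled version reports a larger t on more degrees of freedom, which is the situation in which its Type I error rate can be inflated.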

Paired t-test

Used when observations come in natural pairs (before/after, matched subjects, left/right).

Compute differences $d_i = x_{i1} - x_{i2}$, then do a one-sample t-test on the differences:

$$t = \frac{\bar{d}}{s_d / \sqrt{n}}, \quad df = n - 1$$

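🚩 Common Pitfall: Using an independent t-test on paired data. The independent test assumes the two samples are unrelated, which paired data violate. When the paired measurements are positively correlated (the usual case for before/after or matched designs), the paired test is more powerful because differencing removes between-subject variability; ignoring the pairing then wastes statistical power.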

Chi-Squared Test

Tests association between categorical variables or goodness-of-fit to a distribution.

Test of independence (contingency table):

$$\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad df = (r-1)(c-1)$$

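Where $O_{ij}$ = observed count, $E_{ij}$ = expected count under independence ($E_{ij} = \frac{\text{row}_i \cdot \text{col}_j}{N}$, with $\text{row}_i$ and $\text{col}_j$ the row and column totals and $N$ the grand total), and $r$, $c$ = number of rows and columns.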

Goodness of fit:

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}, \quad df = k - 1$$

Where $k$ is the number of categories.

Assumption: Expected counts should be $\geq 5$ in each cell for the chi-squared approximation to be valid.
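
The goodness-of-fit statistic is a one-line sum once the expected counts are known. The sketch below uses made-up die-roll counts and a hypothetical helper `chi2_gof`; it also enforces the expected-count rule of thumb stated above, and it assumes no parameters were estimated from the data, so $df = k - 1$.

```python
def chi2_gof(observed, expected):
    """Goodness-of-fit statistic and df = k - 1."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if min(expected) < 5:
        raise ValueError("expected counts below 5: approximation is unreliable")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


# Hypothetical data: 60 rolls of a die, tested against a fair die
# (10 expected rolls per face).
rolls = [8, 12, 9, 11, 10, 10]
stat, df = chi2_gof(rolls, [10] * 6)
print(f"chi2 = {stat:.2f}, df = {df}")
```

The statistic is then compared with a chi-squared critical value for the given df, or converted to a p-value with a statistics library.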

F-test for Variances

$$F = \frac{s_1^2}{s_2^2} \sim F_{n_1-1, n_2-1}$$

Tests whether two normal populations have equal variances. Sensitive to non-normality — Levene's test is a more robust alternative.
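
Computing the F statistic from raw data takes only the two sample variances. A minimal sketch follows, with hypothetical measurements and a made-up helper name; the resulting ratio would still have to be compared with an F critical value from a table or a statistics library.

```python
from statistics import variance


def f_statistic(sample1, sample2):
    """Variance ratio F = s1^2 / s2^2 with (n1 - 1, n2 - 1) degrees of freedom."""
    f = variance(sample1) / variance(sample2)   # sample variances (n - 1 divisor)
    return f, (len(sample1) - 1, len(sample2) - 1)


# Hypothetical measurements from two instruments.
a = [10.1, 9.8, 10.4, 9.6, 10.3, 9.8]
b = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8]
F, (df1, df2) = f_statistic(a, b)
print(f"F = {F:.2f}, df = ({df1}, {df2})")
```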



Key Terms

Worked Examples

Example 1: Two-sample t-test (Welch's)

Group A: $n_1 = 15$, $\bar{x}_1 = 78$, $s_1 = 8$

Group B: $n_2 = 12$, $\bar{x}_2 = 69$, $s_2 = 12$

$H_0: \mu_1 = \mu_2$ vs $H_a: \mu_1 \neq \mu_2$

$t = \frac{78 - 69}{\sqrt{\frac{64}{15} + \frac{144}{12}}} = \frac{9}{\sqrt{4.267 + 12}} = \frac{9}{\sqrt{16.267}} = \frac{9}{4.033} = 2.23$

$df \approx \frac{16.267^2}{\frac{4.267^2}{14} + \frac{12^2}{11}} = \frac{264.6}{1.301 + 13.091} \approx 18.4$

$p \approx 0.039$ → Reject $H_0$ at $\alpha = 0.05$.

Example 2: Chi-squared test of independence

| | Survived | Died | Total |
|---|---|---|---|
| Treatment | 38 | 12 | 50 |
| Placebo | 26 | 24 | 50 |
| Total | 64 | 36 | 100 |

Under independence: $E_{11} = 64 \cdot 50 / 100 = 32$, $E_{12} = 36 \cdot 50 / 100 = 18$

$\chi^2 = \frac{(38-32)^2}{32} + \frac{(12-18)^2}{18} + \frac{(26-32)^2}{32} + \frac{(24-18)^2}{18}$

$= \frac{36}{32} + \frac{36}{18} + \frac{36}{32} + \frac{36}{18} = 1.125 + 2 + 1.125 + 2 = 6.25$

$df = (2-1)(2-1) = 1$, $p \approx 0.012$ → Significant association.
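
The same calculation can be automated for any $r \times c$ table. The sketch below (function name hypothetical) builds the expected counts from the margins and sums the contributions. For the p-value it relies on a shortcut that holds only when $df = 1$: a chi-squared variable with one degree of freedom is a squared standard normal, so the upper-tail probability equals $\operatorname{erfc}(\sqrt{\chi^2/2})$.

```python
import math


def chi2_independence(table):
    """Chi-squared statistic, df and expected counts for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


table = [[38, 12],   # Treatment: survived, died
         [26, 24]]   # Placebo:   survived, died
stat, df, expected = chi2_independence(table)

# Closed form valid for df = 1 only.
p = math.erfc(math.sqrt(stat / 2))
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.3f}")
```

For tables with more than one degree of freedom the `erfc` shortcut no longer applies, and the chi-squared survival function from a statistics library is needed instead.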

Example 3: Paired t-test

Five subjects tested before and after training:

| Subject | Before | After | Diff ($d$) |
|---|---|---|---|
| 1 | 45 | 52 | +7 |
| 2 | 51 | 55 | +4 |
| 3 | 48 | 56 | +8 |
| 4 | 52 | 54 | +2 |
| 5 | 47 | 58 | +11 |

$\bar{d} = 32/5 = 6.4$, $s_d = 3.51$

$t = \frac{6.4}{3.51/\sqrt{5}} = \frac{6.4}{1.57} = 4.08$, $df = 4$

$p \approx 0.015$ (two-sided) → Significant improvement.
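
Because a paired t-test is a one-sample t-test on the differences, the arithmetic above reduces to a few lines. This sketch recomputes the statistic from the raw scores using the standard library only; the variable names are illustrative.

```python
import math
from statistics import mean, stdev

before = [45, 51, 48, 52, 47]
after = [52, 55, 56, 54, 58]

# Differences are taken as after minus before, matching the table.
diffs = [a - b for a, b in zip(after, before)]
d_bar = mean(diffs)
s_d = stdev(diffs)                      # sample SD (n - 1 divisor)
n = len(diffs)
t = d_bar / (s_d / math.sqrt(n))        # one-sample t on the differences
df = n - 1
print(f"d_bar = {d_bar:.1f}, s_d = {s_d:.2f}, t = {t:.2f}, df = {df}")
```

Reversing the subtraction (before minus after) flips the sign of $t$ but leaves the two-sided p-value unchanged.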



Quiz

Q1: In this subject, what does the chi-squared test primarily refer to?

A) A historical anecdote about the chi-squared test
B) A visual representation of the chi-squared test
C) The definition and application of the chi-squared test
D) A computational error related to the chi-squared test

Correct: C)

Q2: Which of the following is the key formula discussed in this subject?

A) $\mu_0$
B) The inverse operation of the formula in question
C) A simplified version of $\mu_0$...
D) An unrelated formula from a different topic

Correct: A)

Q3: What is the primary purpose of the F-test?

A) It replaces all other methods in this domain
B) It is used to test whether two normal populations have equal variances
C) It is used only in advanced research contexts
D) It is primarily a historical notation system

Correct: B)

Q4: Which statement about the one-sample t-test is TRUE?

A) The one-sample t-test is an advanced topic beyond this subject's scope
B) The one-sample t-test is a fundamental concept covered in this subject
C) The one-sample t-test is not related to this subject
D) The one-sample t-test is mentioned only as a historical footnote

Correct: B)

Q5: Based on the worked examples in this subject, what is the correct result?

A) The inverse of the correct answer
B) One-sample t-test: compare one mean to a known value
C) A different result from a common mistake
D) An unrelated numerical value

Correct: B)

Q6: How are the one-sample t-test and the paired t-test related?

A) The one-sample t-test and the paired t-test are closely related concepts
B) The one-sample t-test and the paired t-test are completely unrelated topics
C) The one-sample t-test is a special case of the paired t-test
D) The one-sample t-test is the inverse of the paired t-test

Correct: A)

Q7: What is a common pitfall when working with paired data?

A) The main error is using a paired t-test on before/after measurements
B) Paired data have no common pitfalls
C) A common mistake is analyzing paired data with an independent two-sample t-test
D) Paired data are always analyzed the same way as independent samples

Correct: C)

Q8: When should you apply the two-sample t-test?

A) Use the two-sample t-test only in pure mathematics contexts
B) The two-sample t-test is not practically useful
C) Apply the two-sample t-test to solve problems in this subject's domain
D) Avoid the two-sample t-test unless explicitly instructed

Correct: C)

Practice Problems

  1. When would you use a paired t-test instead of an independent two-sample t-test?

    Answer: Use a paired t-test when each observation in one group is naturally matched to an observation in the other group: before/after measurements on the same subjects, pairs matched by age or gender, or left/right comparisons on the same individual. Pairing controls for between-subject variability.

  2. A chi-squared test on a 3×4 contingency table gives $\chi^2 = 18.5$. What are the degrees of freedom?

    Answer: $df = (r-1)(c-1) = (3-1)(4-1) = 2 \times 3 = 6$

  3. Why is Welch's t-test generally preferred over the pooled t-test?

    Answer: Welch's test does not assume equal population variances. It performs nearly as well as the pooled test when variances ARE equal, but much better when they're not. The pooled test can have inflated Type I error rates when variances differ, especially with unequal sample sizes.

  4. For a chi-squared test, why is the "expected count ≥ 5" rule important?

    Answer: The chi-squared distribution is a continuous approximation to the discrete sampling distribution of the test statistic, which is built from multinomial cell counts. With small expected counts, the approximation breaks down and p-values become unreliable. For 2×2 tables with small expected counts, use Fisher's exact test instead.

  5. You run an F-test for equal variances and get $F = 2.8$ with $df_1 = 10$, $df_2 = 10$. The critical value at $\alpha = 0.05$ is approximately 2.98. What do you conclude?

    Answer: $F_{\text{obs}} = 2.8 < F_{\text{crit}} = 2.98$ → Fail to reject $H_0$ (equal variances). The data do not provide sufficient evidence that the variances differ. This is consistent with using the pooled t-test, but failing to reject does not show that the variances are equal, so Welch's is still safer.


Summary

Key takeaways:


Pitfalls



Next Steps

Next up: 12-09-regression-linear.md