
### 12.6: Confidence Intervals

Phase: Statistics

Prerequisites: 12-02-sampling-sampling-distributions, 12-03-point-estimation

Learning Objectives

By the end of this subject, you will be able to:

  1. Interpret confidence intervals correctly (avoiding the most common misinterpretation)
  2. Construct and compute z-intervals for means ($\sigma$ known)
  3. Construct and compute t-intervals for means ($\sigma$ unknown)
  4. Construct confidence intervals for proportions and differences of means
  5. Determine required sample size for a desired margin of error

Core Content

โš ๏ธ CRITICAL: What a Confidence Interval Actually Means

A 95% confidence interval DOES NOT mean "there is a 95% probability that the true parameter lies in this interval."

The parameter is fixed; it either is or is not in the interval. The 95% refers to the procedure: if we repeated the sampling process many times, about 95% of the resulting intervals would contain the true parameter value.

🚩 Common Pitfall: Students (and many practising scientists) wrongly interpret a 95% CI as "95% chance the parameter is in this interval." That statement describes a Bayesian credible interval, not a frequentist confidence interval.

The correct interpretation: "We are 95% confident that the interval [L, U] captures the true parameter," where "confident" means that the method has 95% coverage in repeated sampling.
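The repeated-sampling reading can be checked numerically. The following is a minimal sketch, assuming a normal population with made-up parameters (`MU`, `SIGMA`, `N`, and `REPS` are illustrative choices, not values from this lesson). It builds many 95% z-intervals and counts how often they contain the true mean.

```python
import random
from statistics import NormalDist, fmean

# Illustrative assumptions, not values from the lesson.
MU, SIGMA, N = 50.0, 10.0, 25
REPS = 5000


def coverage(reps: int = REPS, level: float = 0.95, seed: int = 0) -> float:
    """Fraction of simulated z-intervals that contain the true mean MU."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    margin = z * SIGMA / N ** 0.5
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
        xbar = fmean(sample)
        # Each individual interval either contains MU or it does not.
        if xbar - margin <= MU <= xbar + margin:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print(f"Empirical coverage: {coverage():.3f}")
```

Any single interval either contains `MU` or misses it; only the long-run fraction should settle near 0.95.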

CI for Population Mean ($\sigma$ Known): z-interval

When $\sigma$ is known (rare in practice but theoretically important):

$$\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$

Where $z_{\alpha/2}$ is the critical value from the standard normal distribution:

- 90% CI: $z_{0.05} = 1.645$
- 95% CI: $z_{0.025} = 1.96$
- 99% CI: $z_{0.005} = 2.576$

Margin of error: $m = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$
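The z-interval formula translates directly into code. This is a minimal sketch using only Python's standard library; the function name `z_interval` and the inputs in the usage lines are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar: float, sigma: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Return (lower, upper) for a z-interval with known sigma."""
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    margin = z * sigma / sqrt(n)             # margin of error m
    return xbar - margin, xbar + margin


if __name__ == "__main__":
    # Hypothetical inputs, not taken from the lesson.
    low, high = z_interval(xbar=100.0, sigma=15.0, n=36)
    print(f"95% CI: [{low:.2f}, {high:.2f}]")
```

Computing the critical value with `NormalDist().inv_cdf` avoids hard-coding table entries such as 1.96, so the same function works for any confidence level.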

CI for Population Mean ($\sigma$ Unknown): t-interval

In practice, $\sigma$ is usually unknown. We estimate it with the sample standard deviation $s$ and use the t-distribution:

$$\bar{x} \pm t_{\alpha/2, n-1} \cdot \frac{s}{\sqrt{n}}$$

With $n-1$ degrees of freedom.

Example: 16 measurements give $\bar{x} = 48.3$, $s = 6.1$. For a 95% CI:

$t_{0.025, 15} = 2.131$

$48.3 \pm 2.131 \cdot \frac{6.1}{\sqrt{16}} = 48.3 \pm 2.131 \cdot 1.525 = 48.3 \pm 3.25$

95% CI: $[45.05, 51.55]$
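The example above can be reproduced in a few lines. This is a sketch under one stated assumption: Python's standard library has no t-distribution, so the critical value is passed in as `t_crit`, read from a t-table (a library such as SciPy could supply it instead). The function name `t_interval` is hypothetical.

```python
from math import sqrt


def t_interval(xbar: float, s: float, n: int, t_crit: float) -> tuple[float, float]:
    """Return (lower, upper) given a t critical value with n - 1 degrees of freedom."""
    se = s / sqrt(n)        # estimated standard error of the mean
    margin = t_crit * se    # margin of error
    return xbar - margin, xbar + margin


if __name__ == "__main__":
    # Values from the example above: n = 16, t_{0.025, 15} = 2.131.
    low, high = t_interval(xbar=48.3, s=6.1, n=16, t_crit=2.131)
    print(f"95% CI: [{low:.2f}, {high:.2f}]")  # prints 95% CI: [45.05, 51.55]
```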

CI for a Proportion

For a sample proportion $\hat{p} = k/n$:

$$\hat{p} \pm z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

Validity condition: Need $n\hat{p} \geq 10$ AND $n(1-\hat{p}) \geq 10$ (at least 10 observed successes and at least 10 observed failures).
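A short sketch of this interval (often called the Wald interval) with the validity condition enforced. The function name `proportion_interval` and the choice to raise an error when the condition fails are assumptions made for illustration; the usage lines take the poll figures from Example 2 in this lesson.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(k: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Return a normal-approximation (Wald) interval for a proportion."""
    # n * p_hat = k successes and n * (1 - p_hat) = n - k failures.
    if k < 10 or n - k < 10:
        raise ValueError("need at least 10 successes and 10 failures")
    p_hat = k / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


if __name__ == "__main__":
    low, high = proportion_interval(k=220, n=400)
    print(f"95% CI: [{low:.3f}, {high:.3f}]")
```

Failing loudly when the counts are too small is one possible design; a caller could instead switch to a different interval method in that case.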

CI for Difference of Means

Independent samples (equal variances assumed):

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2, df} \cdot s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Where $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}$ is the pooled variance, and $df = n_1 + n_2 - 2$.
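The pooled calculation has several moving parts, so a sketch may help. The function name `pooled_interval` and the sample statistics in the usage lines are hypothetical; the critical value `t_crit` is assumed to come from a t-table with $df = n_1 + n_2 - 2$.

```python
from math import sqrt


def pooled_interval(
    x1: float, s1: float, n1: int,
    x2: float, s2: float, n2: int,
    t_crit: float,
) -> tuple[float, float]:
    """CI for mu_1 - mu_2 assuming equal variances (pooled)."""
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    margin = t_crit * se
    return diff - margin, diff + margin


if __name__ == "__main__":
    # Hypothetical data: df = 10 + 12 - 2 = 20, table value t_{0.025, 20} = 2.086.
    low, high = pooled_interval(20.0, 2.0, 10, 18.0, 3.0, 12, t_crit=2.086)
    print(f"95% CI for the difference: [{low:.2f}, {high:.2f}]")
```

With these made-up numbers the interval contains 0, so the data would not rule out equal population means at this confidence level.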

Sample Size Determination

To achieve a desired margin of error $m$ with confidence $1-\alpha$:

For a mean: $n = \left(\frac{z_{\alpha/2} \cdot \sigma}{m}\right)^2$

For a proportion: $n = \hat{p}(1-\hat{p})\left(\frac{z_{\alpha/2}}{m}\right)^2$

When $\hat{p}$ is unknown, use $\hat{p} = 0.5$ (worst case, gives largest $n$).
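Both sample-size formulas can be sketched as small helpers. The function names are hypothetical, and rounding up with `ceil` reflects the usual convention that the margin of error must not exceed $m$.

```python
from math import ceil
from statistics import NormalDist


def _z(level: float) -> float:
    """Two-sided standard normal critical value z_{alpha/2}."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


def sample_size_mean(sigma: float, m: float, level: float = 0.95) -> int:
    """Smallest n giving margin of error at most m for a mean."""
    return ceil((_z(level) * sigma / m) ** 2)


def sample_size_proportion(m: float, level: float = 0.95, p: float = 0.5) -> int:
    """Smallest n for a proportion; p = 0.5 is the worst case."""
    return ceil(p * (1 - p) * (_z(level) / m) ** 2)


if __name__ == "__main__":
    print(sample_size_mean(sigma=8, m=2))     # 62
    print(sample_size_proportion(m=0.03))     # 1068
```

The first call uses the same inputs as Example 3 in this lesson ($\sigma \approx 8$, $m = 2$); the second uses an illustrative margin of 3 percentage points.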



Key Terms

Worked Examples

Example 1: t-interval construction

A sample of 25 screws has $\bar{x} = 12.4$ mm length and $s = 0.8$ mm. Construct a 99% CI for the true mean length.

$t_{0.005, 24} = 2.797$

$\text{SE} = 0.8 / 5 = 0.16$

$12.4 \pm 2.797 \cdot 0.16 = 12.4 \pm 0.448$

99% CI: $[11.95, 12.85]$ mm

Example 2: Proportion CI

In a poll of 400 voters, 220 support Candidate A. Give a 95% CI for the true proportion.

$\hat{p} = 220/400 = 0.55$

$\text{SE}(\hat{p}) = \sqrt{\frac{0.55 \cdot 0.45}{400}} = \sqrt{\frac{0.2475}{400}} = \sqrt{0.00061875} = 0.02487$

$0.55 \pm 1.96 \cdot 0.02487 = 0.55 \pm 0.0488$

95% CI: $[0.501, 0.599]$ or $[50.1\%, 59.9\%]$

Check conditions: $n\hat{p} = 220 \geq 10$ ✓, $n(1-\hat{p}) = 180 \geq 10$ ✓

Example 3: Sample size for desired precision

You want to estimate a population mean to within $\pm 2$ units with 95% confidence. Previous studies suggest $\sigma \approx 8$. How many observations are needed?

$n = \left(\frac{z_{0.025} \cdot \sigma}{m}\right)^2 = \left(\frac{1.96 \cdot 8}{2}\right)^2 = (7.84)^2 = 61.47$

Round up to $n = 62$ observations.



Quiz

Q1: What does the concept of CI for Difference of Means primarily refer to in this subject?

- A) A historical anecdote about CI for Difference of Means
- B) The definition and application of CI for Difference of Means
- C) A visual representation of CI for Difference of Means
- D) A computational error related to CI for Difference of Means

Correct: B)

Q2: Which of the following is the key formula discussed in this subject?

- A) An unrelated formula from a different topic
- B) The inverse operation of the formula in question
- C) $\sigma$
- D) A simplified version of $\sigma$...

Correct: C)

Q3: What is the primary purpose of CI for a Proportion?

- A) It is primarily a historical notation system
- B) It is used only in advanced research contexts
- C) It is used to construct a CI for a proportion in mathematical analysis
- D) It replaces all other methods in this domain

Correct: C)

Q4: Which statement about CI width is TRUE?

- A) CI width is an advanced topic beyond this subject's scope
- B) CI width is a fundamental concept covered in this subject
- C) CI width is mentioned only as a historical footnote
- D) CI width is not related to this subject

Correct: B)

Q5: Based on the worked examples in this subject, what is the correct result?

- A) $[46.53, 49.87]$ hours
- B) An unrelated numerical value
- C) A different result from a common mistake
- D) The inverse of the correct answer

Correct: A)

Q6: How are CI width and the meaning of a confidence interval related?

- A) CI width is a special case of the meaning of a confidence interval
- B) CI width and the meaning of a confidence interval are completely unrelated topics
- C) CI width and the meaning of a confidence interval are closely related concepts
- D) CI width is the inverse of the meaning of a confidence interval

Correct: C)

Q7: What is a common pitfall when working with the z-interval for a population mean ($\sigma$ known)?

- A) A common mistake is confusing the z-interval for a population mean ($\sigma$ known) with a similar concept
- B) The z-interval for a population mean ($\sigma$ known) has no common misconceptions
- C) The main error with the z-interval for a population mean ($\sigma$ known) is using it when it is not needed
- D) The z-interval for a population mean ($\sigma$ known) is always computed the same way in all contexts

Correct: A)

Q8: When should you apply the t-interval for a population mean ($\sigma$ unknown)?

- A) The t-interval for a population mean ($\sigma$ unknown) is not practically useful
- B) Use the t-interval for a population mean ($\sigma$ unknown) only in pure mathematics contexts
- C) Avoid the t-interval for a population mean ($\sigma$ unknown) unless explicitly instructed
- D) Apply the t-interval for a population mean ($\sigma$ unknown) to solve problems in this subject's domain

Correct: D)

Practice Problems

  1. A sample of 36 batteries has $\bar{x} = 48.2$ hours, $s = 5.1$ hours. Construct a 95% CI for the mean battery life.

    Answer: $t_{0.025, 35} \approx 2.030$; for df ≥ 30, $z = 1.96$ is a common approximation. $48.2 \pm 1.96 \cdot \frac{5.1}{6} = 48.2 \pm 1.96 \cdot 0.85 = 48.2 \pm 1.666$. 95% CI: $[46.53, 49.87]$ hours. Using the exact t value instead gives $48.2 \pm 2.030 \cdot 0.85 = 48.2 \pm 1.726$, i.e. $[46.47, 49.93]$ hours, which is slightly wider.

  2. Which gives a wider CI: 90% confidence or 99% confidence? Why?

    Answer: The 99% CI is wider. Higher confidence requires capturing the parameter more reliably, so the interval must be broader. The critical value increases: $z_{0.05} = 1.645$ vs $z_{0.005} = 2.576$.

  3. If you quadruple the sample size, what happens to the width of a z-interval?

    Answer: Width $\propto 1/\sqrt{n}$. Quadrupling $n$ multiplies the width by $1/\sqrt{4} = 1/2$. The CI width is halved.

  4. For a proportion CI, why do we use $\hat{p} = 0.5$ when planning sample size?

    Answer: The variance of a sample proportion is proportional to $\hat{p}(1-\hat{p})$, which is maximised at $\hat{p} = 0.5$. Using 0.5 gives the largest (most conservative) sample size estimate, which guarantees a sufficient sample regardless of the true proportion.

  5. A 95% CI for $\mu$ is $[12.3, 18.7]$. Does this mean there is a 95% probability that $\mu$ is between 12.3 and 18.7?

    Answer: **No.** In the frequentist framework, $\mu$ is a fixed constant: it is either in $[12.3, 18.7]$ (probability 1) or it is not (probability 0). The 95% refers to the long-run coverage of the method: if we repeated the experiment many times, about 95% of the resulting intervals would contain $\mu$. The correct statement is: "We are 95% confident that $\mu$ lies in $[12.3, 18.7]$."
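The two width questions above (confidence level and sample size) can also be checked numerically. This is a small sketch; the function name `z_width` and the values of `sigma` and `n` are hypothetical and serve only to illustrate the pattern.

```python
from math import sqrt
from statistics import NormalDist


def z_width(sigma: float, n: int, level: float) -> float:
    """Full width (upper minus lower) of a z-interval."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * z * sigma / sqrt(n)


if __name__ == "__main__":
    sigma, n = 8.0, 25  # hypothetical values
    # Width grows with the confidence level.
    for level in (0.90, 0.95, 0.99):
        print(f"{level:.0%}: width = {z_width(sigma, n, level):.3f}")
    # Quadrupling n halves the width.
    ratio = z_width(sigma, 4 * n, 0.95) / z_width(sigma, n, 0.95)
    print(f"width ratio after quadrupling n: {ratio:.2f}")
```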


Summary

Key takeaways:


Pitfalls



Next Steps

Next up: 12-07-hypothesis-testing-basics.md