
### 12.6 — Confidence Intervals

Phase: Statistics

Prerequisites: 12-02-sampling-sampling-distributions, 12-03-point-estimation

Learning Objectives

By the end of this subject, you will be able to:

Interpret confidence intervals correctly (avoiding the most common misinterpretation)
Construct and compute z-intervals for means ($\sigma$ known)
Construct and compute t-intervals for means ($\sigma$ unknown)
Construct confidence intervals for proportions and differences of means
Determine required sample size for a desired margin of error

Core Content

⚠️ CRITICAL: What a Confidence Interval Actually Means

A 95% confidence interval DOES NOT mean "there is a 95% probability that the true parameter lies in this interval."

The parameter is fixed — it either is or is not in the interval. The 95% refers to the procedure: if we repeated the sampling process many times, 95% of the resulting intervals would contain the true parameter value.

🚩 Common Pitfall: Students (and many practising scientists) wrongly interpret a 95% CI as "95% chance the parameter is in this interval." That's a Bayesian credible interval, not a frequentist confidence interval.

The correct interpretation: "We are 95% confident that the interval [L, U] captures the true parameter" — where "confident" means the method has 95% coverage in repeated sampling.
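
The repeated-sampling reading of "95%" can be checked with a small simulation. The sketch below is illustrative only: the population values (`mu`, `sigma`), the sample size, and the helper name `simulate_coverage` are assumptions, not part of this lesson. It draws many samples from a normal population with known $\sigma$, builds a z-interval from each, and counts how often the interval contains the true mean.

```python
import random
from statistics import NormalDist, fmean


def simulate_coverage(mu=50.0, sigma=10.0, n=25, level=0.95, reps=4000, seed=0):
    """Return the fraction of simulated z-intervals that contain the true mean."""
    rng = random.Random(seed)
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # sigma is treated as known, so the margin is the same for every sample.
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps


coverage = simulate_coverage()
print(f"Empirical coverage: {coverage:.3f}")  # expected to land close to 0.95
```

Each individual interval either contains `mu` or it does not; only the long-run fraction is close to 95%.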

CI for Population Mean ($\sigma$ Known) — z-interval

When $\sigma$ is known (rare in practice but theoretically important):

$$\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$

Where $z_{\alpha/2}$ is the critical value from the standard normal:

- 90% CI: $z_{0.05} = 1.645$
- 95% CI: $z_{0.025} = 1.96$
- 99% CI: $z_{0.005} = 2.576$

Margin of error: $m = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$
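
As a minimal sketch, the z-interval can be computed with Python's standard library. The function names `z_critical` and `z_interval` and the input values are hypothetical, chosen only for illustration. `NormalDist().inv_cdf` supplies the critical value instead of a table.

```python
from statistics import NormalDist


def z_critical(level):
    """Two-sided critical value z_{alpha/2} for a given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


def z_interval(xbar, sigma, n, level=0.95):
    """Confidence interval for a mean when sigma is known."""
    margin = z_critical(level) * sigma / n ** 0.5
    return xbar - margin, xbar + margin


# Critical values for the three common confidence levels.
for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))

# Illustrative inputs (assumed): xbar = 50, sigma = 10, n = 25.
low, high = z_interval(xbar=50.0, sigma=10.0, n=25)
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```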

CI for Population Mean ($\sigma$ Unknown) — t-interval

In reality, $\sigma$ is unknown. We estimate it with $s$ and use the t-distribution:

$$\bar{x} \pm t_{\alpha/2, n-1} \cdot \frac{s}{\sqrt{n}}$$

With $n-1$ degrees of freedom.

Example: 16 measurements give $\bar{x} = 48.3$, $s = 6.1$. For a 95% CI:

$t_{0.025, 15} = 2.131$

$48.3 \pm 2.131 \cdot \frac{6.1}{\sqrt{16}} = 48.3 \pm 2.131 \cdot 1.525 = 48.3 \pm 3.25$

95% CI: $[45.05, 51.55]$
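
The arithmetic above can be reproduced in a few lines. This is a sketch only: the helper name `t_interval` is hypothetical, and the critical value is passed in from a t-table (as in the example) because Python's standard library has no t-distribution.

```python
def t_interval(xbar, s, n, t_crit):
    """Confidence interval for a mean when sigma is estimated by s.

    t_crit is the table value t_{alpha/2, n-1}, supplied by the caller.
    """
    se = s / n ** 0.5          # estimated standard error of the mean
    margin = t_crit * se
    return xbar - margin, xbar + margin


# Values from the example: n = 16, xbar = 48.3, s = 6.1, t_{0.025, 15} = 2.131.
low, high = t_interval(xbar=48.3, s=6.1, n=16, t_crit=2.131)
print(f"95% CI: [{low:.2f}, {high:.2f}]")  # 95% CI: [45.05, 51.55]
```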

CI for a Proportion

For a sample proportion $\hat{p} = k/n$:

$$\hat{p} \pm z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

Validity condition: Need $n\hat{p} \geq 10$ AND $n(1-\hat{p}) \geq 10$ (at least 10 expected successes and failures).
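
A minimal sketch of the proportion interval with the validity condition enforced is shown below. The function name `proportion_interval` and the choice to raise an error when the condition fails are assumptions made for illustration. Since $n\hat{p} = k$ and $n(1-\hat{p}) = n - k$, the check reduces to counting successes and failures.

```python
from statistics import NormalDist


def proportion_interval(k, n, level=0.95, min_count=10):
    """Normal-approximation CI for a proportion, with the validity check."""
    # n * p_hat equals k and n * (1 - p_hat) equals n - k.
    if k < min_count or n - k < min_count:
        raise ValueError("need at least 10 successes and 10 failures")
    p_hat = k / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - margin, p_hat + margin


# Poll figures used in this section's worked example: 220 of 400 voters.
low, high = proportion_interval(k=220, n=400)
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```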

CI for Difference of Means

Independent samples (equal variances assumed):

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2, df} \cdot s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Where $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}$ is the pooled variance, and $df = n_1 + n_2 - 2$.
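
The pooled calculation can be sketched as follows. The summary statistics are made up for illustration and the function name `pooled_interval` is hypothetical. The critical value $t_{0.025, 20} = 2.086$ is a standard table value for $df = 10 + 12 - 2 = 20$.

```python
def pooled_interval(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """CI for mu1 - mu2 from independent samples, assuming equal variances.

    t_crit is the table value t_{alpha/2, df} with df = n1 + n2 - 2.
    """
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = (sp2 * (1 / n1 + 1 / n2)) ** 0.5
    diff = xbar1 - xbar2
    margin = t_crit * se
    return diff - margin, diff + margin


# Hypothetical data: group 1 (n=10, mean 25.0, s=4.0), group 2 (n=12, mean 21.0, s=5.0).
low, high = pooled_interval(25.0, 4.0, 10, 21.0, 5.0, 12, t_crit=2.086)
print(f"95% CI for mu1 - mu2: [{low:.2f}, {high:.2f}]")
```

With these made-up numbers the interval runs from slightly below zero to about 8.1, so a difference of zero remains among the plausible values.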

Sample Size Determination

To achieve a desired margin of error $m$ with confidence $1-\alpha$:

For a mean: $n = \left(\frac{z_{\alpha/2} \cdot \sigma}{m}\right)^2$

For a proportion: $n = \hat{p}(1-\hat{p})\left(\frac{z_{\alpha/2}}{m}\right)^2$

When $\hat{p}$ is unknown, use $\hat{p} = 0.5$ (worst case, gives largest $n$).
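
Both sample-size formulas can be sketched in code; the function names `n_for_mean` and `n_for_proportion` are hypothetical. The result is rounded up because rounding down would leave the margin of error slightly larger than the target.

```python
import math
from statistics import NormalDist


def _z(level):
    """Two-sided critical value z_{alpha/2}."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


def n_for_mean(sigma, m, level=0.95):
    """Smallest n giving margin of error at most m for a mean (sigma assumed known)."""
    return math.ceil((_z(level) * sigma / m) ** 2)


def n_for_proportion(m, p=0.5, level=0.95):
    """Smallest n giving margin of error at most m for a proportion.

    p is the planning value; the default 0.5 is the worst case.
    """
    return math.ceil(p * (1 - p) * (_z(level) / m) ** 2)


print(n_for_mean(sigma=8, m=2))       # sigma = 8, margin 2 units
print(n_for_proportion(m=0.03))       # margin of 3 percentage points, p = 0.5
print(n_for_proportion(m=0.03, p=0.2))  # smaller n when p is far from 0.5
```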

Key Terms

95% confidence
CI for Difference of Means
CI for Population Mean ($\sigma$ Known) — z-interval
CI for Population Mean ($\sigma$ Unknown) — t-interval
CI for a Proportion
CI width

Worked Examples

Example 1: t-interval construction

A sample of 25 screws has $\bar{x} = 12.4$ mm length and $s = 0.8$ mm. Construct a 99% CI for the true mean length.

$t_{0.005, 24} = 2.797$

$\text{SE} = 0.8 / 5 = 0.16$

$12.4 \pm 2.797 \cdot 0.16 = 12.4 \pm 0.448$

99% CI: $[11.95, 12.85]$ mm

Example 2: Proportion CI

In a poll of 400 voters, 220 support Candidate A. Give a 95% CI for the true proportion.

$\hat{p} = 220/400 = 0.55$

$\text{SE}(\hat{p}) = \sqrt{\frac{0.55 \cdot 0.45}{400}} = \sqrt{\frac{0.2475}{400}} = \sqrt{0.00061875} = 0.02487$

$0.55 \pm 1.96 \cdot 0.02487 = 0.55 \pm 0.0488$

95% CI: $[0.501, 0.599]$ or $[50.1\%, 59.9\%]$

Check conditions: $n\hat{p} = 220 \geq 10$ ✓, $n(1-\hat{p}) = 180 \geq 10$ ✓

Example 3: Sample size for desired precision

You want to estimate a population mean to within $\pm 2$ units with 95% confidence. Previous studies suggest $\sigma \approx 8$. How many observations are needed?

$n = \left(\frac{z_{0.025} \cdot \sigma}{m}\right)^2 = \left(\frac{1.96 \cdot 8}{2}\right)^2 = (7.84)^2 = 61.47$

Round up to $n = 62$ observations.

Quiz

Q1: What does a confidence interval for a difference of means estimate?

A) The average of the two sample means
B) A range of plausible values for the difference between two population means, $\mu_1 - \mu_2$
C) The probability that the two population means are equal
D) The pooled variance $s_p^2$

Correct: B)

If you chose A: This is incorrect. The interval is centred on $\bar{x}_1 - \bar{x}_2$, not on the average of the two sample means.
If you chose B: Correct! The interval $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2, df} \cdot s_p \sqrt{1/n_1 + 1/n_2}$ gives plausible values for $\mu_1 - \mu_2$.
If you chose C: This is incorrect. A confidence interval is a range of values for a parameter, not a probability.
If you chose D: This is incorrect. The pooled variance is an ingredient of the standard error, not the quantity being estimated.

Q2: Which formula gives the confidence interval for a population mean when $\sigma$ is known?

A) $\hat{p} \pm z_{\alpha/2} \cdot \sqrt{\hat{p}(1 - \hat{p})/n}$
B) $\bar{x} \pm t_{\alpha/2, n-1} \cdot s/\sqrt{n}$
C) $\bar{x} \pm z_{\alpha/2} \cdot \sigma/\sqrt{n}$
D) $\bar{x} \pm z_{\alpha/2} \cdot \sigma$

Correct: C)

If you chose A: This is incorrect. That is the interval for a proportion, not for a mean.
If you chose B: This is incorrect. That is the t-interval, used when $\sigma$ is unknown and estimated by $s$.
If you chose C: Correct! With $\sigma$ known, the margin of error is $z_{\alpha/2} \cdot \sigma/\sqrt{n}$.
If you chose D: This is incorrect. The standard error of the mean is $\sigma/\sqrt{n}$, so the division by $\sqrt{n}$ cannot be omitted.

Q3: What is the primary purpose of a CI for a proportion?

A) To test whether the sample was drawn at random
B) To give the exact value of the population proportion
C) To give a range of plausible values for the population proportion based on the sample proportion $\hat{p}$
D) To describe where individual observations fall

Correct: C)

If you chose A: This is incorrect. The interval assumes random sampling; it does not test for it.
If you chose B: This is incorrect. The population proportion remains unknown; the interval only bounds it with a stated confidence level.
If you chose C: Correct! The interval $\hat{p} \pm z_{\alpha/2} \cdot \sqrt{\hat{p}(1 - \hat{p})/n}$ gives plausible values for the population proportion.
If you chose D: This is incorrect. A confidence interval concerns a population parameter, not individual data points.

Q4: Which statement about CI width is TRUE?

A) Raising the confidence level from 90% to 99% narrows the interval
B) Quadrupling the sample size halves the width
C) Doubling the sample size halves the width
D) The width does not depend on the sample size

Correct: B)

If you chose A: This is incorrect. A higher confidence level uses a larger critical value ($2.576$ instead of $1.645$), so the interval widens.
If you chose B: Correct! Width is proportional to $1/\sqrt{n}$, so multiplying $n$ by 4 multiplies the width by $1/2$.
If you chose C: This is incorrect. Doubling $n$ multiplies the width by $1/\sqrt{2} \approx 0.71$, not by $1/2$.
If you chose D: This is incorrect. The standard error contains $\sqrt{n}$ in the denominator, so the width shrinks as $n$ grows.

Q5: A sample of 36 batteries has $\bar{x} = 48.2$ hours and $s = 5.1$ hours. Using $z = 1.96$ as an approximation to the t critical value, what is the 95% CI for the mean battery life?

A) $[46.53, 49.87]$ hours
B) $[38.20, 58.20]$ hours
C) $[47.35, 49.05]$ hours
D) $[46.80, 49.60]$ hours

Correct: A)

If you chose A: Correct! $48.2 \pm 1.96 \cdot \frac{5.1}{6} = 48.2 \pm 1.666$, giving $[46.53, 49.87]$ hours.
If you chose B: This is incorrect. This result uses $s$ in place of the standard error $s/\sqrt{n}$.
If you chose C: This is incorrect. This result adds and subtracts the standard error without multiplying by the critical value.
If you chose D: This is incorrect. This result uses the 90% critical value $1.645$ instead of $1.96$.

Q6: How does the width of a confidence interval relate to its confidence level?

A) The width is unaffected by the confidence level
B) A higher confidence level gives a narrower interval
C) A higher confidence level, meaning greater long-run coverage, requires a wider interval
D) The width depends only on the sample mean

Correct: C)

If you chose A: This is incorrect. The critical value, and therefore the width, changes with the confidence level.
If you chose B: This is incorrect. Greater coverage needs a larger critical value, which widens the interval.
If you chose C: Correct! The confidence level is the coverage of the method in repeated sampling, and capturing the parameter more often requires a broader interval.
If you chose D: This is incorrect. The sample mean sets the centre of the interval; the width comes from the critical value and the standard error.

Q7: What is a common pitfall when working with the z-interval for a population mean ($\sigma$ known)?

A) Using it with $s$ plugged in for $\sigma$ when $\sigma$ is actually unknown
B) The z-interval has no common pitfalls
C) Using it when $\sigma$ is genuinely known from external sources
D) Rounding the critical value to three decimal places

Correct: A)

If you chose A: Correct! Substituting $s$ for $\sigma$ in a z-interval gives intervals that are too narrow, especially for small samples. Use the t-interval instead.
If you chose B: This is incorrect. Confusing the z-interval with the t-interval is a frequent mistake.
If you chose C: This is incorrect. That is exactly the situation the z-interval is designed for.
If you chose D: This is incorrect. Rounding the critical value has a negligible effect compared with using the wrong distribution.

Q8: When should you apply the t-interval for a population mean?

A) Only when $\sigma$ is known exactly
B) Only when estimating a proportion
C) Only when the confidence level is 95%
D) When $\sigma$ is unknown and is estimated by the sample standard deviation $s$

Correct: D)

If you chose A: This is incorrect. When $\sigma$ is known, the z-interval applies.
If you chose B: This is incorrect. The proportion interval in this subject uses the normal critical value $z_{\alpha/2}$.
If you chose C: This is incorrect. The t-interval works at any confidence level; only the critical value $t_{\alpha/2, n-1}$ changes.
If you chose D: Correct! In practice $\sigma$ is almost never known, so the t-interval with $n-1$ degrees of freedom is the standard choice.

Practice Problems

A sample of 36 batteries has $\bar{x} = 48.2$ hours, $s = 5.1$ hours. Construct a 95% CI for the mean battery life.

Click for answer

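$t_{0.025, 35} \approx 2.030$. A common shortcut for df ≥ 30 is to approximate the critical value with $z = 1.96$:

$48.2 \pm 1.96 \cdot \frac{5.1}{6} = 48.2 \pm 1.96 \cdot 0.85 = 48.2 \pm 1.666$

Approximate 95% CI: $[46.53, 49.87]$ hours

The exact t-interval is slightly wider: $48.2 \pm 2.030 \cdot 0.85 = 48.2 \pm 1.726$, giving $[46.47, 49.93]$ hours.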

Which gives a wider CI: 90% confidence or 99% confidence? Why?

Click for answer
99% CI is wider. Higher confidence requires capturing the parameter more reliably, so the interval must be broader. The critical value increases: $z_{0.05} = 1.645$ vs $z_{0.005} = 2.576$.

If you quadruple the sample size, what happens to the width of a z-interval?

Click for answer
Width $\propto 1/\sqrt{n}$. Quadrupling $n$ multiplies the width by $1/\sqrt{4} = 1/2$. The CI width is halved.

For a proportion CI, why do we use $\hat{p} = 0.5$ when planning sample size?

Click for answer
The variance of a sample proportion is $\hat{p}(1-\hat{p})/n$, and the factor $\hat{p}(1-\hat{p})$ is maximised at $\hat{p} = 0.5$. Using 0.5 gives the largest (most conservative) sample size estimate, guaranteeing a sufficient sample regardless of the true proportion.

A 95% CI for $\mu$ is $[12.3, 18.7]$. Does this mean there is a 95% probability that $\mu$ is between 12.3 and 18.7?

Click for answer
**No.** In the frequentist framework, $\mu$ is a fixed constant — it is either in $[12.3, 18.7]$ (probability 1) or it is not (probability 0). The 95% refers to the long-run coverage of the method: if we repeated the experiment many times, 95% of the resulting intervals would contain $\mu$. The correct statement is: "We are 95% confident that $\mu$ lies in $[12.3, 18.7]$."

Summary

Key takeaways:

A confidence interval gives a range of plausible values for a parameter based on the data
95% confidence = the method captures the true parameter in 95% of repeated samples (NOT 95% probability the parameter is in THIS interval)
z-intervals require known $\sigma$; t-intervals use $s$ and are the standard in practice
Wider intervals = more confidence; narrower intervals = more precision
To halve the margin of error, multiply the sample size by 4
For proportions, $\hat{p}=0.5$ gives the most conservative sample size

Pitfalls

Interpreting a 95% CI as "95% probability the parameter is in this interval": This is the Bayesian credible interval interpretation. In the frequentist framework, the parameter is a fixed constant — it is either in the interval or it isn't. The 95% refers to the long-run coverage of the procedure: if you repeated the study many times, 95% of the resulting intervals would capture the true parameter. The correct statement is "we are 95% confident."
Using z-intervals when σ is unknown: In practice, σ is almost never known. Using a z-interval with s plugged in for σ produces intervals that are systematically too narrow and have coverage below the nominal level, especially for small samples. Always use the t-interval (with n−1 df) unless σ is genuinely known from external sources.
Forgetting that CI width scales as 1/√n: To halve the margin of error, you must quadruple the sample size — not double it. Students often think doubling n halves the width, but SE ∝ 1/√n means precision improves slowly. This has real consequences for study design and budget.
Constructing a proportion CI without checking conditions: The normal approximation for proportion CIs requires at least 10 expected successes AND 10 expected failures (n·p̂ ≥ 10 and n·(1−p̂) ≥ 10). When these conditions fail, the actual coverage can be far from the nominal level. Use exact (Clopper-Pearson) or score (Wilson) intervals for small samples.
Confusing "95% confidence" with "95% of data lies in the interval": A confidence interval for a mean describes where the population mean likely lies, not where individual data points fall. A prediction interval — which is always wider — is needed to capture individual observations with specified probability.
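
To illustrate the proportion pitfall above, the sketch below compares the normal-approximation (Wald) interval with the Wilson score interval on a small sample. The function names and the data (2 successes in 20 trials) are hypothetical; the Wilson formula used is the standard score interval, which is not derived in this lesson.

```python
from statistics import NormalDist


def wald_interval(k, n, level=0.95):
    """Normal-approximation interval (the formula used in this subject)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p_hat = k / n
    margin = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - margin, p_hat + margin


def wilson_interval(k, n, level=0.95):
    """Wilson score interval, better behaved for small n or extreme p_hat."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p_hat = k / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * (p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) ** 0.5 / denom
    return centre - half, centre + half


# Hypothetical small sample: only 2 successes, so the 10-successes condition fails.
print("Wald:  ", wald_interval(2, 20))    # lower limit falls below 0
print("Wilson:", wilson_interval(2, 20))  # stays inside [0, 1]
```

Here the Wald interval extends below zero, which is impossible for a proportion, while the Wilson interval stays within $[0, 1]$.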

Next Steps

Next up: 12-07-hypothesis-testing-basics.md
