### 12.6 Confidence Intervals
Phase: Statistics
Prerequisites: 12-02-sampling-sampling-distributions, 12-03-point-estimation
Learning Objectives
By the end of this subject, you will be able to:
- Interpret confidence intervals correctly (avoiding the most common misinterpretation)
- Construct and compute z-intervals for means ($\sigma$ known)
- Construct and compute t-intervals for means ($\sigma$ unknown)
- Construct confidence intervals for proportions and differences of means
- Determine the required sample size for a desired margin of error
Core Content
⚠️ CRITICAL: What a Confidence Interval Actually Means
A 95% confidence interval DOES NOT mean "there is a 95% probability that the true parameter lies in this interval."
The parameter is fixed: it either is or is not in the interval. The 95% refers to the procedure: if we repeated the sampling process many times, about 95% of the resulting intervals would contain the true parameter value.
🚩 Common Pitfall: Students (and many practising scientists) wrongly interpret a 95% CI as "95% chance the parameter is in this interval." That's a Bayesian credible interval, not a frequentist confidence interval.
The correct interpretation: "We are 95% confident that the interval [L, U] captures the true parameter," where "confident" means the method has 95% coverage in repeated sampling.
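The repeated-sampling meaning of "95% coverage" can be checked by simulation. This is a minimal sketch, assuming a normal population whose mean and $\sigma$ we choose ourselves (the values and the function name `simulate_coverage` are illustrative); it draws many samples, builds a z-interval from each, and reports the fraction that capture $\mu$. That fraction should be close to 0.95, even though any single interval either contains $\mu$ or does not.

```python
import random
from statistics import NormalDist, mean

def simulate_coverage(mu=50.0, sigma=10.0, n=20, conf=0.95, reps=2000, seed=1):
    """Fraction of simulated z-intervals (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 when conf = 0.95
    half_width = z * sigma / n ** 0.5              # identical for every sample when sigma is known
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = mean(sample)
        if xbar - half_width <= mu <= xbar + half_width:   # did this interval capture mu?
            hits += 1
    return hits / reps

coverage = simulate_coverage()
print(f"Empirical coverage: {coverage:.3f}")  # near 0.95; the exact value depends on the seed
```

The 95% describes the long-run hit rate of the procedure, which is why it only shows up across many simulated intervals.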
CI for Population Mean ($\sigma$ Known): z-interval
When $\sigma$ is known (rare in practice but theoretically important):
$$\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
Here $z_{\alpha/2}$ is the critical value from the standard normal distribution:
- 90% CI: $z_{0.05} = 1.645$
- 95% CI: $z_{0.025} = 1.96$
- 99% CI: $z_{0.005} = 2.576$
Margin of error: $m = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$
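The critical values and the z-interval formula translate directly into code. This sketch uses the Python standard library's `statistics.NormalDist` (Python 3.8+) for the normal quantile; the function names `z_critical` and `z_interval` and the example numbers are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

def z_critical(conf):
    """Upper alpha/2 critical value of the standard normal, where alpha = 1 - conf."""
    alpha = 1 - conf
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_interval(xbar, sigma, n, conf=0.95):
    """(lower, upper) for xbar ± z_{alpha/2} * sigma / sqrt(n), with sigma known."""
    margin = z_critical(conf) * sigma / n ** 0.5   # margin of error m
    return xbar - margin, xbar + margin

for conf in (0.90, 0.95, 0.99):
    print(conf, round(z_critical(conf), 3))
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576

# Hypothetical numbers: xbar = 100, sigma = 15, n = 36, so m = 1.96 * 15 / 6, about 4.9
print(z_interval(100, 15, 36))
```

Computing the quantile rather than hard-coding it lets the same function handle any confidence level.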
CI for Population Mean ($\sigma$ Unknown): t-interval
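In practice, $\sigma$ is almost always unknown. We estimate it with the sample standard deviation $s$ and use the t-distribution:
$$\bar{x} \pm t_{\alpha/2, n-1} \cdot \frac{s}{\sqrt{n}}$$
where $t_{\alpha/2, n-1}$ is the upper $\alpha/2$ critical value of the t-distribution with $n-1$ degrees of freedom.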
Example: 16 measurements give $\bar{x} = 48.3$, $s = 6.1$. For a 95% CI:
$t_{0.025, 15} = 2.131$
$48.3 \pm 2.131 \cdot \frac{6.1}{\sqrt{16}} = 48.3 \pm 2.131 \cdot 1.525 = 48.3 \pm 3.25$
95% CI: $[45.05, 51.55]$
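The calculation above can be sketched as a small function. Python's standard library has no t-distribution quantile, so this sketch assumes the critical value is read from a t-table and passed in; the name `t_interval` is illustrative. (If SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, n - 1)` computes the same critical value.)

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """(lower, upper) for xbar ± t_crit * s / sqrt(n).

    t_crit is t_{alpha/2, n-1}, looked up in a t-table for this sketch.
    """
    se = s / sqrt(n)          # estimated standard error of the mean
    margin = t_crit * se
    return xbar - margin, xbar + margin

# Example above: n = 16, xbar = 48.3, s = 6.1, t_{0.025, 15} = 2.131
lo, hi = t_interval(48.3, 6.1, 16, 2.131)
print(f"95% CI: [{lo:.2f}, {hi:.2f}]")   # 95% CI: [45.05, 51.55]
```

The same function reproduces Example 1 below with `t_interval(12.4, 0.8, 25, 2.797)`.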
CI for a Proportion
For a sample proportion $\hat{p} = k/n$:
$$\hat{p} \pm z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
Validity condition: Need $n\hat{p} \geq 10$ AND $n(1-\hat{p}) \geq 10$ (at least 10 expected successes and failures).
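A minimal sketch of the proportion interval, including the validity check, is below. It implements the normal-approximation interval given above (often called the Wald interval); the function name `proportion_interval` is hypothetical, and the normal quantile comes from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(k, n, conf=0.95):
    """Normal-approximation CI for a proportion from k successes in n trials."""
    # n * p_hat = k successes and n * (1 - p_hat) = n - k failures
    if k < 10 or n - k < 10:
        raise ValueError("need at least 10 successes and 10 failures")
    p_hat = k / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error of p_hat
    return p_hat - z * se, p_hat + z * se

lo, hi = proportion_interval(220, 400)
print(f"[{lo:.3f}, {hi:.3f}]")   # [0.501, 0.599]
```

Raising an error when the condition fails keeps the approximation from being used where its coverage is unreliable.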
CI for Difference of Means
Independent samples (equal variances assumed):
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2, df} \cdot s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
Where $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}$ is the pooled variance, and $df = n_1 + n_2 - 2$.
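The pooled interval can be sketched the same way. The summary statistics below are hypothetical, invented for illustration, and the critical value $t_{0.025, 20} = 2.086$ is taken from a t-table; the function name `pooled_t_interval` is likewise illustrative.

```python
from math import sqrt

def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """CI for mu1 - mu2 assuming equal population variances.

    t_crit is t_{alpha/2, df} with df = n1 + n2 - 2, looked up in a t-table.
    """
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df   # pooled variance s_p^2
    se = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return diff - t_crit * se, diff + t_crit * se

# Hypothetical data: groups of 10 and 12 observations, df = 20, t_{0.025, 20} = 2.086
lo, hi = pooled_t_interval(20.0, 3.0, 10, 17.0, 4.0, 12, 2.086)
print(f"95% CI for mu1 - mu2: [{lo:.2f}, {hi:.2f}]")
```

Here the interval includes 0, so a zero difference is among the plausible values for $\mu_1 - \mu_2$.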
Sample Size Determination
To achieve a desired margin of error $m$ with confidence $1-\alpha$:
For a mean: $n = \left(\frac{z_{\alpha/2} \cdot \sigma}{m}\right)^2$
For a proportion: $n = \hat{p}(1-\hat{p})\left(\frac{z_{\alpha/2}}{m}\right)^2$
When no prior estimate of $p$ is available, use $\hat{p} = 0.5$ (the worst case, which gives the largest $n$). In both formulas, round $n$ up to the next whole number.
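Both sample-size formulas amount to a square followed by rounding up. In this sketch the function names are hypothetical and the default `z=1.96` assumes 95% confidence; $\sigma$ (or a guess for $p$) must come from prior information, as the formulas require.

```python
from math import ceil

def n_for_mean(sigma, margin, z=1.96):
    """Smallest n with z * sigma / sqrt(n) <= margin."""
    return ceil((z * sigma / margin) ** 2)

def n_for_proportion(margin, p_guess=0.5, z=1.96):
    """Smallest n for a proportion CI; p_guess = 0.5 is the conservative default."""
    return ceil(p_guess * (1 - p_guess) * (z / margin) ** 2)

print(n_for_mean(sigma=8, margin=2))   # 62, matching Example 3
print(n_for_proportion(margin=0.03))   # 1068 for a ±3 percentage-point margin
```

Using `ceil` rather than ordinary rounding guarantees the margin of error does not exceed the target.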
Key Terms
- 95% confidence
- CI for Difference of Means
- CI for Population Mean ($\sigma$ Known): z-interval
- CI for Population Mean ($\sigma$ Unknown): t-interval
- CI for a Proportion
- CI width
Worked Examples
Example 1: t-interval construction
A sample of 25 screws has $\bar{x} = 12.4$ mm length and $s = 0.8$ mm. Construct a 99% CI for the true mean length.
$t_{0.005, 24} = 2.797$
$\text{SE} = s/\sqrt{n} = 0.8 / \sqrt{25} = 0.8 / 5 = 0.16$
$12.4 \pm 2.797 \cdot 0.16 = 12.4 \pm 0.448$
99% CI: $[11.95, 12.85]$ mm
Example 2: Proportion CI
In a poll of 400 voters, 220 support Candidate A. Give a 95% CI for the true proportion.
$\hat{p} = 220/400 = 0.55$
$\text{SE}(\hat{p}) = \sqrt{\frac{0.55 \cdot 0.45}{400}} = \sqrt{\frac{0.2475}{400}} = \sqrt{0.00061875} = 0.02487$
$0.55 \pm 1.96 \cdot 0.02487 = 0.55 \pm 0.0488$
95% CI: $[0.501, 0.599]$ or $[50.1\%, 59.9\%]$
Check conditions: $n\hat{p} = 220 \geq 10$ ✓, $n(1-\hat{p}) = 180 \geq 10$ ✓
Example 3: Sample size for desired precision
You want to estimate a population mean to within $\pm 2$ units with 95% confidence. Previous studies suggest $\sigma \approx 8$. How many observations are needed?
$n = \left(\frac{z_{0.025} \cdot \sigma}{m}\right)^2 = \left(\frac{1.96 \cdot 8}{2}\right)^2 = (7.84)^2 = 61.47$
Round up to $n = 62$ observations.
Quiz
Q1: What does a confidence interval for a difference of means estimate?
A) The difference between the two sample means, $\bar{x}_1 - \bar{x}_2$
B) The difference between the two population means, $\mu_1 - \mu_2$
C) The ratio of the two population variances
D) The mean of the pooled sample
Correct: B)
- If you chose A: This is incorrect. $\bar{x}_1 - \bar{x}_2$ is the point estimate at the centre of the interval; it is computed exactly from the data, so it needs no interval. The interval estimates the unknown $\mu_1 - \mu_2$.
- If you chose B: The interval $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2, df} \cdot s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ gives a range of plausible values for the population difference $\mu_1 - \mu_2$. Correct!
- If you chose C: This is incorrect. The pooled interval assumes the two population variances are equal; it does not estimate their ratio.
- If you chose D: This is incorrect. Pooling combines the two sample variances into $s_p^2$; the interval is centred on the difference of the means, not on a combined mean.
Q2: Which formula gives a confidence interval for a population mean when $\sigma$ is known?
A) $\bar{x} \pm t_{\alpha/2, n-1} \cdot \frac{s}{\sqrt{n}}$
B) $\hat{p} \pm z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
C) $\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$
D) $\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{n}$
Correct: C)
- If you chose A: This is incorrect. This is the t-interval, used when $\sigma$ is unknown and estimated by $s$.
- If you chose B: This is incorrect. This is the interval for a population proportion, not a mean.
- If you chose C: With $\sigma$ known, the standard error of $\bar{x}$ is $\sigma/\sqrt{n}$ and the critical value comes from the standard normal. Correct!
- If you chose D: This is incorrect. The standard error of $\bar{x}$ is $\sigma/\sqrt{n}$, not $\sigma/n$.
Q3: What is the primary purpose of a confidence interval for a proportion?
A) To estimate a population mean when $\sigma$ is unknown
B) To give the range that contains 95% of individual responses
C) To give a range of plausible values for the population proportion $p$, based on the sample proportion $\hat{p}$
D) To guarantee that the true proportion lies inside the interval
Correct: C)
- If you chose A: This is incorrect. Estimating a mean with $\sigma$ unknown is the job of the t-interval.
- If you chose B: This is incorrect. A confidence interval describes where the parameter plausibly lies, not where individual observations fall.
- If you chose C: $\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$ estimates the population proportion, provided $n\hat{p} \geq 10$ and $n(1-\hat{p}) \geq 10$. Correct!
- If you chose D: This is incorrect. No interval guarantees capture; the confidence level is the long-run coverage rate of the procedure.
Q4: Which statement about CI width is TRUE?
A) Increasing the confidence level makes the interval narrower
B) Quadrupling the sample size halves the width
C) Doubling the sample size halves the width
D) The width does not depend on the sample standard deviation
Correct: B)
- If you chose A: This is incorrect. A higher confidence level uses a larger critical value (for example 2.576 instead of 1.96), so the interval gets wider.
- If you chose B: Width is proportional to $1/\sqrt{n}$, so multiplying $n$ by 4 multiplies the width by $1/\sqrt{4} = 1/2$. Correct!
- If you chose C: This is incorrect. Doubling $n$ multiplies the width by $1/\sqrt{2} \approx 0.71$; you must quadruple $n$ to halve it.
- If you chose D: This is incorrect. The margin of error $t_{\alpha/2, n-1} \cdot s/\sqrt{n}$ grows in proportion to $s$.
Q5: A sample of 36 batteries has $\bar{x} = 48.2$ hours and $s = 5.1$ hours. Using $t_{0.025, 35} = 2.030$, what is the 95% CI for the mean battery life?
A) $[46.47, 49.93]$ hours
B) $[47.91, 48.49]$ hours
C) $[46.53, 49.87]$ hours
D) $[37.85, 58.55]$ hours
Correct: A)
- If you chose A: $\text{SE} = 5.1/\sqrt{36} = 0.85$, so the interval is $48.2 \pm 2.030 \cdot 0.85 = 48.2 \pm 1.726$. Correct!
- If you chose B: This is incorrect. This divides $s$ by $n = 36$ instead of by $\sqrt{n} = 6$.
- If you chose C: This is incorrect. This uses $z = 1.96$ in place of the t critical value, giving an interval that is slightly too narrow.
- If you chose D: This is incorrect. This uses $s$ itself instead of the standard error $s/\sqrt{n}$.
Q6: How is CI width related to what the confidence level means?
A) A wider interval means the parameter has a higher probability of lying in this particular interval
B) The confidence level has no effect on the width
C) A higher confidence level requires a wider interval, because the procedure must capture the true parameter in a larger fraction of repeated samples
D) A 99% interval contains 99% of the sample data
Correct: C)
- If you chose A: This is incorrect. Once computed, a specific interval either contains the fixed parameter or it does not. The confidence level describes the long-run coverage of the procedure, not a probability about one interval.
- If you chose B: This is incorrect. The confidence level sets the critical value, which directly scales the width.
- If you chose C: Raising the coverage rate from 95% to 99% raises the critical value (from 1.96 to 2.576 for a z-interval), so every interval the procedure produces is wider. Correct!
- If you chose D: This is incorrect. A confidence interval targets a parameter, not individual data points; a prediction interval is needed to capture individual observations.
Q7: What is a common pitfall when working with the z-interval for a population mean ($\sigma$ known)?
A) Using it with $s$ substituted for $\sigma$ when $\sigma$ is unknown
B) Using it when $\sigma$ is genuinely known from external sources
C) Computing the standard error as $\sigma/\sqrt{n}$
D) Using $z_{0.025} = 1.96$ for a 95% interval
Correct: A)
- If you chose A: Plugging $s$ into the z-interval ignores the extra uncertainty from estimating $\sigma$, so the intervals are too narrow and their coverage falls below the nominal level, especially for small samples. Use the t-interval instead. Correct!
- If you chose B: This is incorrect. A genuinely known $\sigma$ is exactly the situation the z-interval is designed for.
- If you chose C: This is incorrect. $\sigma/\sqrt{n}$ is the correct standard error for the z-interval, so this is not a pitfall.
- If you chose D: This is incorrect. 1.96 is the correct critical value for a 95% z-interval.
Q8: When should you apply the t-interval for a population mean?
A) Only when $\sigma$ is known
B) Only when $n \geq 30$
C) When estimating a population proportion
D) When $\sigma$ is unknown and is estimated by the sample standard deviation $s$
Correct: D)
- If you chose A: This is incorrect. When $\sigma$ is known, the z-interval applies.
- If you chose B: This is incorrect. The t-interval is also used for small samples, provided the population is approximately normal; for large $n$, t critical values approach the z values.
- If you chose C: This is incorrect. Proportions use the z-based interval $\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$.
- If you chose D: The t-interval $\bar{x} \pm t_{\alpha/2, n-1} \cdot s/\sqrt{n}$, with $n-1$ degrees of freedom, accounts for the extra uncertainty from estimating $\sigma$. Correct!
Practice Problems
- A sample of 36 batteries has $\bar{x} = 48.2$ hours, $s = 5.1$ hours. Construct a 95% CI for the mean battery life.
Answer: $t_{0.025, 35} \approx 2.030$ and $\text{SE} = 5.1/\sqrt{36} = 0.85$, so $48.2 \pm 2.030 \cdot 0.85 = 48.2 \pm 1.726$. 95% CI: $[46.47, 49.93]$ hours. (Replacing $t$ with $z = 1.96$, a common large-sample shortcut, gives the slightly narrower $[46.53, 49.87]$ hours.)
- Which gives a wider CI: 90% confidence or 99% confidence? Why?
Answer: The 99% CI is wider. Higher confidence requires capturing the parameter more reliably, so the interval must be broader. The critical value increases: $z_{0.05} = 1.645$ vs $z_{0.005} = 2.576$.
- If you quadruple the sample size, what happens to the width of a z-interval?
Answer: Width $\propto 1/\sqrt{n}$. Quadrupling $n$ multiplies the width by $1/\sqrt{4} = 1/2$, so the CI width is halved.
- For a proportion CI, why do we use $\hat{p} = 0.5$ when planning sample size?
Answer: The sample-size formula is proportional to $\hat{p}(1-\hat{p})$, the variance of a single success/failure observation, and this product is maximised at $\hat{p} = 0.5$, where it equals 0.25. Using 0.5 therefore gives the largest (most conservative) sample size, which keeps the margin of error within the target whatever the true proportion turns out to be.
- A 95% CI for $\mu$ is $[12.3, 18.7]$. Does this mean there is a 95% probability that $\mu$ is between 12.3 and 18.7?
Answer: **No.** In the frequentist framework, $\mu$ is a fixed constant: it is either in $[12.3, 18.7]$ (probability 1) or it is not (probability 0). The 95% refers to the long-run coverage of the method: if we repeated the experiment many times, 95% of the resulting intervals would contain $\mu$. The correct statement is: "We are 95% confident that $\mu$ lies in $[12.3, 18.7]$."
Summary
Key takeaways:
- A confidence interval gives a range of plausible values for a parameter based on the data
- 95% confidence = the method captures the true parameter in 95% of repeated samples (NOT 95% probability the parameter is in THIS interval)
- z-intervals require known $\sigma$; t-intervals use $s$ and are the standard in practice
- For a given sample, a higher confidence level gives a wider interval; a narrower interval means more precision
- To halve the margin of error, multiply the sample size by 4
- For proportions, $\hat{p}=0.5$ gives the most conservative sample size
Pitfalls
- Interpreting a 95% CI as "95% probability the parameter is in this interval": This is the Bayesian credible interval interpretation. In the frequentist framework, the parameter is a fixed constant: it is either in the interval or it isn't. The 95% refers to the long-run coverage of the procedure: if you repeated the study many times, 95% of the resulting intervals would capture the true parameter. The correct statement is "we are 95% confident."
- Using z-intervals when $\sigma$ is unknown: In practice, $\sigma$ is almost never known. Using a z-interval with $s$ plugged in for $\sigma$ produces intervals that are systematically too narrow and have coverage below the nominal level, especially for small samples. Always use the t-interval (with $n-1$ df) unless $\sigma$ is genuinely known from external sources.
- Forgetting that CI width scales as $1/\sqrt{n}$: To halve the margin of error, you must quadruple the sample size, not double it. Students often think doubling $n$ halves the width, but $\text{SE} \propto 1/\sqrt{n}$ means precision improves slowly. This has real consequences for study design and budget.
- Constructing a proportion CI without checking conditions: The normal approximation for proportion CIs requires at least 10 expected successes AND 10 expected failures ($n\hat{p} \geq 10$ and $n(1-\hat{p}) \geq 10$). When these conditions fail, the actual coverage can be far from the nominal level. Use exact (Clopper-Pearson) or score (Wilson) intervals for small samples.
- Confusing "95% confidence" with "95% of data lies in the interval": A confidence interval for a mean describes where the population mean likely lies, not where individual data points fall. A prediction interval, which is always wider, is needed to capture individual observations with specified probability.
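The pitfall about plugging $s$ into a z-interval can be checked by simulation. This is a minimal sketch, assuming standard normal data and samples of size 5, with $t_{0.025, 4} = 2.776$ taken from a t-table; the function name and settings are illustrative choices. It compares how often $\bar{x} \pm 1.96 \cdot s/\sqrt{n}$ and $\bar{x} \pm t_{0.025, 4} \cdot s/\sqrt{n}$ capture the true mean.

```python
import random
from statistics import mean, stdev

def coverage_z_vs_t(n=5, t_crit=2.776, reps=4000, seed=0):
    """Coverage of nominal-95% intervals xbar ± c * s / sqrt(n) on standard normal data.

    Returns (coverage with c = 1.96, coverage with c = t_crit). t_crit must be
    t_{0.025, n-1} for the chosen n; 2.776 is the t-table value for n = 5 (4 df).
    """
    rng = random.Random(seed)
    z_hits = t_hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]   # true mean is 0
        se = stdev(sample) / n ** 0.5                       # stdev uses the n - 1 divisor
        error = abs(mean(sample))                           # |xbar - mu| with mu = 0
        if error <= 1.96 * se:
            z_hits += 1
        if error <= t_crit * se:
            t_hits += 1
    return z_hits / reps, t_hits / reps

z_cov, t_cov = coverage_z_vs_t()
print(f"z with s plugged in: {z_cov:.3f}   t-interval: {t_cov:.3f}")
```

With 4 degrees of freedom one would expect the z-based coverage to land near 0.88 and the t-interval near 0.95; the exact figures depend on the seed.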
Next Steps
Next up: 12-07-hypothesis-testing-basics.md