### 12.3 Point Estimation
Phase: Statistics
Prerequisites: 12-02-sampling-sampling-distributions, 11-01-expectation-continuous-rv
Learning Objectives
By the end of this subject, you will be able to:
- Define an estimator and distinguish it from an estimate
- Compute and interpret bias, variance, and mean squared error of estimators
- Determine whether an estimator is unbiased
- Assess consistency of estimators
- Understand the concept of sufficiency and the Rao-Blackwell theorem (conceptual)
Core Content
Estimator vs Estimate
An estimator is a rule (function) for computing a guess of a parameter from sample data. It is a random variable because it depends on the random sample.
An estimate is the numerical value produced by applying the estimator to a particular observed sample.
| Term | Nature | Notation |
|---|---|---|
| Estimator | Random variable (function of sample) | $\hat{\theta} = g(X_1, \ldots, X_n)$ |
| Estimate | Fixed number (computed from data) | $\hat{\theta} = 3.7$ |
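The distinction in the table can be sketched in code. This is a minimal illustration, not part of the original lesson: the function name `sample_mean` and the Normal(3.5, 1) population are assumptions chosen for the example. The function plays the role of the estimator (a rule), and each number it returns is an estimate.

```python
import random

def sample_mean(sample):
    """Estimator: a rule mapping any sample to a guess for the population mean."""
    return sum(sample) / len(sample)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Two independent samples of size 5 from a hypothetical Normal(mu=3.5, sigma=1) population.
sample_a = [rng.gauss(3.5, 1.0) for _ in range(5)]
sample_b = [rng.gauss(3.5, 1.0) for _ in range(5)]

# Estimates: fixed numbers obtained by applying the same rule to observed data.
estimate_a = sample_mean(sample_a)
estimate_b = sample_mean(sample_b)
print(estimate_a, estimate_b)
```

The same rule gives a different number for each sample, which is why the estimator is treated as a random variable while each estimate is a fixed value.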
Properties of Estimators
Bias
$$\text{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$$
An estimator is unbiased if $E[\hat{\theta}] = \theta$, meaning it hits the target on average.
Examples:
- $\bar{X}$ is unbiased for $\mu$: $E[\bar{X}] = \mu$
- $s^2 = \frac{1}{n-1}\sum(X_i - \bar{X})^2$ is unbiased for $\sigma^2$
- $\hat{\sigma}^2 = \frac{1}{n}\sum(X_i - \bar{X})^2$ is biased: $E[\hat{\sigma}^2] = \frac{n-1}{n}\sigma^2$, so it systematically underestimates $\sigma^2$
⚠️ CRITICAL: Unbiasedness is about the sampling distribution, not about a single estimate. An unbiased estimator can produce a terrible estimate on any given sample.
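One way to see bias as a property of the sampling distribution is to average each variance estimator over many simulated samples. The sketch below is illustrative only: the sample size $n = 5$, the value $\sigma^2 = 4$, the normal population, and the helper names are all assumptions made for this example.

```python
import random

def var_unbiased(xs):
    """s^2: sum of squared deviations divided by n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def var_biased(xs):
    """sigma-hat^2: sum of squared deviations divided by n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def simulate_means(n=5, sigma2=4.0, reps=50_000, seed=1):
    """Average both estimators over many samples to approximate their expectations."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    total_u = total_b = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        total_u += var_unbiased(xs)
        total_b += var_biased(xs)
    return total_u / reps, total_b / reps

mean_s2, mean_sigma_hat2 = simulate_means()
# Theory predicts averages near sigma^2 = 4 and (n-1)/n * sigma^2 = 3.2.
print(mean_s2, mean_sigma_hat2)
```

Individual samples still give values far from 4 in either direction; only the long-run average separates the two estimators.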
Variance
$$\text{Var}(\hat{\theta}) = E[(\hat{\theta} - E[\hat{\theta}])^2]$$
Variance measures how much the estimator fluctuates from sample to sample. Lower variance means a more precise estimator.
Mean Squared Error (MSE)
$$\text{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = \text{Var}(\hat{\theta}) + [\text{Bias}(\hat{\theta})]^2$$
🚩 Common Pitfall: MSE = Variance + Bias². This is the bias-variance tradeoff: sometimes a slightly biased estimator with much lower variance has lower MSE than an unbiased one. This is the statistical basis for regularisation in machine learning.
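The decomposition can be checked by adding and subtracting $E[\hat{\theta}]$ inside the square. A short derivation using only the definitions above:

$$\begin{aligned}
E[(\hat{\theta} - \theta)^2]
&= E\big[(\hat{\theta} - E[\hat{\theta}] + E[\hat{\theta}] - \theta)^2\big] \\
&= E\big[(\hat{\theta} - E[\hat{\theta}])^2\big] + 2\,(E[\hat{\theta}] - \theta)\,E\big[\hat{\theta} - E[\hat{\theta}]\big] + (E[\hat{\theta}] - \theta)^2 \\
&= \text{Var}(\hat{\theta}) + [\text{Bias}(\hat{\theta})]^2
\end{aligned}$$

The cross term vanishes because $E[\hat{\theta}] - \theta$ is a constant and $E\big[\hat{\theta} - E[\hat{\theta}]\big] = 0$.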
Consistency
An estimator $\hat{\theta}_n$ is consistent if it converges in probability to $\theta$ as $n \to \infty$:
$$\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| > \epsilon) = 0 \quad \text{for any } \epsilon > 0$$
Sufficient condition: If $\text{Bias} \to 0$ and $\text{Var} \to 0$ as $n \to \infty$, the estimator is consistent.
$\bar{X}$ is consistent for $\mu$ because it is unbiased and $\text{Var}(\bar{X}) = \sigma^2/n \to 0$ (assuming $\sigma^2$ is finite).
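The sufficient condition can be justified with Markov's inequality applied to the squared error. A sketch of the argument, for any fixed $\epsilon > 0$:

$$P(|\hat{\theta}_n - \theta| > \epsilon) \le \frac{E[(\hat{\theta}_n - \theta)^2]}{\epsilon^2} = \frac{\text{Var}(\hat{\theta}_n) + [\text{Bias}(\hat{\theta}_n)]^2}{\epsilon^2} \to 0 \quad \text{as } n \to \infty$$

So an MSE that shrinks to zero forces the probability of missing $\theta$ by more than $\epsilon$ to zero as well.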
Sufficiency (Conceptual)
A statistic $T(X)$ is sufficient for $\theta$ if it captures ALL information about $\theta$ contained in the sample. Formally:
$$P(X = x \mid T(X) = t, \theta) \text{ does not depend on } \theta$$
Example: For Bernoulli trials, the sum $\sum X_i$ is sufficient for $p$: the exact sequence of successes and failures adds no information beyond the total count.
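The Bernoulli example can be verified by brute force for a small sample. The sketch below (the helper name `conditional_prob` and the chosen sequence are assumptions for illustration) enumerates all sequences with the same total and computes the conditional probability exactly with fractions.

```python
from fractions import Fraction
from itertools import product

def conditional_prob(seq, p):
    """P(X = seq | sum(X) = sum(seq)) for i.i.d. Bernoulli(p) trials, by enumeration."""
    n, t = len(seq), sum(seq)

    def joint(s):
        # Probability of observing exactly the sequence s.
        k = sum(s)
        return p ** k * (1 - p) ** (len(s) - k)

    # Total probability of all sequences sharing the same number of successes.
    denom = sum(joint(s) for s in product((0, 1), repeat=n) if sum(s) == t)
    return joint(seq) / denom

seq = (1, 0, 1, 0)
for p in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    # The conditional probability is 1/6 for every p: one of the C(4, 2) = 6 arrangements.
    print(p, conditional_prob(seq, p))
```

Because the conditional probability does not change with $p$, knowing the arrangement in addition to the total tells us nothing more about $p$.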
Rao-Blackwell Theorem: If $T$ is sufficient and $\hat{\theta}$ is an unbiased estimator, then $E[\hat{\theta} \mid T]$ is an unbiased estimator with variance no larger than that of $\hat{\theta}$. This gives a method for improving estimators.
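As a small illustration of the theorem, take Bernoulli($p$) data with the crude unbiased estimator $\hat{\theta} = X_1$ and the sufficient statistic $T = \sum X_i$. By symmetry, $E[X_1 \mid T] = T/n$. The sketch below computes both estimators' exact mean and variance by enumeration; the values $n = 5$, $p = 3/10$ and the helper name `moments` are assumptions for the example.

```python
from fractions import Fraction
from itertools import product

def moments(estimator, n, p):
    """Exact mean and variance of estimator(sample) over all Bernoulli(p) samples of size n."""
    mean = second = Fraction(0)
    for s in product((0, 1), repeat=n):
        prob = p ** sum(s) * (1 - p) ** (n - sum(s))
        value = Fraction(estimator(s))
        mean += prob * value
        second += prob * value ** 2
    return mean, second - mean ** 2

n, p = 5, Fraction(3, 10)

def crude(s):
    """Unbiased for p, but uses only the first observation."""
    return s[0]

def improved(s):
    """E[X_1 | sum X_i] = (sum X_i) / n, the Rao-Blackwellised estimator."""
    return Fraction(sum(s), len(s))

print(moments(crude, n, p))     # mean p, variance p(1 - p)
print(moments(improved, n, p))  # mean p, variance p(1 - p) / n
```

Both estimators have mean $p$, while conditioning on $T$ cuts the variance by a factor of $n$ in this case.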
Key Terms
- Bias
- Consistent
- Estimator
- Sufficient
Worked Examples
Example 1: Bias of sample variance estimators
Consider $n = 4$ observations from a population with $\sigma^2 = 10$.
Estimator A: $s^2 = \frac{1}{n-1}\sum(X_i - \bar{X})^2$ (with Bessel's correction)
$E[s^2] = \sigma^2 = 10$, so $s^2$ is unbiased.
Estimator B: $\hat{\sigma}^2 = \frac{1}{n}\sum(X_i - \bar{X})^2$ (without correction)
$E[\hat{\sigma}^2] = \frac{n-1}{n}\sigma^2 = \frac{3}{4} \cdot 10 = 7.5$
Bias = $7.5 - 10 = -2.5$, so $\hat{\sigma}^2$ systematically underestimates $\sigma^2$.
Yet $\hat{\sigma}^2$ has LOWER variance than $s^2$, and for the normal distribution its MSE is actually LOWER than that of $s^2$ for all $n \geq 2$. This is the bias-variance tradeoff in action.
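The MSE claim can be checked with exact arithmetic. The sketch below relies on a standard result that is not derived in this lesson: for normal data, $\text{Var}(s^2) = 2\sigma^4/(n-1)$. The function names are assumptions for illustration.

```python
from fractions import Fraction

def mse_s2(n, sigma2):
    """MSE of s^2 (divisor n - 1) for normal data: unbiased, so MSE = Var = 2*sigma^4/(n - 1)."""
    return Fraction(2) * sigma2 ** 2 / (n - 1)

def mse_sigma_hat2(n, sigma2):
    """MSE of the divisor-n estimator for normal data: Var + Bias^2."""
    # sigma_hat^2 = (n - 1)/n * s^2, so its variance is ((n - 1)/n)^2 * Var(s^2).
    variance = Fraction(2 * (n - 1)) * sigma2 ** 2 / n ** 2
    bias = -Fraction(sigma2) / n
    return variance + bias ** 2

# Example 1 values: n = 4, sigma^2 = 10.
print(mse_s2(4, 10), mse_sigma_hat2(4, 10))  # 200/3 175/4

# The biased estimator has the smaller MSE across a range of sample sizes.
print(all(mse_sigma_hat2(n, 10) < mse_s2(n, 10) for n in range(2, 200)))
```

For Example 1 this gives an MSE of about 66.7 for $s^2$ against 43.75 for $\hat{\sigma}^2$, despite the bias of $-2.5$.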
Example 2: Consistency check
Estimator $\hat{\mu}_n = \bar{X}_n + \frac{1}{n}$ (adds a shrinking bias).
Bias: $E[\hat{\mu}_n] - \mu = \frac{1}{n} \to 0$
Variance: $\text{Var}(\hat{\mu}_n) = \frac{\sigma^2}{n} \to 0$
Since both bias and variance go to zero, $\hat{\mu}_n$ is consistent, even though it's biased for any finite $n$.
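The definition of consistency can also be probed by simulation: estimate $P(|\hat{\mu}_n - \mu| > \epsilon)$ for increasing $n$ and watch it shrink. This is a rough Monte Carlo sketch; the normal population with $\mu = 0$, $\sigma = 1$, the tolerance $\epsilon = 0.2$, and the function name are assumptions for the example.

```python
import random

def exceed_fraction(n, eps=0.2, mu=0.0, sigma=1.0, reps=1000, seed=0):
    """Monte Carlo estimate of P(|mu_hat_n - mu| > eps) for mu_hat_n = sample mean + 1/n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        mu_hat = xbar + 1.0 / n  # the biased estimator from Example 2
        if abs(mu_hat - mu) > eps:
            hits += 1
    return hits / reps

fractions_by_n = {n: exceed_fraction(n) for n in (10, 100, 1000)}
print(fractions_by_n)
```

The estimated probabilities should fall toward zero as $n$ grows, even though the estimator is biased at every finite $n$.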
Example 3: MSE comparison
For estimating a normal mean $\mu$, compare:
- $\hat{\theta}_1 = \bar{X}$ (unbiased)
- $\hat{\theta}_2 = 0$ (biased, zero variance)
$\text{MSE}(\hat{\theta}_1) = \text{Var}(\bar{X}) + 0^2 = \sigma^2/n$
$\text{MSE}(\hat{\theta}_2) = 0 + (0 - \mu)^2 = \mu^2$
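For $|\mu| < \sigma/\sqrt{n}$, the "always guess 0" estimator has lower MSE! The comparison depends on the unknown $\mu$, so neither estimator has the lower MSE everywhere. This illustrates why MSE alone isn't always the right metric: it depends on what you care about.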
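The crossover at $|\mu| = \sigma/\sqrt{n}$ is easy to see numerically. The values $\sigma = 2$ and $n = 16$ below are illustrative assumptions, chosen so the threshold is exactly 0.5.

```python
import math

def mse_sample_mean(sigma, n):
    """MSE of the sample mean: variance sigma^2 / n, zero bias."""
    return sigma ** 2 / n

def mse_always_zero(mu):
    """MSE of the constant estimator 0: zero variance, bias -mu."""
    return mu ** 2

sigma, n = 2.0, 16
threshold = sigma / math.sqrt(n)  # 0.5 for these values

for mu in (0.1, 0.5, 1.0):
    # True only when |mu| is strictly below the threshold.
    print(mu, mse_always_zero(mu) < mse_sample_mean(sigma, n))
```

The constant estimator wins only for $\mu = 0.1$ here, and it gets arbitrarily bad as $|\mu|$ grows, whereas the MSE of $\bar{X}$ does not depend on $\mu$ at all.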
Quiz
Q1: What does the concept of Consistent primarily refer to in this subject?
- A) The definition and application of Consistent
- B) A visual representation of Consistent
- C) A computational error related to Consistent
- D) A historical anecdote about Consistent
Correct: A)
- If you chose A: Correct! In this subject, an estimator is consistent if it converges in probability to $\theta$ as $n \to \infty$. Option A is the only one about that definition and its use.
- If you chose B: This is incorrect. Consistency is a convergence property of an estimator as $n \to \infty$, not a visual representation.
- If you chose C: This is incorrect. Consistency is a convergence property of an estimator as $n \to \infty$, not a computational error.
- If you chose D: This is incorrect. Consistency is a convergence property of an estimator as $n \to \infty$, not a historical anecdote.
Q2: Which of the following is the key formula discussed in this subject?
- A) The inverse operation of the formula in question
- B) An unrelated formula from a different topic
- C) $\hat{\theta} = g(X_1, \ldots, X_n)$
- D) A simplified version of \hat{\theta} = g(X_1, \ldot...
Correct: C)
- If you chose A: This is incorrect. The formula $\hat{\theta} = g(X_1, \ldots, X_n)$ is central to this subject. The other options are either simplified versions or unrelated.
- If you chose B: This is incorrect. The formula $\hat{\theta} = g(X_1, \ldots, X_n)$ is central to this subject. The other options are either simplified versions or unrelated.
- If you chose C: The formula $\hat{\theta} = g(X_1, \ldots, X_n)$ is central to this subject. The other options are either simplified versions or unrelated. Correct!
- If you chose D: This is incorrect. The formula $\hat{\theta} = g(X_1, \ldots, X_n)$ is central to this subject. The other options are either simplified versions or unrelated.
Q3: What is the primary purpose of Estimator?
- A) It is used to estimate a parameter from sample data
- B) It is used only in advanced research contexts
- C) It replaces all other methods in this domain
- D) It is primarily a historical notation system
Correct: A)
- If you chose A: Estimator serves the purpose described in the correct answer. The other options misrepresent its role. Correct!
- If you chose B: This is incorrect. Estimator serves the purpose described in the correct answer. The other options misrepresent its role.
- If you chose C: This is incorrect. Estimator serves the purpose described in the correct answer. The other options misrepresent its role.
- If you chose D: This is incorrect. Estimator serves the purpose described in the correct answer. The other options misrepresent its role.
Q4: Which statement about Sufficient is TRUE?
- A) Sufficient is an advanced topic beyond this subject's scope
- B) Sufficient is not related to this subject
- C) Sufficient is mentioned only as a historical footnote
- D) Sufficient is a fundamental concept covered in this subject
Correct: D)
- If you chose A: This is incorrect. Sufficient is a fundamental concept covered in this subject. This subject covers Sufficient as part of its core content.
- If you chose B: This is incorrect. Sufficient is a fundamental concept covered in this subject. This subject covers Sufficient as part of its core content.
- If you chose C: This is incorrect. Sufficient is a fundamental concept covered in this subject. This subject covers Sufficient as part of its core content.
- If you chose D: Sufficient is a fundamental concept covered in this subject. This subject covers Sufficient as part of its core content. Correct!
Q5: Based on the worked examples in this subject, what is the correct result?
- A) The inverse of the correct answer
- B) $p$
- C) A different result from a common mistake
- D) An unrelated numerical value
Correct: B)
- If you chose A: This is incorrect. The worked examples show that the result is $p$. The other options represent common errors.
- If you chose B: The worked examples show that the result is $p$. The other options represent common errors. Correct!
- If you chose C: This is incorrect. The worked examples show that the result is $p$. The other options represent common errors.
- If you chose D: This is incorrect. The worked examples show that the result is $p$. The other options represent common errors.
Q6: How are Sufficient and Estimator Vs Estimate related?
- A) Sufficient and Estimator Vs Estimate are closely related concepts
- B) Sufficient is the inverse of Estimator Vs Estimate
- C) Sufficient and Estimator Vs Estimate are completely unrelated topics
- D) Sufficient is a special case of Estimator Vs Estimate
Correct: A)
- If you chose A: Both Sufficient and Estimator Vs Estimate are covered in this subject as interconnected topics. Correct!
- If you chose B: This is incorrect. Both Sufficient and Estimator Vs Estimate are covered in this subject as interconnected topics.
- If you chose C: This is incorrect. Both Sufficient and Estimator Vs Estimate are covered in this subject as interconnected topics.
- If you chose D: This is incorrect. Both Sufficient and Estimator Vs Estimate are covered in this subject as interconnected topics.
Q7: What is a common pitfall when working with Properties Of Estimators?
- A) Properties Of Estimators has no common misconceptions
- B) The main error with Properties Of Estimators is using it when it is not needed
- C) Properties Of Estimators is always computed the same way in all contexts
- D) A common mistake is confusing Properties Of Estimators with a similar concept
Correct: D)
- If you chose A: This is incorrect. Students often confuse Properties Of Estimators with similar-sounding or related concepts. Pay attention to the precise definitions.
- If you chose B: This is incorrect. Students often confuse Properties Of Estimators with similar-sounding or related concepts. Pay attention to the precise definitions.
- If you chose C: This is incorrect. Students often confuse Properties Of Estimators with similar-sounding or related concepts. Pay attention to the precise definitions.
- If you chose D: Students often confuse Properties Of Estimators with similar-sounding or related concepts. Pay attention to the precise definitions. Correct!
Q8: When should you apply Sufficiency (Conceptual)?
- A) Avoid Sufficiency (Conceptual) unless explicitly instructed
- B) Apply Sufficiency (Conceptual) to solve problems in this subject's domain
- C) Sufficiency (Conceptual) is not practically useful
- D) Use Sufficiency (Conceptual) only in pure mathematics contexts
Correct: B)
- If you chose A: This is incorrect. Sufficiency (Conceptual) is a practical tool used throughout this subject to solve relevant problems.
- If you chose B: Sufficiency (Conceptual) is a practical tool used throughout this subject to solve relevant problems. Correct!
- If you chose C: This is incorrect. Sufficiency (Conceptual) is a practical tool used throughout this subject to solve relevant problems.
- If you chose D: This is incorrect. Sufficiency (Conceptual) is a practical tool used throughout this subject to solve relevant problems.
Practice Problems
- Show that $\bar{X}$ is an unbiased estimator of $\mu$: prove $E[\bar{X}] = \mu$.
  Answer: $E[\bar{X}] = E\left[\frac{1}{n}\sum X_i\right] = \frac{1}{n}\sum E[X_i] = \frac{1}{n} \cdot n\mu = \mu$. Linearity of expectation is the key step, so this holds for any distribution with a finite mean.
- An estimator has variance 4 and bias 1. What is its MSE?
  Answer: $\text{MSE} = \text{Var} + \text{Bias}^2 = 4 + 1^2 = 5$.
- For a Bernoulli($p$) sample of size $n$, is $\hat{p} = \frac{\sum X_i}{n}$ unbiased? Is it consistent?
  Answer: **Unbiased:** $E[\hat{p}] = \frac{1}{n} \cdot n p = p$. **Consistent:** the bias is zero and $\text{Var}(\hat{p}) = \frac{p(1-p)}{n} \to 0$ as $n \to \infty$. So $\hat{p}$ is both unbiased and consistent.
- Estimator A is unbiased with variance 10. Estimator B has bias 2 and variance 3. Which has lower MSE?
  Answer: $\text{MSE}_A = 10 + 0^2 = 10$ and $\text{MSE}_B = 3 + 2^2 = 7$. Estimator B has lower MSE despite being biased, because its reduction in variance (from 10 to 3) outweighs its squared bias (4).
- Explain why an unbiased estimator can be worse than a biased one.
  Answer: Unbiasedness only guarantees correct-on-average, not correct-on-any-given-sample. A biased estimator with much lower variance can have a smaller expected squared error (lower MSE). For example, ridge regression is biased but often outperforms unbiased OLS by reducing variance. The bias-variance decomposition $\text{MSE} = \text{Var} + \text{Bias}^2$ shows both matter.
Summary
Key takeaways:
- Estimator = rule (random variable); estimate = computed number
- Bias = $E[\hat{\theta}] - \theta$; bias of zero means unbiased
- MSE = $\text{Var} + \text{Bias}^2$: the bias-variance decomposition
- Consistent estimators converge in probability to the true value as $n$ grows
- Sufficient statistics capture all information about the parameter
- Unbiasedness is NOT the only criterion: sometimes a biased estimator with lower variance wins
Pitfalls
- Confusing estimator with estimate: An estimator $\hat{\theta}$ is a random variable (a function of the random sample); an estimate is a specific number computed from observed data. Statements like "the estimator is 3.7" are category errors: 3.7 is an estimate, not the estimator itself.
- Valuing unbiasedness above all else: Unbiasedness means $E[\hat{\theta}] = \theta$, but this is an average-over-samples property. An unbiased estimator can produce terrible estimates on any given sample. A slightly biased estimator with much lower variance often wins on MSE = Var + Bias². This is the statistical foundation of regularization.
- Forgetting the bias-variance decomposition: MSE is always Var + Bias², not Var + Bias or some other combination. Bias must be squared. This means a small bias can be worthwhile if it reduces variance enough: halving variance with a bias of 1 trades an MSE of $V$ for an MSE of $V/2 + 1$, which is an improvement whenever $V > 2$.
- Assuming consistency implies unbiasedness: An estimator can be consistent (converges to $\theta$ as $n \to \infty$) while being biased for every finite $n$. Example: $\hat{\theta}_n = \bar{X}_n + 1/n$ has bias $1/n \to 0$ and variance $\sigma^2/n \to 0$, so it's consistent but biased for all finite $n$.
- Believing sufficient statistics are always unbiased: A statistic $T$ is sufficient if it captures all information about $\theta$ in the sample. Sufficiency is about information content, not about bias. A sufficient statistic can be biased. The Rao-Blackwell theorem uses sufficiency to reduce variance, but conditioning on $T$ leaves the expectation unchanged, so any bias persists.
Next Steps
Next up: 12-04-mle.md