### 12.3 — Point Estimation

Phase: Statistics

Prerequisites: 12-02-sampling-sampling-distributions, 11-01-expectation-continuous-rv

Learning Objectives

By the end of this subject, you will be able to:

Define an estimator and distinguish it from an estimate
Compute and interpret bias, variance, and mean squared error of estimators
Determine whether an estimator is unbiased
Assess consistency of estimators
Understand the concept of sufficiency and the Rao-Blackwell theorem (conceptual)

Core Content

Estimator vs Estimate

An estimator is a rule (function) for computing a guess of a parameter from sample data. It is a random variable because it depends on the random sample.

An estimate is the numerical value produced by applying the estimator to a particular observed sample.

| Term | Nature | Notation |
| --- | --- | --- |
| Estimator | Random variable (function of sample) | $\hat{\theta} = g(X_1, \ldots, X_n)$ |
| Estimate | Fixed number (computed from data) | $\hat{\theta} = 3.7$ |
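
To make the distinction concrete, the sketch below treats the estimator as a Python function and an estimate as the number that function returns on one particular sample. The population (normal with mean 5 and standard deviation 2), the sample size of 10, and the name `sample_mean` are illustrative assumptions, not part of the lesson.

```python
import random

def sample_mean(sample):
    """Estimator: a rule that maps any sample to a guess of the population mean."""
    return sum(sample) / len(sample)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Two independent samples from the assumed Normal(5, 2^2) population
sample_a = [rng.gauss(5.0, 2.0) for _ in range(10)]
sample_b = [rng.gauss(5.0, 2.0) for _ in range(10)]

# Estimates: fixed numbers obtained by applying the rule to observed data
estimate_a = sample_mean(sample_a)
estimate_b = sample_mean(sample_b)
print(estimate_a, estimate_b)
```

Both numbers come from the same rule; they differ only because the samples differ, which is what it means for the estimator to be a random variable.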

Properties of Estimators

Bias

$$\text{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$$

An estimator is unbiased if $E[\hat{\theta}] = \theta$ — it hits the target on average.

Examples:

- $\bar{X}$ is unbiased for $\mu$: $E[\bar{X}] = \mu$
- $s^2 = \frac{1}{n-1}\sum(X_i - \bar{X})^2$ is unbiased for $\sigma^2$
- $\hat{\sigma}^2 = \frac{1}{n}\sum(X_i - \bar{X})^2$ is biased: $E[\hat{\sigma}^2] = \frac{n-1}{n}\sigma^2$, so it systematically underestimates $\sigma^2$
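
The bias of the divide-by-$n$ variance estimator can be seen in a small simulation. This is a rough sketch under assumed settings (normal data with $\sigma^2 = 4$, $n = 5$, 20000 repetitions); the function name `average_variance_estimates` is hypothetical.

```python
import random

def average_variance_estimates(n, sigma, reps, seed=0):
    """Average s^2 (divide by n-1) and sigma-hat^2 (divide by n) over many samples."""
    rng = random.Random(seed)
    total_unbiased = 0.0
    total_biased = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        m = sum(sample) / n
        ss = sum((x - m) ** 2 for x in sample)  # sum of squared deviations
        total_unbiased += ss / (n - 1)
        total_biased += ss / n
    return total_unbiased / reps, total_biased / reps

# Assumed setup: Normal(0, 4) data with n = 5
mean_s2, mean_sigma_hat2 = average_variance_estimates(n=5, sigma=2.0, reps=20000)
print(mean_s2)          # should land near sigma^2 = 4
print(mean_sigma_hat2)  # should land near (n-1)/n * sigma^2 = 3.2
```

The averages approximate $E[s^2]$ and $E[\hat{\sigma}^2]$; individual estimates still scatter widely around them.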

⚠️ CRITICAL: Unbiasedness is about the sampling distribution, not about a single estimate. An unbiased estimator can produce a terrible estimate on any given sample.

Variance

$$\text{Var}(\hat{\theta}) = E[(\hat{\theta} - E[\hat{\theta}])^2]$$

Variance measures how much the estimator fluctuates from sample to sample. Lower variance means a more precise estimator.

Mean Squared Error (MSE)

$$\text{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = \text{Var}(\hat{\theta}) + [\text{Bias}(\hat{\theta})]^2$$
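
The decomposition follows from adding and subtracting $E[\hat{\theta}]$ inside the square. A short derivation, using only the definitions of bias and variance given above:

$$
\begin{aligned}
\text{MSE}(\hat{\theta}) &= E\big[(\hat{\theta} - E[\hat{\theta}] + E[\hat{\theta}] - \theta)^2\big] \\
&= E\big[(\hat{\theta} - E[\hat{\theta}])^2\big] + 2\,(E[\hat{\theta}] - \theta)\,E\big[\hat{\theta} - E[\hat{\theta}]\big] + (E[\hat{\theta}] - \theta)^2 \\
&= \text{Var}(\hat{\theta}) + 0 + [\text{Bias}(\hat{\theta})]^2
\end{aligned}
$$

The cross term vanishes because $E[\hat{\theta}] - \theta$ is a constant and $E\big[\hat{\theta} - E[\hat{\theta}]\big] = 0$.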

🚩 Common Pitfall: MSE = Variance + Bias². This is the bias-variance tradeoff: sometimes a slightly biased estimator with much lower variance has lower MSE than an unbiased one. This is the statistical basis for regularisation in machine learning.

Consistency

An estimator $\hat{\theta}_n$ is consistent if it converges in probability to $\theta$ as $n \to \infty$:

$$\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| > \epsilon) = 0 \quad \text{for any } \epsilon > 0$$

Sufficient condition: If $\text{Bias} \to 0$ and $\text{Var} \to 0$ as $n \to \infty$, the estimator is consistent.

$\bar{X}$ is consistent for $\mu$ because it is unbiased and $\text{Var}(\bar{X}) = \sigma^2/n \to 0$ (assuming $\sigma^2$ is finite).
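
Convergence in probability can be illustrated by estimating $P(|\bar{X}_n - \mu| > \epsilon)$ for growing $n$. The sketch below assumes standard normal data, $\epsilon = 0.2$, and 1000 repetitions per sample size; `exceedance_probability` is a hypothetical helper name.

```python
import random

def exceedance_probability(n, epsilon, reps, rng, mu=0.0, sigma=1.0):
    """Estimate P(|X-bar_n - mu| > epsilon) by repeated sampling."""
    count = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs(xbar - mu) > epsilon:
            count += 1
    return count / reps

rng = random.Random(0)  # fixed seed for repeatability
probs = {n: exceedance_probability(n, epsilon=0.2, reps=1000, rng=rng)
         for n in (10, 100, 1000)}
for n, p in probs.items():
    print(n, p)  # the estimated probability should shrink as n grows
```

A simulation like this only illustrates the limit; the proof rests on the bias and variance argument above.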

Sufficiency (Conceptual)

A statistic $T(X)$ is sufficient for $\theta$ if it captures ALL information about $\theta$ contained in the sample. Formally:

$$P(X = x \mid T(X) = t, \theta) \text{ does not depend on } \theta$$

Example: For Bernoulli trials, the sum $\sum X_i$ is sufficient for $p$ — the exact sequence of successes and failures adds no information beyond the total count.
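
The Bernoulli claim can be checked exactly by enumeration: given the total count, the probability of any particular sequence should be the same for every $p$. The sketch below uses exact fractions; the sequence and the three values of $p$ are arbitrary choices, and `conditional_prob` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product
from math import comb

def conditional_prob(seq, p):
    """P(X = seq | sum(X) = t) for i.i.d. Bernoulli(p), computed by enumeration."""
    n, t = len(seq), sum(seq)

    def joint(x):
        k = sum(x)
        return p ** k * (1 - p) ** (n - k)

    # Total probability of all sequences with the same number of successes
    total = sum(joint(x) for x in product((0, 1), repeat=n) if sum(x) == t)
    return joint(seq) / total

seq = (1, 0, 1, 0, 0)  # two successes in five trials
results = {p: conditional_prob(seq, p)
           for p in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10))}
for p, value in results.items():
    print(p, value)  # the same value, 1 / C(5, 2), for every p
```

Because the conditional probability is $1/\binom{n}{t}$ whatever the value of $p$, the ordering of successes and failures carries no further information about $p$.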

Rao-Blackwell Theorem: If $T$ is sufficient and $\hat{\theta}$ is an unbiased estimator, then $E[\hat{\theta} \mid T]$ is also an unbiased estimator, with variance no larger than that of $\hat{\theta}$. Conditioning on a sufficient statistic therefore gives a method for improving estimators.
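
As an illustration (not taken from the lesson), consider Bernoulli($p$) data with the crude unbiased estimator $\hat{\theta} = X_1$ and the sufficient statistic $T = \sum X_i$. By symmetry, $E[X_1 \mid T] = T/n$, so the Rao-Blackwell step turns the crude estimator into the sample proportion. The sketch below compares the two by simulation, with $p = 0.3$, $n = 20$, and 20000 repetitions as assumed settings.

```python
import random

def simulate(p, n, reps, seed=0):
    """Return crude (X_1) and Rao-Blackwellised (T/n) estimates over many samples."""
    rng = random.Random(seed)
    crude, improved = [], []
    for _ in range(reps):
        sample = [1 if rng.random() < p else 0 for _ in range(n)]
        crude.append(sample[0])           # X_1: unbiased, but uses one observation
        improved.append(sum(sample) / n)  # E[X_1 | T] = T / n
    return crude, improved

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

crude, improved = simulate(p=0.3, n=20, reps=20000)
print(mean(crude), variance(crude))        # mean near 0.3, variance near p(1-p)
print(mean(improved), variance(improved))  # mean near 0.3, variance near p(1-p)/n
```

Both estimators centre on $p$, but the conditioned one has a much smaller spread.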

Key Terms

Bias
Consistent
Estimator
Sufficient

Worked Examples

Example 1: Bias of sample variance estimators

Consider $n = 4$ observations from a population with $\sigma^2 = 10$.

Estimator A: $s^2 = \frac{1}{n-1}\sum(X_i - \bar{X})^2$ (with Bessel's correction)

$E[s^2] = \sigma^2 = 10$ → unbiased

Estimator B: $\hat{\sigma}^2 = \frac{1}{n}\sum(X_i - \bar{X})^2$ (without correction)

$E[\hat{\sigma}^2] = \frac{n-1}{n}\sigma^2 = \frac{3}{4} \cdot 10 = 7.5$

Bias = $7.5 - 10 = -2.5$ → systematically underestimates

Yet $\hat{\sigma}^2$ has LOWER variance than $s^2$, and for the normal distribution its MSE is actually LOWER than that of $s^2$ for all $n \geq 2$. This is the bias-variance tradeoff in action.
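
The MSE comparison can be made explicit with the standard results for normal samples, which are not stated in the text above and are assumed here: $\text{Var}(s^2) = 2\sigma^4/(n-1)$ and $\text{Var}(\hat{\sigma}^2) = 2(n-1)\sigma^4/n^2$. A minimal sketch with hypothetical function names:

```python
def mse_s2(n, sigma2):
    """MSE of s^2 for normal data: unbiased, so MSE = Var(s^2) = 2*sigma^4/(n-1)."""
    return 2 * sigma2 ** 2 / (n - 1)

def mse_sigma_hat2(n, sigma2):
    """MSE of sigma-hat^2 for normal data: variance plus squared bias."""
    variance = 2 * (n - 1) * sigma2 ** 2 / n ** 2
    bias = -sigma2 / n  # equals (n-1)/n * sigma^2 - sigma^2
    return variance + bias ** 2

# Values from Example 1: n = 4, sigma^2 = 10
print(mse_s2(4, 10))          # 200/3, about 66.67
print(mse_sigma_hat2(4, 10))  # 37.5 + 2.5^2 = 43.75
```

Under these formulas the biased estimator's MSE simplifies to $(2n-1)\sigma^4/n^2$, which is below $2\sigma^4/(n-1)$ for every $n \geq 2$.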

Example 2: Consistency check

Estimator $\hat{\mu}_n = \bar{X}_n + \frac{1}{n}$ (adds a shrinking bias).

Bias: $E[\hat{\mu}_n] - \mu = \frac{1}{n} \to 0$

Variance: $\text{Var}(\hat{\mu}_n) = \frac{\sigma^2}{n} \to 0$

Since both bias and variance go to zero, $\hat{\mu}_n$ is consistent — even though it's biased for any finite $n$.
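
The step from "bias and variance go to zero" to convergence in probability can be filled in with Markov's inequality applied to the squared error. For any $\epsilon > 0$:

$$P(|\hat{\mu}_n - \mu| > \epsilon) \le \frac{E[(\hat{\mu}_n - \mu)^2]}{\epsilon^2} = \frac{\text{Var}(\hat{\mu}_n) + [\text{Bias}(\hat{\mu}_n)]^2}{\epsilon^2} = \frac{\sigma^2/n + 1/n^2}{\epsilon^2} \to 0$$

The bound is the MSE divided by $\epsilon^2$, so any estimator whose MSE tends to zero is consistent.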

Example 3: MSE comparison

For estimating a normal mean $\mu$, compare:

- $\hat{\theta}_1 = \bar{X}$ (unbiased)
- $\hat{\theta}_2 = 0$ (biased, zero variance)

$\text{MSE}(\hat{\theta}_1) = \text{Var}(\bar{X}) + 0^2 = \sigma^2/n$

$\text{MSE}(\hat{\theta}_2) = 0 + (0 - \mu)^2 = \mu^2$

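For $|\mu| < \sigma/\sqrt{n}$, the "always guess 0" estimator has lower MSE! This illustrates why MSE alone isn't always the right metric: the ranking depends on the unknown $\mu$, and $\hat{\theta}_2$ wins only by ignoring the data, so which estimator to prefer depends on what you care about.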
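
The crossover at $|\mu| = \sigma/\sqrt{n}$ can be checked with a few lines of arithmetic. The values $\sigma = 2$ and $n = 16$ are assumptions chosen so that the threshold is exactly 0.5; the function names are hypothetical.

```python
import math

def mse_sample_mean(sigma, n):
    """MSE of X-bar: unbiased, so MSE equals its variance sigma^2 / n."""
    return sigma ** 2 / n

def mse_always_zero(mu):
    """MSE of the constant estimator 0: zero variance, bias -mu."""
    return mu ** 2

sigma, n = 2.0, 16
threshold = sigma / math.sqrt(n)  # 0.5 for these assumed values
for mu in (0.1, 0.5, 1.0):
    zero_wins = mse_always_zero(mu) < mse_sample_mean(sigma, n)
    print(mu, mse_always_zero(mu), mse_sample_mean(sigma, n), zero_wins)
```

Below the threshold the constant estimator has the smaller MSE, at the threshold the two tie, and above it the sample mean wins.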

Quiz

Q1: What does consistency of an estimator primarily refer to in this subject?

A) Convergence in probability of the estimator to the true parameter as $n \to \infty$
B) A visual representation of the estimator
C) A computational error in the estimator
D) A historical anecdote about estimation

Correct: A)

If you chose A: A consistent estimator $\hat{\theta}_n$ converges in probability to $\theta$ as $n \to \infty$. The other options describe different aspects that are not the primary focus. Correct!
If you chose B: This is incorrect. A consistent estimator $\hat{\theta}_n$ converges in probability to $\theta$ as $n \to \infty$. The other options describe different aspects that are not the primary focus.
If you chose C: This is incorrect. A consistent estimator $\hat{\theta}_n$ converges in probability to $\theta$ as $n \to \infty$. The other options describe different aspects that are not the primary focus.
If you chose D: This is incorrect. A consistent estimator $\hat{\theta}_n$ converges in probability to $\theta$ as $n \to \infty$. The other options describe different aspects that are not the primary focus.

Q2: Which of the following is the key formula discussed in this subject?

A) The inverse operation of the formula in question
B) An unrelated formula from a different topic
C) $\hat{\theta} = g(X_1, \ldots, X_n)$
D) A simplified version of \hat{\theta} = g(X_1, \ldot...

Correct: C)

If you chose A: This is incorrect. The formula $\hat{\theta} = g(X_1, \ldots, X_n)$ is central to this subject. The other options are either simplified versions or unrelated.
If you chose B: This is incorrect. The formula $\hat{\theta} = g(X_1, \ldots, X_n)$ is central to this subject. The other options are either simplified versions or unrelated.
If you chose C: The formula $\hat{\theta} = g(X_1, \ldots, X_n)$ is central to this subject. The other options are either simplified versions or unrelated. Correct!
If you chose D: This is incorrect. The formula $\hat{\theta} = g(X_1, \ldots, X_n)$ is central to this subject. The other options are either simplified versions or unrelated.

Q3: What is the primary purpose of Estimator?

A) It is used to estimate a parameter from sample data
B) It is used only in advanced research contexts
C) It replaces all other methods in this domain
D) It is primarily a historical notation system

Correct: A)

If you chose A: Estimator serves the purpose described in the correct answer. The other options misrepresent its role. Correct!
If you chose B: This is incorrect. Estimator serves the purpose described in the correct answer. The other options misrepresent its role.
If you chose C: This is incorrect. Estimator serves the purpose described in the correct answer. The other options misrepresent its role.
If you chose D: This is incorrect. Estimator serves the purpose described in the correct answer. The other options misrepresent its role.

Q4: Which statement about Sufficient is TRUE?

A) Sufficient is an advanced topic beyond this subject's scope
B) Sufficient is not related to this subject
C) Sufficient is mentioned only as a historical footnote
D) Sufficient is a fundamental concept covered in this subject

Correct: D)

If you chose A: This is incorrect. Sufficient is a fundamental concept covered in this subject. This subject covers Sufficient as part of its core content.
If you chose B: This is incorrect. Sufficient is a fundamental concept covered in this subject. This subject covers Sufficient as part of its core content.
If you chose C: This is incorrect. Sufficient is a fundamental concept covered in this subject. This subject covers Sufficient as part of its core content.
If you chose D: Sufficient is a fundamental concept covered in this subject. This subject covers Sufficient as part of its core content. Correct!

Q5: Based on the worked examples in this subject, what is the correct result?

A) The inverse of the correct answer
B) $E[\hat{p}] = p$
C) A different result from a common mistake
D) An unrelated numerical value

Correct: B)

If you chose A: This is incorrect. The worked examples show that the result is $E[\hat{p}] = p$. The other options represent common errors.
If you chose B: The worked examples show that the result is $E[\hat{p}] = p$. The other options represent common errors. Correct!
If you chose C: This is incorrect. The worked examples show that the result is $E[\hat{p}] = p$. The other options represent common errors.
If you chose D: This is incorrect. The worked examples show that the result is $E[\hat{p}] = p$. The other options represent common errors.

Q6: How are Sufficient and Estimator Vs Estimate related?

A) Sufficient and Estimator Vs Estimate are closely related concepts
B) Sufficient is the inverse of Estimator Vs Estimate
C) Sufficient and Estimator Vs Estimate are completely unrelated topics
D) Sufficient is a special case of Estimator Vs Estimate

Correct: A)

If you chose A: Both Sufficient and Estimator Vs Estimate are covered in this subject as interconnected topics. Correct!
If you chose B: This is incorrect. Both Sufficient and Estimator Vs Estimate are covered in this subject as interconnected topics.
If you chose C: This is incorrect. Both Sufficient and Estimator Vs Estimate are covered in this subject as interconnected topics.
If you chose D: This is incorrect. Both Sufficient and Estimator Vs Estimate are covered in this subject as interconnected topics.

Q7: What is a common pitfall when working with Properties Of Estimators?

A) Properties Of Estimators has no common misconceptions
B) The main error with Properties Of Estimators is using it when it is not needed
C) Properties Of Estimators is always computed the same way in all contexts
D) A common mistake is confusing Properties Of Estimators with a similar concept

Correct: D)

If you chose A: This is incorrect. Students often confuse Properties Of Estimators with similar-sounding or related concepts. Pay attention to the precise definitions.
If you chose B: This is incorrect. Students often confuse Properties Of Estimators with similar-sounding or related concepts. Pay attention to the precise definitions.
If you chose C: This is incorrect. Students often confuse Properties Of Estimators with similar-sounding or related concepts. Pay attention to the precise definitions.
If you chose D: Students often confuse Properties Of Estimators with similar-sounding or related concepts. Pay attention to the precise definitions. Correct!

Q8: When should you apply Sufficiency (Conceptual)?

A) Avoid Sufficiency (Conceptual) unless explicitly instructed
B) Apply Sufficiency (Conceptual) to solve problems in this subject's domain
C) Sufficiency (Conceptual) is not practically useful
D) Use Sufficiency (Conceptual) only in pure mathematics contexts

Correct: B)

If you chose A: This is incorrect. Sufficiency (Conceptual) is a practical tool used throughout this subject to solve relevant problems.
If you chose B: Sufficiency (Conceptual) is a practical tool used throughout this subject to solve relevant problems. Correct!
If you chose C: This is incorrect. Sufficiency (Conceptual) is a practical tool used throughout this subject to solve relevant problems.
If you chose D: This is incorrect. Sufficiency (Conceptual) is a practical tool used throughout this subject to solve relevant problems.

Practice Problems

Show that $\bar{X}$ is an unbiased estimator of $\mu$: prove $E[\bar{X}] = \mu$.

Answer: $E[\bar{X}] = E[\frac{1}{n}\sum X_i] = \frac{1}{n}\sum E[X_i] = \frac{1}{n} \cdot n\mu = \mu$. Linearity of expectation is key; this holds regardless of the distribution, provided the mean exists.

An estimator has variance 4 and bias 1. What is its MSE?

Answer: $\text{MSE} = \text{Var} + \text{Bias}^2 = 4 + 1^2 = 5$

For a Bernoulli($p$) sample of size $n$, is $\hat{p} = \frac{\sum X_i}{n}$ unbiased? Is it consistent?

Answer:

**Unbiased:** $E[\hat{p}] = \frac{1}{n} \cdot n p = p$ ✓

**Consistent:** $\hat{p}$ is unbiased and $\text{Var}(\hat{p}) = \frac{p(1-p)}{n} \to 0$ as $n \to \infty$ ✓

So $\hat{p}$ is both unbiased and consistent.

Estimator A is unbiased with variance 10. Estimator B has bias 2 and variance 3. Which has lower MSE?

Answer: $\text{MSE}_A = 10 + 0^2 = 10$ and $\text{MSE}_B = 3 + 2^2 = 7$. Estimator B has lower MSE despite being biased: its reduction in variance (from 10 to 3) outweighs its squared bias of 4.

Explain why an unbiased estimator can be worse than a biased one.

Answer: Unbiasedness only guarantees correct-on-average, not correct-on-any-given-sample. A biased estimator with much lower variance can be closer to the truth on average in squared error (lower MSE). For example, ridge regression is biased but often outperforms unbiased OLS by reducing variance. The bias-variance decomposition $\text{MSE} = \text{Var} + \text{Bias}^2$ shows both matter.

Summary

Key takeaways:

Estimator = rule (random variable); estimate = computed number
Bias = $E[\hat{\theta}] - \theta$; bias of zero means unbiased
MSE = $\text{Var} + \text{Bias}^2$ — the bias-variance decomposition
Consistent estimators converge in probability to the true value as $n$ grows
Sufficient statistics capture all information about the parameter
Unbiasedness is NOT the only criterion: sometimes a biased estimator with lower variance wins

Pitfalls

Confusing estimator with estimate: An estimator θ̂ is a random variable (a function of the random sample); an estimate is a specific number computed from observed data. Statements like "the estimator is 3.7" are category errors — 3.7 is an estimate, not the estimator itself.
Valuing unbiasedness above all else: Unbiasedness means E[θ̂] = θ, but this is an average-over-samples property. An unbiased estimator can produce terrible estimates on any given sample. A slightly biased estimator with much lower variance often wins on MSE = Var + Bias². This is the statistical foundation of regularisation.
Forgetting the bias-variance decomposition: MSE is always Var + Bias², not Var + Bias or some other combination. Bias must be squared. This means a small bias can be worthwhile if it reduces variance enough: halving variance at the cost of a bias of 1 trades an MSE of V for an MSE of V/2 + 1, which is an improvement whenever V > 2.
Assuming consistency implies unbiasedness: An estimator can be consistent (converges to θ as n → ∞) while being biased for every finite n. Example: θ̂ₙ = X̄ₙ + 1/n has bias 1/n → 0 and variance σ²/n → 0, so it's consistent but biased for all finite n.
Believing sufficient statistics are always unbiased: A statistic T is sufficient if it captures all information about θ in the sample. Sufficiency is about information content, not about bias. A sufficient statistic can be biased; conditioning on it, as in the Rao-Blackwell theorem, can reduce variance but leaves the expectation, and hence any bias, unchanged.

Next Steps

Next up: 12-04-mle.md
