
### 12.3: Point Estimation

Phase: Statistics

Prerequisites: 12-02-sampling-sampling-distributions, 11-01-expectation-continuous-rv

Learning Objectives

By the end of this subject, you will be able to:

  1. Define an estimator and distinguish it from an estimate
  2. Compute and interpret bias, variance, and mean squared error of estimators
  3. Determine whether an estimator is unbiased
  4. Assess consistency of estimators
  5. Understand the concept of sufficiency and the Rao-Blackwell theorem (conceptual)

Core Content

Estimator vs Estimate

An estimator is a rule (function) for computing a guess of a parameter from sample data. It is a random variable because it depends on the random sample.

An estimate is the numerical value produced by applying the estimator to a particular observed sample.

| Term | Nature | Notation |
|---|---|---|
| Estimator | Random variable (function of the sample) | $\hat{\theta} = g(X_1, \ldots, X_n)$ |
| Estimate | Fixed number (computed from observed data) | $\hat{\theta} = 3.7$ |
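A minimal Python sketch of the distinction, assuming a sample-mean estimator. The function name `sample_mean_estimator` and both data lists are hypothetical, chosen only so that the first sample reproduces the table's estimate of 3.7. The function is the estimator; the number it returns for one particular sample is an estimate.

```python
from statistics import mean

def sample_mean_estimator(sample):
    """The estimator g(X_1, ..., X_n) = (1/n) * sum(X_i): a rule, not a number."""
    return mean(sample)

# Two hypothetical observed samples (illustrative data, not from any real study).
observed = [3.1, 4.2, 3.5, 4.0]
another_sample = [2.9, 3.3, 4.1, 3.9]

# Applying the same rule to different samples gives different estimates,
# which is why the estimator itself is treated as a random variable.
estimate = sample_mean_estimator(observed)
other_estimate = sample_mean_estimator(another_sample)
print(estimate, other_estimate)
```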

Properties of Estimators

Bias

$$\text{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$$

An estimator is unbiased if $E[\hat{\theta}] = \theta$ for every value of $\theta$: it hits the target on average.

Examples:

- $\bar{X}$ is unbiased for $\mu$: $E[\bar{X}] = \mu$
- $s^2 = \frac{1}{n-1}\sum(X_i - \bar{X})^2$ is unbiased for $\sigma^2$
- $\hat{\sigma}^2 = \frac{1}{n}\sum(X_i - \bar{X})^2$ is biased: $E[\hat{\sigma}^2] = \frac{n-1}{n}\sigma^2$, so it systematically underestimates $\sigma^2$
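These expectations can be checked exactly rather than by simulation. The sketch below assumes a small hypothetical population (iid draws, uniform on {1, 2, 3}, so $\mu = 2$ and $\sigma^2 = 2/3$) and averages each estimator over every possible sample of size $n = 3$ using exact fractions. The helper names are illustrative, not from any library.

```python
from fractions import Fraction
from itertools import product

def xbar(sample):
    return Fraction(sum(sample), len(sample))

def s2(sample):
    """Sample variance with Bessel's correction (divide by n - 1)."""
    m = xbar(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)

def sigma2_hat(sample):
    """Plug-in variance (divide by n)."""
    m = xbar(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)

def exact_expectation(estimator, support, n):
    """E[estimator] over all n-tuples of iid draws, assumed uniform on `support`."""
    samples = list(product(support, repeat=n))
    return sum((estimator(s) for s in samples), Fraction(0)) / len(samples)

support, n = [1, 2, 3], 3        # hypothetical population: uniform on {1, 2, 3}
mu = xbar(support)               # population mean = 2
sigma2 = sigma2_hat(support)     # population variance = 2/3 (divide by N for a population)

print(exact_expectation(xbar, support, n))        # 2
print(exact_expectation(s2, support, n))          # 2/3, equal to sigma^2
print(exact_expectation(sigma2_hat, support, n))  # 4/9, equal to (n-1)/n * sigma^2
```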

⚠️ CRITICAL: Unbiasedness is about the sampling distribution, not about a single estimate. An unbiased estimator can produce a terrible estimate on any given sample.

Variance

$$\text{Var}(\hat{\theta}) = E[(\hat{\theta} - E[\hat{\theta}])^2]$$

Variance measures how much the estimator fluctuates from sample to sample; lower variance means a more precise estimator.

Mean Squared Error (MSE)

$$\text{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = \text{Var}(\hat{\theta}) + [\text{Bias}(\hat{\theta})]^2$$

🚩 Common Pitfall: Treating unbiasedness as the only goal. Because MSE = Variance + Bias², a slightly biased estimator with much lower variance can have lower MSE than an unbiased one. This is the bias-variance tradeoff, and it is one statistical motivation for regularisation in machine learning.
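A sketch that checks the decomposition exactly, assuming a small hypothetical population (iid draws uniform on {1, 2, 3}, so $\sigma^2 = 2/3$) and samples of size $n = 3$. It enumerates every possible sample, computes each estimator's bias, variance and MSE with exact fractions, and compares MSE with Var + Bias². `bias_var_mse` is a hypothetical helper, not a library function.

```python
from fractions import Fraction
from itertools import product

def s2(sample):
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)

def sigma2_hat(sample):
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / len(sample)

def bias_var_mse(estimator, support, n, theta):
    """Exact bias, variance and MSE over all n-tuples drawn uniformly from `support`."""
    values = [estimator(s) for s in product(support, repeat=n)]
    N = len(values)
    mean = sum(values, Fraction(0)) / N
    var = sum(((v - mean) ** 2 for v in values), Fraction(0)) / N
    mse = sum(((v - theta) ** 2 for v in values), Fraction(0)) / N
    return mean - theta, var, mse

# Hypothetical population: uniform on {1, 2, 3}, true variance 2/3.
support, n, sigma2 = [1, 2, 3], 3, Fraction(2, 3)

results = {name: bias_var_mse(est, support, n, sigma2)
           for name, est in [("s2", s2), ("sigma2_hat", sigma2_hat)]}
for name, (bias, var, mse) in results.items():
    print(name, "bias:", bias, "var:", var, "MSE:", mse, "Var + Bias^2:", var + bias ** 2)
```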

Consistency

An estimator $\hat{\theta}_n$ is consistent if it converges in probability to $\theta$ as $n \to \infty$:

$$\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| > \epsilon) = 0 \quad \text{for any } \epsilon > 0$$

Sufficient condition: If $\text{Bias} \to 0$ and $\text{Var} \to 0$ as $n \to \infty$, the estimator is consistent.

$\bar{X}$ is consistent for $\mu$: it is unbiased, and $\text{Var}(\bar{X}) = \sigma^2/n \to 0$.
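For Bernoulli data, $\bar{X}$ is the sample proportion $\hat{p}$, and the probability in the consistency definition can be computed exactly from the binomial distribution. The sketch below assumes $p = 1/2$ and $\epsilon = 1/10$ (illustrative values) and shows $P(|\hat{p}_n - p| > \epsilon)$ shrinking as $n$ grows. Exact fractions avoid floating-point misclassification at the boundary $|\hat{p} - p| = \epsilon$; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def prob_far_bernoulli(n, p, eps):
    """Exact P(|p_hat - p| > eps), p_hat = (successes in n Bernoulli(p) trials) / n."""
    total = Fraction(0)
    for k in range(n + 1):
        if abs(Fraction(k, n) - p) > eps:
            # binomial probability of exactly k successes
            total += comb(n, k) * p ** k * (1 - p) ** (n - k)
    return total

p, eps = Fraction(1, 2), Fraction(1, 10)   # illustrative values
for n in (10, 100, 1000):
    print(n, float(prob_far_bernoulli(n, p, eps)))
```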

Sufficiency (Conceptual)

A statistic $T(X)$ is sufficient for $\theta$ if it captures ALL information about $\theta$ contained in the sample. Formally:

$$P(X = x \mid T(X) = t, \theta) \text{ does not depend on } \theta$$

Example: For Bernoulli trials, the sum $\sum X_i$ is sufficient for $p$: the exact sequence of successes and failures adds no information beyond the total count.
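The Bernoulli claim can be checked by enumeration. This sketch, using a hypothetical helper `conditional_given_sum`, computes $P(X = x \mid \sum X_i = t)$ for every 0/1 sequence of length $n = 4$ with $t = 2$ successes, under two different (illustrative) values of $p$. The conditional distribution comes out identical in both cases, uniform over the $\binom{4}{2} = 6$ arrangements.

```python
from fractions import Fraction
from itertools import product

def conditional_given_sum(n, p, t):
    """P(X = x | sum(X) = t) for each 0/1 sequence x of length n with sum t, iid Bernoulli(p)."""
    seqs = [x for x in product((0, 1), repeat=n) if sum(x) == t]
    # Every sequence with t successes has the same joint probability p^t (1-p)^(n-t).
    joint = {x: p ** t * (1 - p) ** (n - t) for x in seqs}
    p_sum_is_t = sum(joint.values(), Fraction(0))
    return {x: pr / p_sum_is_t for x, pr in joint.items()}

a = conditional_given_sum(4, Fraction(1, 5), 2)
b = conditional_given_sum(4, Fraction(7, 10), 2)
print(a == b)            # True: the conditional distribution does not involve p
print(set(a.values()))   # {Fraction(1, 6)}
```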

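Rao-Blackwell Theorem: If $T$ is sufficient and $\hat{\theta}$ is an unbiased estimator, then $E[\hat{\theta} \mid T]$ is also an unbiased estimator, with variance no larger than that of $\hat{\theta}$. Sufficiency guarantees that this conditional expectation does not depend on $\theta$, so it is a genuine estimator. This gives a method for improving estimators.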
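A standard Bernoulli illustration, sketched here under illustrative assumptions ($n = 3$, $p = 1/3$): start from the crude unbiased estimator $\hat{\theta} = X_1$ and condition on the sufficient statistic $T = \sum X_i$. By symmetry $E[X_1 \mid T = t] = t/n$, which is the sample proportion, and its variance $p(1-p)/n$ is smaller than $\text{Var}(X_1) = p(1-p)$. The code checks both facts exactly by enumeration; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def bernoulli_samples(n, p):
    """Yield every 0/1 sample of size n with its exact probability under iid Bernoulli(p)."""
    for x in product((0, 1), repeat=n):
        k = sum(x)
        yield x, p ** k * (1 - p) ** (n - k)

def rao_blackwellise(n, p):
    """E[X_1 | T = t] for each t, where the crude estimator is theta_hat = X_1."""
    num = {t: Fraction(0) for t in range(n + 1)}
    den = {t: Fraction(0) for t in range(n + 1)}
    for x, pr in bernoulli_samples(n, p):
        t = sum(x)
        num[t] += x[0] * pr
        den[t] += pr
    return {t: num[t] / den[t] for t in range(n + 1)}

def variance(estimator, n, p):
    mean = sum(estimator(x) * pr for x, pr in bernoulli_samples(n, p))
    return sum((estimator(x) - mean) ** 2 * pr for x, pr in bernoulli_samples(n, p))

n, p = 3, Fraction(1, 3)
rb = rao_blackwellise(n, p)

def crude(x):
    return x[0]            # unbiased but ignores most of the data

def improved(x):
    return rb[sum(x)]      # Rao-Blackwellised: a function of T only

print(rb)
print(variance(crude, n, p), variance(improved, n, p))
```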




Worked Examples

Example 1: Bias of sample variance estimators

Consider $n = 4$ observations from a population with $\sigma^2 = 10$.

Estimator A: $s^2 = \frac{1}{n-1}\sum(X_i - \bar{X})^2$ (with Bessel's correction)

$E[s^2] = \sigma^2 = 10$ → unbiased

Estimator B: $\hat{\sigma}^2 = \frac{1}{n}\sum(X_i - \bar{X})^2$ (without correction)

$E[\hat{\sigma}^2] = \frac{n-1}{n}\sigma^2 = \frac{3}{4} \cdot 10 = 7.5$

Bias = $7.5 - 10 = -2.5$ → systematically underestimates

Yet $\hat{\sigma}^2$ has LOWER variance than $s^2$, and for normally distributed data its MSE is actually LOWER than that of $s^2$ for all $n \geq 2$. This is the bias-variance tradeoff in action.
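The MSE claim can be made concrete for this example. Assuming normally distributed data, $(n-1)s^2/\sigma^2$ follows a $\chi^2_{n-1}$ distribution, so $\text{Var}(s^2) = 2\sigma^4/(n-1)$; since $\hat{\sigma}^2 = \frac{n-1}{n}s^2$, its variance and bias follow directly. The sketch (hypothetical helper name) evaluates both MSEs for $n = 4$, $\sigma^2 = 10$ with exact fractions.

```python
from fractions import Fraction

def normal_variance_estimator_mse(n, sigma2):
    """MSE of s^2 and sigma_hat^2 for an iid normal sample of size n.

    Assumes normality, which gives Var(s^2) = 2 sigma^4 / (n - 1).
    Returns (MSE of s^2, MSE of sigma_hat^2, Var of sigma_hat^2, Bias of sigma_hat^2).
    """
    sigma2 = Fraction(sigma2)
    var_s2 = 2 * sigma2 ** 2 / (n - 1)     # s^2 is unbiased, so its MSE equals its variance
    scale = Fraction(n - 1, n)             # sigma_hat^2 = ((n-1)/n) * s^2
    var_hat = scale ** 2 * var_s2
    bias_hat = scale * sigma2 - sigma2     # = -sigma^2 / n
    return var_s2, var_hat + bias_hat ** 2, var_hat, bias_hat

mse_s2, mse_hat, var_hat, bias_hat = normal_variance_estimator_mse(4, 10)
print(float(mse_s2), float(mse_hat))   # 66.66666666666667 43.75
```

For $n = 4$ this gives $\text{MSE}(s^2) = 200/3 \approx 66.7$ versus $\text{MSE}(\hat{\sigma}^2) = 37.5 + (-2.5)^2 = 43.75$.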

Example 2: Consistency check

Estimator $\hat{\mu}_n = \bar{X}_n + \frac{1}{n}$ (adds a shrinking bias).

Bias: $E[\hat{\mu}_n] - \mu = \frac{1}{n} \to 0$

Variance: $\text{Var}(\hat{\mu}_n) = \frac{\sigma^2}{n} \to 0$

Since both bias and variance go to zero, $\hat{\mu}_n$ is consistent, even though it is biased for every finite $n$.
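A Monte Carlo sketch of this example, assuming normal data with $\mu = 0$, $\sigma = 1$ and tolerance $\epsilon = 0.2$ (all illustrative choices, not from the text). It estimates $P(|\hat{\mu}_n - \mu| > \epsilon)$ for increasing $n$; the estimated probability should fall towards zero, as consistency requires. The exact numbers depend on the seed and the number of replications, and the function name is hypothetical.

```python
import random
from statistics import fmean

def prob_far_shifted(n, eps=0.2, mu=0.0, sigma=1.0, reps=1000, seed=0):
    """Monte Carlo estimate of P(|mu_hat_n - mu| > eps), where mu_hat_n = X-bar_n + 1/n.

    Assumes iid normal data with mean mu and standard deviation sigma.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    far = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        mu_hat = xbar + 1 / n          # the deliberately biased estimator
        if abs(mu_hat - mu) > eps:
            far += 1
    return far / reps

ns = (10, 100, 1000)
probs = [prob_far_shifted(n) for n in ns]
for n, pr in zip(ns, probs):
    print(f"n={n:5d}  estimated P(|mu_hat - mu| > 0.2) = {pr:.3f}")
```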

Example 3: MSE comparison

For estimating a normal mean $\mu$, compare:

- $\hat{\theta}_1 = \bar{X}$ (unbiased)
- $\hat{\theta}_2 = 0$ (biased, zero variance)

$\text{MSE}(\hat{\theta}_1) = \text{Var}(\bar{X}) + 0^2 = \sigma^2/n$

$\text{MSE}(\hat{\theta}_2) = 0 + (0 - \mu)^2 = \mu^2$

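For $|\mu| < \sigma/\sqrt{n}$, the "always guess 0" estimator has lower MSE, even though it ignores the data entirely! Neither estimator has uniformly lower MSE: the ranking depends on the unknown $\mu$. This illustrates why comparing MSE at a single parameter value isn't always the right criterion; which estimator is preferable depends on what you care about.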
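A short sketch of this comparison, assuming $\sigma = 1$ and $n = 4$ (hypothetical values), so the crossover sits at $|\mu| = \sigma/\sqrt{n} = 0.5$. The two functions simply evaluate the MSE formulas above over a few values of $\mu$.

```python
def mse_xbar(sigma, n):
    """MSE of the sample mean: unbiased, so MSE = Var = sigma^2 / n."""
    return sigma ** 2 / n

def mse_zero(mu):
    """MSE of the constant estimator 0: zero variance, squared bias mu^2."""
    return mu ** 2

sigma, n = 1.0, 4   # hypothetical values; crossover at |mu| = 0.5
for mu in (0.0, 0.25, 0.5, 1.0, 2.0):
    # ties (mu = 0.5) are reported as "xbar"
    winner = "zero" if mse_zero(mu) < mse_xbar(sigma, n) else "xbar"
    print(f"mu={mu}: MSE(xbar)={mse_xbar(sigma, n):.4f}, MSE(0)={mse_zero(mu):.4f} -> {winner}")
```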



Quiz

Q1: What does the term "consistent" primarily refer to in this subject?

A) The definition and application of consistency B) A visual representation of consistency C) A computational error related to consistency D) A historical anecdote about consistency

Correct: A)

Q2: Which of the following is the key formula discussed in this subject?

A) The inverse operation of the formula in question B) An unrelated formula from a different topic C) $\hat{\theta} = g(X_1, \ldots, X_n)$ D) A simplified version of \hat{\theta} = g(X_1, \ldot...

Correct: C)

Q3: What is the primary purpose of an estimator?

A) It is used to estimate a parameter from sample data B) It is used only in advanced research contexts C) It replaces all other methods in this domain D) It is primarily a historical notation system

Correct: A)

Q4: Which statement about sufficiency is TRUE?

A) Sufficiency is an advanced topic beyond this subject's scope B) Sufficiency is not related to this subject C) Sufficiency is mentioned only as a historical footnote D) Sufficiency is a fundamental concept covered in this subject

Correct: D)

Q5: For a Bernoulli($p$) sample of size $n$, what is $E[\hat{p}]$ for $\hat{p} = \frac{\sum X_i}{n}$?

A) The inverse of the correct answer B) $p$ C) A different result from a common mistake D) An unrelated numerical value

Correct: B)

Q6: How are sufficiency and the estimator/estimate distinction related?

A) Sufficiency and the estimator/estimate distinction are closely related concepts B) Sufficiency is the inverse of the estimator/estimate distinction C) Sufficiency and the estimator/estimate distinction are completely unrelated topics D) Sufficiency is a special case of the estimator/estimate distinction

Correct: A)

Q7: What is a common pitfall when working with properties of estimators?

A) Properties of estimators have no common misconceptions B) The main error with properties of estimators is using them when they are not needed C) Properties of estimators are always computed the same way in all contexts D) A common mistake is confusing one property of estimators with a similar concept

Correct: D)

Q8: When should you apply sufficiency?

A) Avoid sufficiency unless explicitly instructed B) Apply sufficiency to solve problems in this subject's domain C) Sufficiency is not practically useful D) Use sufficiency only in pure mathematics contexts

Correct: B)

Practice Problems

  1. Show that $\bar{X}$ is an unbiased estimator of $\mu$: prove $E[\bar{X}] = \mu$.

    Answer: $E[\bar{X}] = E[\frac{1}{n}\sum X_i] = \frac{1}{n}\sum E[X_i] = \frac{1}{n} \cdot n\mu = \mu$. Linearity of expectation is key: this holds regardless of the distribution, as long as each $X_i$ has mean $\mu$.
  2. An estimator has variance 4 and bias 1. What is its MSE?

    Answer: $\text{MSE} = \text{Var} + \text{Bias}^2 = 4 + 1^2 = 5$

  3. For a Bernoulli($p$) sample of size $n$, is $\hat{p} = \frac{\sum X_i}{n}$ unbiased? Is it consistent?

    Answer: **Unbiased:** $E[\hat{p}] = \frac{1}{n} \cdot np = p$ ✓ **Consistent:** $\hat{p}$ is unbiased and $\text{Var}(\hat{p}) = \frac{p(1-p)}{n} \to 0$ as $n \to \infty$ ✓. So $\hat{p}$ is both unbiased and consistent.

  4. Estimator A is unbiased with variance 10. Estimator B has bias 2 and variance 3. Which has lower MSE?

    Answer: $\text{MSE}_A = 10 + 0^2 = 10$ and $\text{MSE}_B = 3 + 2^2 = 7$. Estimator B has lower MSE despite being biased: the variance it saves ($10 - 3 = 7$) outweighs the squared bias it adds ($2^2 = 4$).

  5. Explain why an unbiased estimator can be worse than a biased one.

    Answer: Unbiasedness only guarantees being correct on average, not being correct on any given sample. A biased estimator with much lower variance can be closer to the truth in mean squared terms (lower MSE). For example, ridge regression is biased but often outperforms unbiased OLS in MSE by reducing variance. The bias-variance decomposition $\text{MSE} = \text{Var} + \text{Bias}^2$ shows both matter.
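The MSE arithmetic in these answers reduces to one line of the bias-variance decomposition. A minimal sketch with a hypothetical helper `mse`:

```python
def mse(variance, bias):
    """Bias-variance decomposition: MSE = Var + Bias^2."""
    return variance + bias ** 2

print(mse(4, 1))               # variance 4, bias 1 -> 5
print(mse(10, 0), mse(3, 2))   # unbiased, variance 10 vs bias 2, variance 3 -> 10 7
```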


Next Steps

Next up: 12-04-mle.md