Author: Abdallah Chehade

Editor: Abdallah Chehade

# Unbiased Estimators

In this article we will limit ourselves to the class of estimators that share the same expected value $\tau(\theta)$ (equivalently, the same bias for estimating $\theta$):

\begin{equation} C_\tau = \{W : E_\theta[W] = \tau(\theta)\} \end{equation}

In this article we will first define an unbiased estimator. Then, we will go through different techniques to find the best unbiased estimator if it exists.

In many critical applications it is required to find unbiased estimators. For such applications it is preferred to find the best unbiased estimator.

# 1. Best Unbiased Estimator Definition

An estimator $W^*$ is a best unbiased estimator of $\tau(\theta)$ if it satisfies $E_\theta[W^*]=\tau(\theta)$ for all $\theta$ and, for any other estimator $W$ satisfying $E_\theta[W]=\tau(\theta)$ for all $\theta$, we have $Var_\theta(W^*) \leq Var_\theta(W)$ for all $\theta$. Such an estimator is also called a uniform minimum variance unbiased estimator (UMVUE).
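As a quick illustration of the definition (my own sketch, not from the original article, assuming Python with NumPy), the simulation below compares two unbiased estimators of a normal mean: the sample mean $\bar{X}$ and the first observation $X_1$ alone. Both have expectation $\theta$, but the sample mean has far smaller variance, so $X_1$ cannot be best unbiased.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 2.0, 20, 10_000

# Draw `reps` independent samples of size n from N(theta, 1).
samples = rng.normal(theta, 1.0, size=(reps, n))

w1 = samples.mean(axis=1)  # sample mean: unbiased, Var = 1/n
w2 = samples[:, 0]         # first observation: unbiased, Var = 1

print(w1.mean(), w2.mean())  # both near theta = 2.0 (unbiased)
print(w1.var(), w2.var())    # roughly 0.05 vs 1.0
```

Unbiasedness alone says nothing about precision; the variance comparison is what singles out a best estimator within the class $C_\tau$.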

# 2. Cramér-Rao Inequality (Technique 1)

This inequality provides a lower bound on the variance of any unbiased estimator of $\tau(\theta)$. Thus, if an unbiased estimator $W$ of $\tau(\theta)$ achieves the Cramér-Rao lower bound, then $W$ is the best unbiased estimator of $\tau(\theta)$.

Let $X_1, \ldots, X_n$ be a random sample with joint pdf $f(\vec{x}|\theta)$, and let $W(\vec{x})$ be an estimator satisfying:

- $\frac{\partial}{\partial\theta} E_\theta[W(\vec{x})] = \int \frac{\partial}{\partial\theta}[W(\vec{x})f(\vec{x}|\theta)]\,d\vec{x}$
- $Var_\theta(W(\vec{x})) < \infty$

Then, $Var_\theta(W(\vec{x})) \geq \frac{(\frac{\partial}{\partial\theta} E_\theta[W(\vec{x})])^2}{E_\theta[(\frac{\partial}{\partial\theta}\log f(\vec{x}|\theta))^2]}$

where $E_\theta[(\frac{\partial}{\partial\theta}\log f(\vec{x}|\theta))^2] = Var_\theta(\frac{\partial}{\partial\theta}\log f(\vec{x}|\theta))$ is the Fisher information in the sample $\vec{x}$ (the two agree because the score has mean zero under the usual regularity conditions).
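To make the bound concrete, here is a small numerical check (my own sketch, assuming Python with NumPy). For $X_i \sim N(\theta, \sigma^2)$ with $\sigma$ known, the Fisher information in the sample is $n/\sigma^2$, so the Cramér-Rao lower bound for any unbiased estimator of $\theta$ is $\sigma^2/n$; the sample mean attains it.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, n, reps = 0.0, 2.0, 25, 20_000

# For X_i ~ N(theta, sigma^2) with sigma known, the Fisher information
# in the whole sample is n / sigma^2, so the Cramer-Rao lower bound for
# any unbiased estimator of theta is sigma^2 / n.
crlb = sigma**2 / n  # = 0.16

# Empirical variance of the sample mean over many replications:
xbar = rng.normal(theta, sigma, size=(reps, n)).mean(axis=1)
print(xbar.var(), crlb)  # both near 0.16: the bound is attained
```

Since $\bar{X}$ is unbiased and its variance matches the bound, it is the best unbiased estimator of $\theta$ in this model.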

# 3. Lehmann-Scheffé (Technique 2)

The Cramér-Rao Inequality technique has some drawbacks:

- It cannot be used if one of the conditions is not satisfied.
- The best unbiased estimator might not achieve the Cramér-Rao lower bound even when the conditions are satisfied.

Let $T(\vec{X})$ be a complete sufficient statistic for $\theta$, and suppose $\phi(T)$ is an estimator based on $T$ such that $E_\theta[\phi(T(\vec{x}))]=\tau(\theta)$ for all $\theta$.

Then, $\phi(T)$ is the best unbiased estimator of $\tau(\theta)$.
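A standard example (my own illustration, not from the article): for a Poisson($\lambda$) sample, $T=\sum_i X_i$ is complete and sufficient, and $\phi(T)=T/n=\bar{X}$ is unbiased for $\lambda$, so by Lehmann-Scheffé it is the best unbiased estimator. A quick simulation confirms the unbiasedness:

```python
import numpy as np

rng = np.random.default_rng(2)
lam, n, reps = 3.0, 15, 20_000

# For a Poisson(lam) sample, T = sum(X_i) is complete and sufficient,
# and phi(T) = T / n satisfies E[phi(T)] = lam, so by Lehmann-Scheffe
# phi(T) is the best unbiased estimator of lam.
t = rng.poisson(lam, size=(reps, n)).sum(axis=1)
phi = t / n
print(phi.mean())  # near lam = 3.0
```

Note that no Cramér-Rao calculation is needed here: unbiasedness plus being a function of a complete sufficient statistic is enough.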

# 4. Rao-Blackwellization (Technique 3)

In some cases it is hard to directly find a function $\phi(T)$ of a complete sufficient statistic $T(\vec{X})$ such that $E_\theta[\phi(T(\vec{x}))]=\tau(\theta)$. However, if we can find any unbiased estimator $W$ of $\tau(\theta)$, then $g(T)=E[W|T]$ is the best unbiased estimator of $\tau(\theta)$ by the Rao-Blackwell theorem combined with the completeness of $T$. Note that $E[W|T]$ does not depend on $\theta$ because $T$ is sufficient, so $g(T)$ is a genuine estimator.
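A classical worked example of Rao-Blackwellization (my own illustration, not from the article): estimating $\tau(\lambda)=P(X=0)=e^{-\lambda}$ from a Poisson sample. Start with the crude unbiased estimator $W=\mathbf{1}\{X_1=0\}$; conditioning on $T=\sum_i X_i$ gives the closed form $g(T)=E[W|T]=((n-1)/n)^T$, since $X_1 \mid T=t \sim \mathrm{Binomial}(t, 1/n)$.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, n, reps = 1.0, 10, 50_000
target = np.exp(-lam)  # tau(lam) = P(X = 0) = e^{-lam}

x = rng.poisson(lam, size=(reps, n))
w = (x[:, 0] == 0).astype(float)  # crude unbiased estimator W
t = x.sum(axis=1)
g = ((n - 1) / n) ** t            # g(T) = E[W | T] in closed form

print(w.mean(), g.mean(), target)  # all near e^{-1} ~ 0.368 (unbiased)
print(w.var(), g.var())            # g(T) has far smaller variance
```

Both estimators are unbiased, but the Rao-Blackwellized $g(T)$ is a function of the complete sufficient statistic, so it is the best unbiased estimator of $e^{-\lambda}$.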

# 5. Uniqueness

If a best unbiased estimator exists then it is unique.

# 6. Remarks

The following approaches can be used to find an unbiased estimator:

- Bayesian estimators/rules: the estimator itself may not be unbiased for $\tau(\theta)$, but a function of it can be; one of the techniques above can then be applied to find the best unbiased estimator of $\tau(\theta)$.
- Maximum likelihood estimators (MLEs): similar to Bayesian estimators.
- Method of moments estimators: similar to Bayesian estimators and MLEs.


# 7. Generalization

In all of the above techniques, the best unbiased estimator was defined as the estimator that minimizes the variance and, accordingly, the Mean Square Error (MSE). However, this can be extended to loss functions other than squared error.

Thanks for sharing! I wonder why Rao-Blackwellization still holds when generalized to other loss functions. Does it have the same proof?

Thank you for your question. Indeed, Rao-Blackwellization still holds for convex loss functions $L(\theta,\delta(\vec{x}))$ other than the squared loss $(\delta(\vec{x})-\theta)^2$.

Define the Risk to be $R(\theta,\delta(\vec{x})) = E_\theta[L(\theta,\delta(\vec{x}))]$.

Now we shall prove that if $g(T)=E[\delta(\vec{x})|T]$ (the Rao-Blackwellized estimator), where $T(\vec{X})$ is a sufficient statistic, then $R(\theta,g(T(\vec{x}))) \leq R(\theta,\delta(\vec{x}))$ for all $\theta$.

In other words $E_\theta[L(\theta,g(T(\vec{x})))] \leq E_\theta[L(\theta,\delta(\vec{x}))]$ for all $\theta$.

Proof:

Let $g(T)=E[\delta(\vec{x})|T]$.

By the conditional Jensen inequality, which applies because $L(\theta,\cdot)$ is convex in its second argument,

$L(\theta, E[\delta(\vec{x})|T]) \leq E[L(\theta,\delta(\vec{x}))|T]$

Taking $E_\theta$ on both sides, $R(\theta,g(T(\vec{x}))) \leq E_\theta[E[L(\theta,\delta(\vec{x}))|T]]$

Since $T(\vec{X})$ is a sufficient statistic, $E[\delta(\vec{x})|T]$ does not depend on $\theta$, so $g(T)$ is a genuine estimator. Moreover, by the tower property of expectation,

$E_\theta[E[L(\theta,\delta(\vec{x}))|T]] = E_\theta[L(\theta,\delta(\vec{x}))] = R(\theta,\delta(\vec{x}))$

Then $R(\theta,g(T(\vec{x}))) \leq R(\theta,\delta(\vec{x}))$

This proof extends Rao-Blackwellization to convex loss functions, which covers most cases of interest. Note that the Mean Squared Error (MSE) is the risk under the squared loss $(\delta(\vec{x}) - \theta)^2$.
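To see this risk reduction numerically under a non-squared convex loss, here is a small simulation sketch (my own, assuming Python with NumPy). For a normal mean, Rao-Blackwellizing the crude estimator $\delta=X_1$ on $T=\sum_i X_i$ gives $E[X_1|T]=T/n=\bar{X}$, and the risk under the convex absolute loss drops accordingly:

```python
import numpy as np

rng = np.random.default_rng(4)
theta, n, reps = 0.0, 16, 50_000

# delta = X_1 is a crude estimator of the mean theta; conditioning on
# the sufficient statistic T = sum(X_i) gives E[X_1 | T] = T/n = Xbar.
x = rng.normal(theta, 1.0, size=(reps, n))
delta = x[:, 0]
g = x.mean(axis=1)  # the Rao-Blackwellized estimator

# Empirical risk under the convex absolute loss |estimator - theta|:
print(np.abs(delta - theta).mean())  # ~0.80
print(np.abs(g - theta).mean())      # ~0.20
```

The inequality $R(\theta, g(T)) \leq R(\theta, \delta)$ from the proof above holds for any convex loss, not just this one; the absolute loss is simply a convenient non-squared example.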

This is very helpful!