Thursday, 13 November 2025

Maximum Likelihood Estimation

by W. B. Meitei, PhD


Maximum likelihood estimation is a fundamental statistical method used to estimate the unknown parameters of a probability distribution based on observed data. Given a parametric family of probability density (or mass) functions \(f(x;\theta)\) with parameter \(\theta \in \Theta\), the likelihood function for i.i.d. observations \(X_{1},X_{2},\ldots,X_{n}\) is defined as,

\[L\left( \theta;X_{1},X_{2},\ldots,X_{n} \right) = \prod_{i = 1}^{n}{f(X_{i};\theta)}\]

Or equivalently, the log likelihood function is,

\[l(\theta) = \log\left\{ L(\theta) \right\} = \sum_{i = 1}^{n}{\log\left\{ f(X_{i};\theta) \right\}}\]

MLE chooses the value \({\widehat{\theta}}_{MLE}\) that maximises the likelihood or the log likelihood function, i.e.,

\[{\widehat{\theta}}_{MLE} = \arg\max_{\theta \in \Theta}{L\left( \theta;X_{1},X_{2},\ldots,X_{n} \right)} = \arg\max_{\theta \in \Theta}{l(\theta)}\]
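When the maximiser has no closed form, the log likelihood is maximised numerically. A minimal Python sketch, assuming NumPy and SciPy are available and using a simulated exponential sample (the distribution, seed, and sample size are purely illustrative):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative sketch: numerical MLE for the rate of an exponential
# distribution, f(x; lam) = lam * exp(-lam * x), x > 0.
rng = np.random.default_rng(0)
true_lambda = 2.0
x = rng.exponential(scale=1 / true_lambda, size=500)  # simulated observations

def neg_log_likelihood(lam):
    # l(lam) = n * log(lam) - lam * sum(x); we minimise its negative
    return -(len(x) * np.log(lam) - lam * x.sum())

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 50), method="bounded")
print("numerical MLE:", result.x)          # should be close to the closed form
print("closed-form MLE:", 1 / x.mean())    # for the exponential, lambda_hat = 1 / x_bar
```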


Maximum likelihood estimation theorems

1. Consistency of MLE

Let \({\widehat{\theta}}_{n}\) denote the MLE based on n independent observations from a distribution with true parameter \(\theta_{0} \in \Theta\). Under standard regularity conditions (smoothness, identifiability, and existence of moments),

\[\lim_{n \rightarrow \infty}{P\left\lbrack \left| {\widehat{\theta}}_{n} - \theta_{0} \right| \leq \varepsilon \right\rbrack = 1}\ \ \ \forall\ \varepsilon > 0\]

i.e., the MLE is consistent: it converges in probability to \(\theta_{0}\) as the sample size increases.
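A small simulation sketch of this behaviour, again using the illustrative exponential model from above (seed and sample sizes are arbitrary): the error of the MLE \(\widehat{\lambda} = 1/\overline{X}\) shrinks as n grows.

```python
import numpy as np

# Consistency in action (illustrative): for an exponential sample the MLE is
# lambda_hat = 1 / mean(x); its error around the true rate shrinks with n.
rng = np.random.default_rng(1)
true_lambda = 2.0

for n in (10, 100, 1_000, 10_000, 100_000):
    x = rng.exponential(scale=1 / true_lambda, size=n)
    lambda_hat = 1 / x.mean()
    print(f"n = {n:>6d}, |lambda_hat - lambda_0| = {abs(lambda_hat - true_lambda):.4f}")
```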

2. Asymptotic normality

Under the same regularity conditions, the MLE is asymptotically normal,

\[\sqrt{n}\left( {\widehat{\theta}}_{n} - \theta_{0} \right)\overset{asy}{\rightarrow}N\left\lbrack 0,I^{- 1}\left( \theta_{0} \right) \right\rbrack\]

where \(I\left( \theta_{0} \right)\) is the Fisher information,

\[I\left( \theta_{0} \right) = {Var}_{\theta_{0}}\left\lbrack \frac{\partial}{\partial\theta}\log\left\{ f(X_{i};\theta_{0}) \right\} \right\rbrack = - E\left\lbrack \frac{\partial^{2}}{\partial\theta^{2}}\log\left\{ f(X_{i};\theta_{0}) \right\} \right\rbrack\]

This theorem underpins the construction of asymptotic confidence intervals and hypothesis tests for MLEs.
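For instance, a Wald-type interval \({\widehat{\theta}}_{n} \pm z_{\alpha/2}\sqrt{I^{-1}({\widehat{\theta}}_{n})/n}\) can be computed directly. A minimal Python sketch for the exponential model, where the standard result \(I(\lambda) = 1/\lambda^{2}\) applies (data, seed, and the 95% level are illustrative):

```python
import numpy as np

# Wald-type 95% confidence interval from asymptotic normality (illustrative):
# theta_hat +/- z * sqrt(I(theta_hat)^(-1) / n), with I(lambda) = 1 / lambda^2
# for the exponential model.
rng = np.random.default_rng(2)
true_lambda = 2.0
n = 1_000
x = rng.exponential(scale=1 / true_lambda, size=n)

lambda_hat = 1 / x.mean()          # MLE of the rate
se = np.sqrt(lambda_hat**2 / n)    # sqrt(I(lambda_hat)^(-1) / n)
z = 1.96                           # approximate 97.5th percentile of N(0, 1)
print(f"95% Wald CI: ({lambda_hat - z * se:.3f}, {lambda_hat + z * se:.3f})")
```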

3. Efficiency

Asymptotically, the MLE achieves the Cramér-Rao lower bound, meaning it is asymptotically efficient, i.e., the variance of \({\widehat{\theta}}_{n}\) approaches the minimum possible variance for unbiased estimators as \(n \rightarrow \infty\).

\[Var\left( {\widehat{\theta}}_{n} \right) \approx \frac{1}{n}I^{- 1}\left( \theta_{0} \right)\]
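As a concrete illustration (the exponential model again, chosen only for simplicity), the Fisher information and the resulting approximate variance are,

\[\log\left\{ f(x;\lambda) \right\} = \log\lambda - \lambda x,\ \ \ \ \ I\left( \lambda_{0} \right) = - E\left\lbrack \frac{\partial^{2}}{\partial\lambda^{2}}\log\left\{ f(X;\lambda_{0}) \right\} \right\rbrack = \frac{1}{\lambda_{0}^{2}},\ \ \ \ \ Var\left( {\widehat{\lambda}}_{n} \right) \approx \frac{\lambda_{0}^{2}}{n}\]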


Regularity conditions

a. True parameter interior condition

The true parameter \(\theta_{0}\) lies in the interior of the parameter space Θ, i.e. \(\theta_{0} \in int(\Theta)\).

b. Identifiability

Different parameter values correspond to different probability distributions, i.e.,

\[f\left( x;\theta_{1} \right) = f\left( x;\theta_{2} \right)\ \ \ \forall\ \ \ x\ \ \ \ \  \Rightarrow \theta_{1} = \theta_{2}\]

c. Differentiability of the likelihood

The likelihood function \(L(\theta)\), or equivalently the log likelihood function \(l(\theta)\), is continuously differentiable with respect to \(\theta\).

d. Existence of Fisher Information

The Fisher Information matrix, \(I\left( \theta_{0} \right)\) is finite and positive definite at \(\theta_{0}\).

e. Interchange of differentiation and expectation

It must be valid to differentiate under the integral,

\[\frac{\partial}{\partial\theta}\int_{}^{}{f(x;\theta)}dx = \int_{}^{}\frac{\partial}{\partial\theta}f(x;\theta)\ dx\]
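This is the step that gives the score function zero mean, since differentiating \(\int f(x;\theta)\, dx = 1\) under the integral yields,

\[0 = \int_{}^{}\frac{\partial}{\partial\theta}f(x;\theta)\ dx = \int_{}^{}{\frac{\partial}{\partial\theta}\log\left\{ f(x;\theta) \right\}\ f(x;\theta)}\ dx = E\left\lbrack \frac{\partial}{\partial\theta}\log\left\{ f(X;\theta) \right\} \right\rbrack\]

which is also what allows the Fisher information to be written in the two equivalent forms above.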

f. Existence of third derivatives (for asymptotic normality)

For asymptotic expansions of the MLE (such as those used to establish the Cramér-Rao lower bound and asymptotic normality), the log-likelihood should have continuous third derivatives, i.e.,

\[\frac{\partial^{3}}{\partial\theta_{i}\partial\theta_{j}\partial\theta_{k}}l(\theta)\]

exist and are finite.

g. Dominance condition

There exists an integrable function \(M(x)\) such that, for all \(\theta\) in a neighbourhood of \(\theta_{0}\),

\[\left| \frac{\partial}{\partial\theta}\log\left\{ f(x;\theta) \right\} \right| \leq M(x),\ \ \ \ \ \ E\left\lbrack M(X) \right\rbrack < \infty\]


For example,

For i.i.d. normal observations \(X_{i} \sim N(\mu,\sigma^{2})\), the MLEs are:

\[\widehat{\mu} = \frac{1}{n}\sum_{i = 1}^{n}X_{i}\] and \[{\widehat{\sigma}}^{2} = \frac{1}{n}\sum_{i = 1}^{n}\left( X_{i} - \widehat{\mu} \right)^{2}\]

These satisfy consistency, asymptotic normality, and efficiency as \(n \rightarrow \infty\).
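The closed forms themselves follow from setting the partial derivatives of the log likelihood to zero,

\[l\left( \mu,\sigma^{2} \right) = - \frac{n}{2}\log\left( 2\pi\sigma^{2} \right) - \frac{1}{2\sigma^{2}}\sum_{i = 1}^{n}\left( X_{i} - \mu \right)^{2}\]

\[\frac{\partial l}{\partial\mu} = \frac{1}{\sigma^{2}}\sum_{i = 1}^{n}\left( X_{i} - \mu \right) = 0\  \Rightarrow \ \widehat{\mu} = \frac{1}{n}\sum_{i = 1}^{n}X_{i},\ \ \ \ \ \frac{\partial l}{\partial\sigma^{2}} = - \frac{n}{2\sigma^{2}} + \frac{1}{2\sigma^{4}}\sum_{i = 1}^{n}\left( X_{i} - \mu \right)^{2} = 0\  \Rightarrow \ {\widehat{\sigma}}^{2} = \frac{1}{n}\sum_{i = 1}^{n}\left( X_{i} - \widehat{\mu} \right)^{2}\]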



Suggested Readings:

  1. STATS 200, Lecture 15: Fisher Information and the Cramér-Rao Bound. Stanford University.
  2. DISCDown. Maximum Likelihood Estimation.
  3. Econ 620. Maximum Likelihood Estimation (MLE). Cornell University.
  4. Taboga, M. (2021). Maximum likelihood estimation. Fundamentals of Statistics.

Suggested Citation: Meitei, W. B. (2025). Maximum Likelihood Estimation. WBM STATS.
