Maximum Likelihood Estimation
by W. B. Meitei, PhD
Maximum likelihood estimation (MLE) is a fundamental statistical method for estimating the unknown parameters of a probability distribution from observed data. Given a parametric family of probability density (or mass) functions \(f(x;\theta)\) with parameter \(\theta \in \Theta\), the likelihood function for i.i.d. observations \(X_{1},X_{2},\ldots,X_{n}\) is defined as
\[L\left( \theta;X_{1},X_{2},\ldots,X_{n} \right) = \prod_{i = 1}^{n}{f(X_{i};\theta)}\]
Equivalently, the log-likelihood function is
\[l(\theta) = \log\left\{ L(\theta) \right\} = \sum_{i = 1}^{n}{\log\left\{ f(X_{i};\theta) \right\}}\]
MLE chooses the value \({\widehat{\theta}}_{MLE}\) that maximizes the likelihood, or equivalently the log-likelihood, i.e.,
\[{\widehat{\theta}}_{MLE} = \arg\max_{\theta \in \Theta}{L\left( \theta;X_{1},X_{2},\ldots,X_{n} \right)} = \arg\max_{\theta \in \Theta}{l(\theta)}\]
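In practice, the maximizer often has no closed form and is found numerically. As a minimal sketch (not from the text), the snippet below fits an exponential model with an illustrative true rate \(\lambda_{0} = 2\) by minimizing the negative log-likelihood, and compares the result with the known closed form \(\widehat{\lambda} = 1/\overline{X}\):

```python
# Sketch: numerical MLE for an exponential model (illustrative choice).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / 2.0, size=500)  # true rate lambda_0 = 2 (assumed)

def neg_log_likelihood(lam):
    # l(lambda) = n*log(lambda) - lambda*sum(x); we minimize -l(lambda)
    return -(x.size * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0), method="bounded")
print("numerical MLE :", res.x)         # argmax of the log-likelihood
print("closed form   :", 1 / x.mean())  # exponential MLE 1 / sample mean
```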
Maximum likelihood estimation theorems
1. Consistency of MLE
Let \({\widehat{\theta}}_{n}\) denote the MLE based on n independent observations from a distribution with true parameter \(\theta_{0} \in \Theta\).
Under standard regularity conditions (smoothness, identifiability, and
existence of moments),
\[\lim_{n \rightarrow \infty}{P\left\lbrack \left| {\widehat{\theta}}_{n} - \theta_{0} \right| \leq \varepsilon \right\rbrack} = 1\ \ \ \forall\ \varepsilon > 0\]
i.e., \({\widehat{\theta}}_{n}\) converges in probability to \(\theta_{0}\): the MLE is consistent as the sample size increases.
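A quick simulation sketch (illustrative, continuing the exponential example above) shows the estimation error shrinking as n grows, as consistency predicts:

```python
# Sketch: consistency of the exponential MLE as n increases.
import numpy as np

rng = np.random.default_rng(1)
lam0 = 2.0  # assumed true rate for the simulation
for n in (10, 100, 1_000, 10_000, 100_000):
    x = rng.exponential(scale=1 / lam0, size=n)
    lam_hat = 1 / x.mean()  # closed-form exponential MLE
    print(f"n = {n:>6}  |lam_hat - lam0| = {abs(lam_hat - lam0):.4f}")
```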
2. Asymptotic normality
Under the same regularity conditions, the MLE is
asymptotically normal,
\[\sqrt{n}\left( {\widehat{\theta}}_{n} - \theta_{0} \right)\overset{asy}{\rightarrow}N\left\lbrack 0,I^{- 1}\left( \theta_{0} \right) \right\rbrack\]
where \(I\left( \theta_{0} \right)\) is the Fisher information,
\[I\left( \theta_{0} \right) = {Var}_{\theta_{0}}\left\lbrack \frac{\partial}{\partial\theta}\log\left\{ f(X_{i};\theta_{0}) \right\} \right\rbrack = - E\left\lbrack \frac{\partial^{2}}{\partial\theta^{2}}\log\left\{ f(X_{i};\theta_{0}) \right\} \right\rbrack\]
This theorem underpins the construction of asymptotic confidence intervals and hypothesis tests for MLEs.
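For instance, a 95% Wald interval is \({\widehat{\theta}}_{n} \pm 1.96\sqrt{I^{- 1}({\widehat{\theta}}_{n})/n}\). Below is a sketch for the exponential model, where \(I(\lambda) = 1/\lambda^{2}\), so the plug-in standard error is \(\widehat{\lambda}/\sqrt{n}\):

```python
# Sketch: Wald confidence interval built from the Fisher information.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.exponential(scale=1 / 2.0, size=500)  # assumed true rate 2
n = x.size
lam_hat = 1 / x.mean()     # exponential MLE
se = lam_hat / np.sqrt(n)  # sqrt(I^{-1}(lam_hat) / n) with I(lam) = 1/lam^2
z = norm.ppf(0.975)        # approx. 1.96 for a 95% interval
print(f"95% Wald CI: [{lam_hat - z * se:.3f}, {lam_hat + z * se:.3f}]")
```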
3. Efficiency
Asymptotically, the MLE achieves the Cramér-Rao lower bound, meaning it is asymptotically efficient: the variance of \({\widehat{\theta}}_{n}\) approaches the minimum possible variance for unbiased estimators as n → ∞,
\[Var\left( {\widehat{\theta}}_{n} \right) \approx \frac{1}{n}I^{- 1}\left( \theta_{0} \right)\]
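A Monte Carlo sketch (again using the illustrative exponential model) compares the empirical variance of the MLE across replications with the Cramér-Rao value \(I^{- 1}(\lambda_{0})/n = \lambda_{0}^{2}/n\):

```python
# Sketch: empirical variance of the MLE vs. the Cramer-Rao value.
import numpy as np

rng = np.random.default_rng(3)
lam0, n, reps = 2.0, 1_000, 5_000          # assumed simulation settings
samples = rng.exponential(scale=1 / lam0, size=(reps, n))
lam_hats = 1 / samples.mean(axis=1)        # exponential MLE per replication
print("empirical Var(lam_hat):", lam_hats.var())
print("Cramer-Rao value      :", lam0**2 / n)  # I^{-1}(lam0) / n
```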
Regularity conditions
a. True parameter interior condition: The true parameter \(\theta_{0}\) lies in the interior of the parameter space Θ, i.e., \(\theta_{0} \in \operatorname{int}(\Theta)\).
b. Identifiability: Different parameter values correspond to different
probability distributions, i.e.,
\[f\left( x;\theta_{1} \right) = f\left( x;\theta_{2} \right)\ \text{for all}\ x\ \Rightarrow \theta_{1} = \theta_{2}\]
c. Differentiability of the likelihood: The likelihood function \(L(\theta)\), or equivalently the log-likelihood function \(l(\theta)\), is continuously differentiable with respect to \(\theta\).
d. Existence of Fisher Information: The Fisher information matrix \(I\left( \theta_{0} \right)\) is finite and positive definite.
e. Interchange of differentiation and expectation: It must be valid to differentiate under the integral,
\[\frac{\partial}{\partial\theta}\int f(x;\theta)\ dx = \int\frac{\partial}{\partial\theta}f(x;\theta)\ dx\]
f. Existence of third derivatives (for asymptotic normality): For asymptotic expansions of the MLE (such as those behind the Cramér-Rao lower bound and asymptotic normality), the log-likelihood should have continuous third derivatives, i.e.,
\[\frac{\partial^{3}}{\partial\theta_{i}\partial\theta_{j}\partial\theta_{k}}l(\theta)\]
exists and is finite.
g. Dominance condition: There exists an integrable function \(M(x)\) such that, for all \(\theta\) in a neighbourhood of \(\theta_{0}\),
\[\left| \frac{\partial}{\partial\theta}\log\left\{ f(x;\theta) \right\} \right| \leq M(x),\ \ \ \ \ \ E\left\lbrack M(X) \right\rbrack < \infty\]
For example, for i.i.d. normal observations \(X_{i} \sim N(\mu,\sigma^{2})\), the MLEs are
\[\widehat{\mu} = \frac{1}{n}\sum_{i = 1}^{n}X_{i}\] and \[{\widehat{\sigma}}^{2} = \frac{1}{n}\sum_{i = 1}^{n}\left( X_{i} - \widehat{\mu} \right)^{2}\]
Both estimators are consistent, asymptotically normal, and asymptotically efficient as n → ∞ (\({\widehat{\sigma}}^{2}\) is biased in finite samples but asymptotically unbiased).
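As a closing sketch (simulated data with assumed \(\mu = 1\), \(\sigma = 2\)), the closed-form normal MLEs can be checked against a general-purpose numerical optimizer:

```python
# Sketch: closed-form normal MLEs vs. numerical maximization.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
x = rng.normal(loc=1.0, scale=2.0, size=1_000)  # assumed mu = 1, sigma = 2

def neg_log_likelihood(params):
    mu, log_sigma = params  # log-parametrize so that sigma > 0
    sigma = np.exp(log_sigma)
    # negative log-likelihood, dropping the constant 0.5*log(2*pi) per term
    return x.size * log_sigma + np.sum((x - mu) ** 2) / (2 * sigma**2)

res = minimize(neg_log_likelihood, x0=[0.0, 0.0], method="BFGS")
mu_hat, sigma2_hat = res.x[0], np.exp(2 * res.x[1])
print("numerical  :", mu_hat, sigma2_hat)
print("closed form:", x.mean(), x.var())  # np.var uses 1/n, matching the MLE
```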
Suggested Readings:
- STATS 200. Lecture 15: Fisher information and the Cramer-Rao bound. Stanford University.
- DISCDown. Maximum Likelihood Estimation.
- Econ 620. Maximum Likelihood Estimation (MLE). Cornell University.
- Taboga, M. (2021). Maximum likelihood estimation. Fundamentals of Statistics.
Suggested Citation: Meitei, W. B. (2025). Maximum Likelihood Estimation. WBM STATS.