by W. B. Meitei, PhD
Maximum likelihood estimation is a fundamental statistical method used to estimate the unknown parameters of a probability distribution based on observed data. Given a parametric family of probability density (or mass) functions \(f(x;\theta)\) with parameter \(\theta \in \Theta\), the likelihood function for i.i.d. observations \(X_{1}, X_{2}, \ldots, X_{n}\) is defined as,
\[L\left( \theta;X_{1},X_{2},\ldots,X_{n} \right) = \prod_{i = 1}^{n}{f(X_{i};\theta)}\]
Or equivalently, the log likelihood function is,
\[l(\theta) = \log\left\{ L(\theta) \right\} = \sum_{i = 1}^{n}{\log\left\{ f(X_{i};\theta) \right\}}\]
MLE chooses the value \({\widehat{\theta}}_{MLE}\) that maximises the likelihood or the log likelihood function, i.e.,
\[{\widehat{\theta}}_{MLE} = \arg\max_{\theta \in \Theta}{L\left( \theta;X_{1},X_{2},\ldots,X_{n} \right)} = \arg\max_{\theta \in \Theta}{l(\theta)}\]
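To make the arg max concrete, here is a minimal sketch (my own illustration, not part of the original post) that maximises the log likelihood numerically for i.i.d. exponential observations with unknown rate; the simulated data and the use of scipy.optimize.minimize_scalar are assumptions made only for this example.

```python
# Illustrative sketch: numerical MLE for the rate of i.i.d. Exponential(theta) data.
# The data-generating step and library choice are assumptions, not from the post.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / 2.0, size=500)      # true rate theta_0 = 2

def neg_log_likelihood(theta):
    # l(theta) = n*log(theta) - theta*sum(x) for f(x; theta) = theta*exp(-theta*x)
    return -(len(x) * np.log(theta) - theta * x.sum())

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 50), method="bounded")
print("numerical MLE :", res.x)                   # should match the closed form
print("closed form   :", len(x) / x.sum())        # n / sum(x_i)
```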
Maximum likelihood estimation theorems
1. Consistency of MLE
Let \({\widehat{\theta}}_{n}\) denote the MLE based on n independent observations from a distribution with true parameter \(\theta_{0} \in \Theta\). Under standard regularity conditions (smoothness, identifiability, and existence of moments), for every \(\varepsilon > 0\),
\[\lim_{n \rightarrow \infty}{P\left\lbrack |{\widehat{\theta}}_{n} - \theta_{0}| \leq \varepsilon \right\rbrack} = 1\]
i.e., MLE is consistent as the sample size increases.
2. Asymptotic normality
Under the same regularity conditions, the MLE is asymptotically normal,
\[\sqrt{n}\left( {\widehat{\theta}}_{n} - \theta_{0} \right)\overset{asy}{\rightarrow}N\left\lbrack 0,I^{- 1}\left( \theta_{0} \right) \right\rbrack\]
where \(I\left( \theta_{0} \right)\) is the Fisher information,
\[I\left( \theta_{0} \right) = {Var}_{\theta_{0}}\left\lbrack \frac{\partial}{\partial\theta}\log\left\{ f(X_{i};\theta_{0}) \right\} \right\rbrack = - E\left\lbrack \frac{\partial^{2}}{\partial\theta^{2}}\log\left\{ f(X_{i};\theta_{0}) \right\} \right\rbrack\]
This theorem underpins the construction of asymptotic confidence intervals and hypothesis tests for MLEs.
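As a hedged illustration of how this is used in practice (the Bernoulli model and the simulated data below are my own assumptions, not taken from the post), the asymptotic normality result yields Wald-type intervals of the form \({\widehat{\theta}}_{n} \pm z_{1 - \alpha/2}\sqrt{I^{- 1}({\widehat{\theta}}_{n})/n}\).

```python
# Sketch: 95% Wald confidence interval for a Bernoulli parameter p,
# using the per-observation Fisher information I(p) = 1 / (p(1-p)).
# The simulated data are an assumption for illustration only.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.binomial(1, 0.3, size=1000)        # i.i.d. Bernoulli(0.3) draws

p_hat = x.mean()                           # MLE of p
info = 1 / (p_hat * (1 - p_hat))           # estimated Fisher information
se = np.sqrt(1 / (len(x) * info))          # asymptotic standard error
z = norm.ppf(0.975)
print("MLE:", p_hat, "95% CI:", (p_hat - z * se, p_hat + z * se))
```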
3. Efficiency
Asymptotically, the MLE attains the Cramér-Rao lower bound, meaning it is asymptotically efficient, i.e., the variance of \({\widehat{\theta}}_{n}\) approaches the minimum possible variance for unbiased estimators as n → ∞.
\[Var\left( {\widehat{\theta}}_{n} \right) \approx \frac{1}{n}I^{- 1}\left( \theta_{0} \right)\]
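A small Monte Carlo check can make the efficiency claim tangible (the setup below is an assumption, purely illustrative): for the rate of an exponential distribution, \(I(\theta_{0}) = 1/\theta_{0}^{2}\), so the sampling variance of the MLE should be close to \(\theta_{0}^{2}/n\).

```python
# Sketch: compare the empirical variance of the exponential-rate MLE with the
# asymptotic value I^{-1}(theta_0) / n = theta_0^2 / n. Assumed, illustrative setup.
import numpy as np

rng = np.random.default_rng(2)
theta0, n, reps = 2.0, 200, 5000

samples = rng.exponential(scale=1 / theta0, size=(reps, n))
theta_hat = n / samples.sum(axis=1)        # MLE of the rate in each replicate

print("empirical variance  :", theta_hat.var())
print("theta_0^2 / n       :", theta0**2 / n)
```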
Regularity conditions
a. True parameter interior condition
The true parameter \(\theta_{0}\) lies in the interior of the parameter space Θ, i.e. \(\theta_{0} \in int(\Theta)\).
b. Identifiability
Different parameter values correspond to different probability distributions, i.e.,
\[f\left( x;\theta_{1} \right) = f\left( x;\theta_{2} \right)\ \forall x\quad \Rightarrow \quad\theta_{1} = \theta_{2}\]
c. Differentiability of the likelihood
The likelihood function \(L(\theta)\), or equivalently the log likelihood function \(l(\theta)\), is continuously differentiable with respect to \(\theta\).
d. Existence of Fisher Information
The Fisher information matrix \(I\left( \theta_{0} \right)\) is finite and positive definite at \(\theta_{0}\).
e. Interchange of differentiation and expectation
It must be valid to differentiate under the integral,
\[\frac{\partial}{\partial\theta}\int_{}^{}{f(x;\theta)}dx = \int_{}^{}\frac{\partial}{\partial\theta}f(x;\theta)\ dx\]
f. Existence of third derivatives (for asymptotic normality)
For the asymptotic expansions underlying the MLE (such as those used for the Cramér-Rao lower bound and asymptotic normality), the log-likelihood should have continuous third derivatives, i.e.,
\[\frac{\partial^{3}}{\partial\theta_{i}\partial\theta_{j}\partial\theta_{k}}l(\theta)\]
exists and is finite.
g. Dominance condition
There exists an integrable function \(M(x)\) such that, for all \(\theta\) in a neighbourhood of \(\theta_{0}\),
\[\left| \frac{\partial}{\partial\theta_{i}}l(\theta) \right| \leq M(x),\ \ \ \ \ \ E\left\lbrack M(x) \right\rbrack < \infty\]
Example
For i.i.d. normal observations \(X_{i} \sim N(\mu,\sigma^{2})\), the MLEs are:
\[\widehat{\mu} = \frac{1}{n}\sum_{i = 1}^{n}X_{i}\] and \[{\widehat{\sigma}}^{2} = \frac{1}{n}\sum_{i = 1}^{n}\left( X_{i} - \widehat{\mu} \right)^{2}\]
These satisfy consistency, asymptotic normality, and efficiency as n → ∞.
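A short sketch checking the normal example on simulated data (the data, sample size, and true parameter values below are assumptions); note that the MLE of \(\sigma^{2}\) divides by n, not n - 1.

```python
# Sketch: MLEs for i.i.d. normal data are the sample mean and the 1/n variance.
# The simulated data are an assumption for illustration.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=5.0, scale=2.0, size=1000)   # assumed mu = 5, sigma = 2

mu_hat = x.mean()
sigma2_hat = ((x - mu_hat) ** 2).mean()         # divides by n, not n - 1

print("mu_hat    :", mu_hat)
print("sigma2_hat:", sigma2_hat)
```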
Suggested Readings:
- STATS 200. Lecture 15: Fisher information and the Cramér-Rao bound. Stanford University.
- DISCDown. Maximum Likelihood Estimation.
- Econ 620. Maximum Likelihood Estimation (MLE). Cornell University.
- Taboga, M. (2021). Maximum likelihood estimation. Fundamentals of Statistics.
Suggested Citation: Meitei, W. B. (2025). Maximum Likelihood Estimation. WBM STATS.