Maximum Likelihood (ML) Parameter Estimation
The maximum likelihood estimate finds parameters that maximize the likelihood of observing the training data:

$$\hat{\theta} = \arg\max_{\theta}\, p(X \mid \theta)$$

In practice, we maximize the log-likelihood:

$$\ell(\theta) = \ln p(X \mid \theta) = \sum_{t=1}^{T} \ln p(\mathbf{x}_t \mid \theta)$$
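As a minimal numerical sketch of this maximization (not from the original notes), the snippet below fits a univariate Gaussian by minimizing the negative log-likelihood; the data and variable names are illustrative:

```python
import numpy as np
from scipy import optimize, stats

# Hypothetical 1-D training data; in practice the x_t come from the training set.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=200)

def neg_log_likelihood(params):
    """Negative log-likelihood of the data under N(mu, sigma^2)."""
    mu, log_sigma = params           # optimize log(sigma) so sigma stays positive
    sigma = np.exp(log_sigma)
    return -np.sum(stats.norm.logpdf(x, loc=mu, scale=sigma))

# Maximizing the log-likelihood is the same as minimizing its negative.
result = optimize.minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)             # close to the sample mean and std of x
```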
To find the parameters that maximize the likelihood of observing the training data, one must find the maximum of the log-likelihood function, $\ell(\theta)$. This is typically done by taking the partial derivative with respect to each parameter, setting it to zero, and solving the resulting system of equations.

The formula is:

$$\frac{\partial \ell(\theta)}{\partial \theta_i} \stackrel{!}{=} 0, \qquad i = 1, \dots, M$$
- $\hat{\theta}$: The parameter vector that maximizes the likelihood.
- $\ell(\theta)$: The log-likelihood function.
- $\mathbf{x}_t$: The training data vectors.
- $\theta_i$: The individual parameters to be estimated.
- $T$: Total training samples in the dataset.
- $M$: Total parameters in the model (that need to be estimated).
Note: The notation $\stackrel{!}{=}$ means "set equal to zero". We set the derivative to zero to find the critical points (maxima, minima, or saddle points) of the log-likelihood function. In this context, we are looking for the maximum.
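As a one-parameter illustration of this procedure (a standard textbook derivation, not specific to these notes), consider estimating the mean $\mu$ of a univariate Gaussian with known variance $\sigma^2$:

$$\ell(\mu) = -\frac{T}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=1}^{T}(x_t - \mu)^2$$

$$\frac{\partial \ell(\mu)}{\partial \mu} = \frac{1}{\sigma^2}\sum_{t=1}^{T}(x_t - \mu) \stackrel{!}{=} 0 \quad\Longrightarrow\quad \hat{\mu} = \frac{1}{T}\sum_{t=1}^{T} x_t$$

Setting the derivative to zero and solving recovers the sample mean, matching the general recipe above.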
Why Parameter Estimation is Needed
In pattern recognition, parameter estimation is needed to train classifiers by determining the unknown parameters of probability distributions from the training data. The quantities to be estimated are:
- Prior probabilities
- Likelihood probability density functions
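For the priors, the ML estimate is simply the relative frequency of each class in the training set; the following is a minimal sketch with made-up labels:

```python
import numpy as np

# Hypothetical class labels for T training samples (0, 1, 2 = three classes).
labels = np.array([0, 0, 1, 2, 1, 0, 2, 2, 0, 1])

# ML estimate of the prior P(class = k): relative frequency of class k.
classes, counts = np.unique(labels, return_counts=True)
priors = counts / labels.size
for k, p in zip(classes, priors):
    print(f"P(class={k}) = {p:.2f}")
```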
Assuming that the likelihood can be represented by a Gaussian, we find the mean and covariance such that the Gaussian closely represents the likelihood. The ML estimates for the parameters are computed as follows:

- $\hat{\boldsymbol{\mu}} = \dfrac{1}{T} \sum_{t=1}^{T} \mathbf{x}_t$

  Sample Mean (unbiased): This is the average of all the training vectors.

- $\hat{\boldsymbol{\Sigma}} = \dfrac{1}{T-1} \sum_{t=1}^{T} (\mathbf{x}_t - \hat{\boldsymbol{\mu}})(\mathbf{x}_t - \hat{\boldsymbol{\mu}})^{\top}$

  Sample Covariance (unbiased): This is the average of the outer products of the centered data vectors.

Note: The strict ML estimate of the covariance divides by $T$ and is biased; using a denominator of $T-1$ (Bessel's correction) instead provides an unbiased estimate.
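A minimal NumPy sketch of these two estimators (illustrative data; `np.cov` also uses the $T-1$ denominator by default, so it serves as a cross-check):

```python
import numpy as np

# Hypothetical training data: T samples of dimension d.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))          # rows are the training vectors x_t
T = X.shape[0]

# Sample mean: average of all training vectors.
mu_hat = X.mean(axis=0)

# Unbiased sample covariance: sum of outer products of centered vectors, / (T - 1).
centered = X - mu_hat
sigma_hat = centered.T @ centered / (T - 1)

# Cross-check against NumPy's built-in (rowvar=False treats rows as samples).
assert np.allclose(sigma_hat, np.cov(X, rowvar=False))
```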