Maximum Likelihood (ML) Parameter Estimation
The maximum likelihood estimate finds parameters that maximize the likelihood of observing the training data. Assuming the $N$ training vectors $x_1, \ldots, x_N$ are drawn independently:

$$\hat{\theta} = \arg\max_{\theta} p(\mathcal{D} \mid \theta) = \arg\max_{\theta} \prod_{k=1}^{N} p(x_k \mid \theta)$$

In practice, we maximize the log-likelihood, which turns the product into a sum and has the same maximizer because the logarithm is monotonically increasing:

$$L(\theta) = \ln p(\mathcal{D} \mid \theta) = \sum_{k=1}^{N} \ln p(x_k \mid \theta)$$
To find the parameters that maximize the likelihood of observing the training data, one must find the maximum of the log-likelihood function $L(\theta)$. This is typically done by taking the partial derivative with respect to each parameter, setting it to zero, and solving the resulting system of equations.
The formula is:

$$\nabla_{\theta} L(\theta) = \sum_{k=1}^{N} \nabla_{\theta} \ln p(x_k \mid \theta) \doteq 0$$

- $\hat{\theta}$: The parameter vector that maximizes the likelihood.
- $L(\theta)$: The log-likelihood function.
- $x_k$: The training data vectors.
- $\theta = (\theta_1, \ldots, \theta_M)^T$: The individual parameters to be estimated.
- $N$: Total training samples in the dataset.
- $M$: Total parameters in the model (that need to be estimated).

Note: The notation $\doteq 0$ means "set equal to zero". We set the derivative to zero to find the critical points (maxima, minima, or saddle points) of the log-likelihood function. In this context, we are looking for the maximum.
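As a worked instance of this "differentiate and set to zero" recipe (a standard textbook case, assumed here rather than taken from the section above): for a Gaussian likelihood with known covariance $\Sigma$ and unknown mean $\mu$, the condition yields the sample mean in closed form:

$$\nabla_{\mu} L(\mu) = \sum_{k=1}^{N} \Sigma^{-1} (x_k - \mu) \doteq 0 \quad\Longrightarrow\quad \hat{\mu} = \frac{1}{N} \sum_{k=1}^{N} x_k$$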
Why Parameter Estimation is Needed
In pattern recognition, we need parameter estimation to train classifiers by determining the unknown parameters of probability distributions from training data (we assume that the likelihood functions can be represented by a known distribution).
Specifically, we need to estimate:
- Prior probabilities $P(\omega_i)$
- Likelihood probability density functions $p(x \mid \omega_i)$
While priors are often simple to estimate, determining the likelihood pdfs is more complex. In parametric estimation, we assume the likelihood function follows a known mathematical form (e.g., a Gaussian distribution), but its specific parameters (like the mean $\mu$ and covariance $\Sigma$) are unknown.
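For the Gaussian case, the ML estimates have the well-known closed forms $\hat{\mu} = \frac{1}{N}\sum_k x_k$ and $\hat{\Sigma} = \frac{1}{N}\sum_k (x_k - \hat{\mu})(x_k - \hat{\mu})^T$. A minimal sketch of computing them, assuming i.i.d. training vectors stacked row-wise in a NumPy array `X` (the true parameter values and sample size below are hypothetical, chosen only so the estimates can be checked against known ground truth):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: draw N training vectors from a known 2-D Gaussian
# so we can compare the ML estimates against the true parameters.
true_mu = np.array([1.0, -2.0])
true_Sigma = np.array([[2.0, 0.5],
                       [0.5, 1.0]])
N = 5000
X = rng.multivariate_normal(true_mu, true_Sigma, size=N)  # shape (N, d)

# Closed-form ML estimates for a Gaussian:
#   mu_hat    = (1/N) * sum_k x_k
#   Sigma_hat = (1/N) * sum_k (x_k - mu_hat)(x_k - mu_hat)^T
mu_hat = X.mean(axis=0)
centered = X - mu_hat
Sigma_hat = (centered.T @ centered) / N  # note 1/N (ML), not 1/(N-1)

print("mu_hat    =", mu_hat)
print("Sigma_hat =\n", Sigma_hat)
```

Note the $1/N$ normalization: the ML covariance estimate is slightly biased, unlike the $1/(N-1)$ sample covariance that `np.cov` returns by default.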
Therefore, parameter estimation is the process of using the training data to calculate these unknown parameters. Once estimated, these parameters define the likelihood functions, enabling the Bayesian classifier to compute the necessary posterior probabilities and make informed decisions about class membership.
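An end-to-end sketch of how the estimated pieces plug into a Bayesian classifier, under assumptions not in the original text: two hypothetical Gaussian classes, priors $P(\omega_i)$ taken as class frequencies, and SciPy's `multivariate_normal` standing in for the likelihood pdfs $p(x \mid \omega_i)$. The `fit_ml` helper and test point are illustrative, not part of the source.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_ml(X):
    """ML estimates (mu_hat, Sigma_hat) for one class's Gaussian likelihood."""
    mu = X.mean(axis=0)
    centered = X - mu
    Sigma = (centered.T @ centered) / len(X)
    return mu, Sigma

# Hypothetical two-class training set (labels implicit in the split).
rng = np.random.default_rng(1)
X0 = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=300)
X1 = rng.multivariate_normal([3.0, 3.0], np.eye(2), size=100)

# Priors P(omega_i) from class frequencies; likelihoods from ML-fitted Gaussians.
priors = np.array([len(X0), len(X1)], dtype=float)
priors /= priors.sum()
params = [fit_ml(X0), fit_ml(X1)]

def posterior(x):
    """P(omega_i | x) via Bayes' rule, using the estimated priors and pdfs."""
    joint = np.array([
        P * multivariate_normal.pdf(x, mean=mu, cov=Sigma)
        for P, (mu, Sigma) in zip(priors, params)
    ])
    return joint / joint.sum()  # normalize by the evidence p(x)

x_new = np.array([2.0, 2.5])
post = posterior(x_new)
print("posteriors:", post, "-> decide class", post.argmax())
```

Once `fit_ml` has been run on each class's training vectors, classification reduces to evaluating the estimated posteriors and picking the largest, exactly the decision rule described above.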