The Direct Maximum Likelihood Approach for Labels is a method to directly map data to class labels. This approach requires labeled data, denoted as $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, where $y_i$ is the class label. It operates under the assumption that the class-conditional probability $p(\mathbf{x} \mid y = k)$ is normally distributed, $p(\mathbf{x} \mid y = k) = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$, and for a two-class problem the labels are $y_i \in \{1, 2\}$.
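With $\pi_k = p(y = k)$ denoting the class prior (the quantity estimated below), the generative model behind this approach factorizes the joint probability of a data point and its label as:

$$p(\mathbf{x}, y = k \mid \theta) = p(y = k)\, p(\mathbf{x} \mid y = k) = \pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$$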
Maximizing Likelihood
The objective is to find the model parameters $\theta = (\pi_1, \boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1, \pi_2, \boldsymbol{\mu}_2, \boldsymbol{\Sigma}_2)$ that maximize the label-data-likelihood function, which is given as:

$$L(\theta) = \prod_{i=1}^{N} p(\mathbf{x}_i, y_i \mid \theta) = \prod_{i=1}^{N} \prod_{k=1}^{K} \left[ \pi_k \, \mathcal{N}(\mathbf{x}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \right]^{y_{ik}}$$
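This likelihood can also be evaluated numerically. Below is a minimal NumPy/SciPy sketch of the corresponding log-likelihood; the function name `log_label_data_likelihood` and the array shapes are illustrative choices, not notation from the notes themselves:

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_label_data_likelihood(X, Y, pis, mus, Sigmas):
    """Log of the label-data likelihood L(theta).

    X      : (N, D) data points
    Y      : (N, K) one-of-K (one-hot) class labels
    pis    : (K,)   class priors pi_k
    mus    : (K, D) class means mu_k
    Sigmas : (K, D, D) class covariances Sigma_k
    """
    N, K = Y.shape
    # log[ pi_k * N(x_i | mu_k, Sigma_k) ] for every point i and class k
    log_terms = np.stack(
        [np.log(pis[k]) + multivariate_normal.logpdf(X, mus[k], Sigmas[k])
         for k in range(K)],
        axis=1,
    )  # shape (N, K)
    # the exponent y_ik picks out the term of the true class of each point
    return np.sum(Y * log_terms)
```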
Computational Tricks
Instead of maximizing the likelihood function directly, we maximize its logarithm. We also drop all terms that do not depend on the specific parameter being optimized (valid because their derivatives with respect to that parameter are zero).
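Concretely, taking the logarithm of the likelihood above gives:

$$\ln L(\theta) = \sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik} \left[ \ln \pi_k + \ln \mathcal{N}(\mathbf{x}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \right]$$

so when optimizing the priors $\pi_k$, only the first term matters.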
For instance, to find the optimal prior $\hat{\pi}_1$, we keep only the $\ln \pi_k$ term and maximize it subject to the constraint $\sum_{k} \pi_k = 1$, which yields:

$$\hat{\pi}_1 = \frac{1}{N} \sum_{i=1}^{N} y_{i1} = \frac{N_1}{N}$$
Where,
- $\hat{\pi}_1$ is the proportion of data belonging to the first class (the prior probability for class 1, $p(y = 1)$)
- $N$ is the total number of data points
- $N_1$ is the total count of data points in class 1
- $y_{ik}$ is the label for the $i$th data point, $y_{ik} \in \{0, 1\}$ with $\sum_{k} y_{ik} = 1$ (One-of-$K$ encoding scheme)
Example: if $N_1$ out of the $N$ data points in your training set belong to class 1, the formula gives you $\hat{\pi}_1 = N_1 / N$ (say, 30 of 100 points in class 1 yields $\hat{\pi}_1 = 0.3$).
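Putting the pieces together, here is a minimal sketch of the full maximum-likelihood fit for this label model: priors as class proportions and, under the Gaussian assumption, per-class sample means and covariances. The function name `fit_label_mle` and the array layout are illustrative choices:

```python
import numpy as np

def fit_label_mle(X, Y):
    """Maximum-likelihood estimates for the labeled generative model.

    X : (N, D) data points
    Y : (N, K) one-of-K (one-hot) labels, y_ik in {0, 1}
    Returns class priors pi_k, means mu_k, covariances Sigma_k.
    """
    N, K = Y.shape
    N_k = Y.sum(axis=0)                      # count of points in each class
    pis = N_k / N                            # pi_k = N_k / N
    mus = (Y.T @ X) / N_k[:, None]           # per-class means
    Sigmas = np.empty((K, X.shape[1], X.shape[1]))
    for k in range(K):
        diff = X - mus[k]                    # deviations from the class mean
        # weight each outer product by y_ik so only class-k points contribute
        Sigmas[k] = (Y[:, k, None] * diff).T @ diff / N_k[k]
    return pis, mus, Sigmas

# Hypothetical usage: with 30 of 100 points in class 1, pis[0] comes out as 0.3
# pis, mus, Sigmas = fit_label_mle(X, Y)
```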