Decision Surface

For a classifier where the likelihood for each class is a multivariate Gaussian distribution, the decision surfaces are generally hyperquadrics. A hyperquadric is a generalization of conic sections (like ellipses, parabolas, and hyperbolas) to higher dimensions. This means the boundaries separating the classes can be hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, or hyperparaboloids.
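As one standard special case (parameters kept symbolic for illustration), consider two classes with isotropic likelihoods $\mathcal{N}(\boldsymbol{\mu}_1, \sigma_1^2 I)$ and $\mathcal{N}(\boldsymbol{\mu}_2, \sigma_2^2 I)$ with $\sigma_1 \neq \sigma_2$. Ignoring class-independent constants, the log-likelihood-ratio boundary has the form

$$\left(\frac{1}{2\sigma_2^2} - \frac{1}{2\sigma_1^2}\right)\mathbf{x}^\top\mathbf{x} + \left(\frac{\boldsymbol{\mu}_1}{\sigma_1^2} - \frac{\boldsymbol{\mu}_2}{\sigma_2^2}\right)^\top\mathbf{x} + \text{const} = 0,$$

and since the coefficient of $\mathbf{x}^\top\mathbf{x}$ is nonzero, the boundary is a hypersphere.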

Discriminant Function

Assuming a minimum-error-rate classification and Gaussian likelihoods $p(\mathbf{x} \mid \omega_i) \sim \mathcal{N}(\boldsymbol{\mu}_i, \Sigma_i)$, the logarithmic discriminant function is:

$$g_i(\mathbf{x}) = \ln p(\mathbf{x} \mid \omega_i) + \ln P(\omega_i)$$

Expanding the Gaussian term, we get:

$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_i)^\top \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$$

The term $(\mathbf{x} - \boldsymbol{\mu}_i)^\top \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i)$ is a quadratic form in $\mathbf{x}$, which is why the resulting decision boundaries are hyperquadrics. The term $\frac{d}{2}\ln 2\pi$ is constant across all classes and can be ignored.
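To make the computation concrete, here is a minimal NumPy sketch of this quadratic discriminant; the two-class means, covariances, priors, and the test point are invented purely for illustration:

```python
import numpy as np

def gaussian_discriminant(x, mu, Sigma, prior):
    """Log discriminant g_i(x) for a Gaussian class-conditional density:
    g_i(x) = -1/2 (x - mu)^T Sigma^{-1} (x - mu)
             - d/2 ln(2 pi) - 1/2 ln|Sigma| + ln P(omega_i)."""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)        # quadratic form
    return (-0.5 * quad
            - 0.5 * d * np.log(2 * np.pi)             # class-independent constant
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

# Hypothetical two-class problem in 2-D (all parameters assumed).
mus    = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]
priors = [0.6, 0.4]

x = np.array([1.5, 1.0])
scores = [gaussian_discriminant(x, m, S, p)
          for m, S, p in zip(mus, Sigmas, priors)]
print("assigned class:", int(np.argmax(scores)))      # argmax over g_i(x)
```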

What happens to the decision surface of a Gaussian classifier in the special case where all covariance matrices are equal and diagonal, i.e., $\Sigma_i = \sigma^2 I$?

Special Case: Covariance matrices are equal and diagonal

When the covariance matrix for every class is assumed to be $\Sigma_i = \sigma^2 I$, it means the features are statistically independent and have the same variance $\sigma^2$ for all classes. In this case, the quadratic term in the discriminant function simplifies, and the decision surface becomes a hyperplane. The discriminant function simplifies to:

$$g_i(\mathbf{x}) = -\frac{\|\mathbf{x} - \boldsymbol{\mu}_i\|^2}{2\sigma^2} + \ln P(\omega_i)$$

where $\|\mathbf{x} - \boldsymbol{\mu}_i\|^2 = (\mathbf{x} - \boldsymbol{\mu}_i)^\top(\mathbf{x} - \boldsymbol{\mu}_i)$ is the squared Euclidean distance. The quadratic term $\mathbf{x}^\top\mathbf{x}$ cancels out when comparing $g_i(\mathbf{x})$ and $g_j(\mathbf{x})$, leaving a linear equation in $\mathbf{x}$, which defines a hyperplane.
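Written out, dropping the class-independent $-\mathbf{x}^\top\mathbf{x}/(2\sigma^2)$ term gives a linear discriminant $g_i(\mathbf{x}) = \mathbf{w}_i^\top\mathbf{x} + w_{i0}$ with $\mathbf{w}_i = \boldsymbol{\mu}_i/\sigma^2$ and $w_{i0} = -\boldsymbol{\mu}_i^\top\boldsymbol{\mu}_i/(2\sigma^2) + \ln P(\omega_i)$. A minimal sketch, with illustrative (assumed) parameters:

```python
import numpy as np

def linear_discriminant(x, mu, sigma2, prior):
    """g_i(x) for Sigma_i = sigma^2 I, with the class-independent
    -x^T x / (2 sigma^2) term dropped: the result is linear in x."""
    w  = mu / sigma2                                   # weight vector
    w0 = -(mu @ mu) / (2 * sigma2) + np.log(prior)     # bias term
    return w @ x + w0

# Illustrative parameters (assumed for this sketch).
mus    = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
sigma2 = 1.0
priors = [0.6, 0.4]

x = np.array([1.5, 1.0])
scores = [linear_discriminant(x, m, sigma2, p) for m, p in zip(mus, priors)]
print("assigned class:", int(np.argmax(scores)))
```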

Special Case: Covariance matrices are equal and diagonal AND a flat prior?

If we have the special case of equal diagonal covariances ($\Sigma_i = \sigma^2 I$) and a flat prior ($P(\omega_i) = 1/c$ for all $c$ classes), the prior and scaling terms are identical across classes, so the discriminant function simplifies even further to:

$$g_i(\mathbf{x}) = -\|\mathbf{x} - \boldsymbol{\mu}_i\|^2$$

Maximizing this is equivalent to minimizing the Euclidean distance $\|\mathbf{x} - \boldsymbol{\mu}_i\|$. Therefore, the classifier simply assigns a feature vector $\mathbf{x}$ to the class with the nearest mean $\boldsymbol{\mu}_i$. This is known as a minimum Euclidean distance classifier.
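A minimal sketch of this nearest-mean rule; the class means and the query point are assumed for illustration:

```python
import numpy as np

def nearest_mean_classify(x, mus):
    """Minimum Euclidean distance classifier: assign x to the class
    whose mean is closest (equal sigma^2 I covariances, flat prior)."""
    dists = [np.linalg.norm(x - mu) for mu in mus]
    return int(np.argmin(dists))

# Illustrative class means (assumed for this sketch).
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0]), np.array([0.0, 4.0])]
print(nearest_mean_classify(np.array([1.0, 3.0]), mus))  # index of nearest mean
```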