Expected Squared Test Error

  \mathbb{E}_{D,\nu}\big[(y - \hat{f}_D(x))^2\big] = \text{Bias}^2 + \text{Variance} + \text{Noise}

The expectation is taken over both:

  1. Different possible training datasets D (sampling variability)
  2. Different realizations of the noise ν
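This double expectation can be estimated numerically by averaging over many freshly drawn training sets and noise realizations. The sketch below assumes a particular true function (a sine), noise level, and model (a degree-3 polynomial fit) purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Assumed true underlying function, for illustration only
    return np.sin(2 * np.pi * x)

SIGMA = 0.3        # assumed noise standard deviation
N_TRAIN = 20       # points per training set
N_DATASETS = 500   # number of training sets D to average over
x_test = 0.35      # a fixed test input

preds = []
for _ in range(N_DATASETS):
    # Draw a fresh training set D: inputs and noisy targets y = f(x) + nu
    x_tr = rng.uniform(0, 1, N_TRAIN)
    y_tr = f(x_tr) + rng.normal(0, SIGMA, N_TRAIN)
    # Fit an illustrative model: degree-3 polynomial least squares
    coeffs = np.polyfit(x_tr, y_tr, deg=3)
    preds.append(np.polyval(coeffs, x_test))
preds = np.array(preds)

# Fresh noisy test targets at x_test, one per dataset (noise realizations)
y_test = f(x_test) + rng.normal(0, SIGMA, N_DATASETS)
expected_sq_error = np.mean((y_test - preds) ** 2)
```

The average is over both sources of randomness at once: each iteration uses a new dataset D and a new noise draw ν.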

Note

  • Model variance \mathrm{Var}_D[\hat{f}_D(x)] represents how much predictions vary across different training sets
  • Noise: recall the data model y = f(x) + \nu, where:
    • f(x) is the true underlying function
    • \nu \sim \mathcal{N}(0, \sigma^2) is Gaussian noise with zero mean and variance \sigma^2

Calculating the Individual Terms

  1. Bias² = \big(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\big)^2
    • How far your average prediction is from the truth
    • High bias = consistently wrong in the same direction (underfitting)
    • Like a rifle that always shoots left of the target
    • f(x) is the true function; \mathbb{E}_D[\hat{f}_D(x)] is the prediction averaged over training sets
  2. Variance = \mathbb{E}_D\big[(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)])^2\big]
    • High variance = predictions change wildly with different training sets
    • Like a rifle with shaky aim - shots scattered everywhere
  3. Noise = \sigma^2
    • Irreducible error in the data itself
    • Measurement errors, random fluctuations
    • Can’t be eliminated no matter how good the model is
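The three terms can be estimated with a small Monte Carlo sketch. The sine target, noise level σ = 0.3, and a deliberately underfit linear model are all assumptions chosen so the bias term is clearly visible:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Assumed true function
    return np.sin(2 * np.pi * x)

SIGMA = 0.3
N_TRAIN, N_DATASETS = 20, 2000
x0 = 0.35  # test point at which the terms are evaluated

preds = np.empty(N_DATASETS)
for i in range(N_DATASETS):
    x_tr = rng.uniform(0, 1, N_TRAIN)
    y_tr = f(x_tr) + rng.normal(0, SIGMA, N_TRAIN)
    # Degree-1 fit: too simple for a sine, so it underfits on purpose
    coeffs = np.polyfit(x_tr, y_tr, deg=1)
    preds[i] = np.polyval(coeffs, x0)

avg_pred = preds.mean()                 # E_D[f_hat_D(x0)]
bias_sq  = (avg_pred - f(x0)) ** 2      # Bias^2: average prediction vs truth
variance = preds.var()                  # Variance: spread across training sets
noise    = SIGMA ** 2                   # irreducible Noise

total = bias_sq + variance + noise      # estimated expected squared error
```

Because the linear model is systematically too simple, bias² dominates here; swapping in a higher-degree fit would shrink bias² and grow the variance term instead.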

The Tradeoff

  • Simple models: High bias, low variance
  • Complex models: Low bias, high variance
  • Goal: Find optimal complexity that minimizes total error
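The bullets above can be sketched directly, again assuming a sine target: comparing a simple (degree-1) and a more complex (degree-6) polynomial fit at one test point, the simple model shows higher bias² while the complex one shows higher variance:

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    # Assumed true function
    return np.sin(2 * np.pi * x)

SIGMA, N_TRAIN, N_DATASETS = 0.3, 15, 500
x0 = 0.35

def bias_var(degree):
    """Estimate (Bias^2, Variance) at x0 for a polynomial fit of given degree."""
    preds = np.empty(N_DATASETS)
    for i in range(N_DATASETS):
        x_tr = rng.uniform(0, 1, N_TRAIN)
        y_tr = f(x_tr) + rng.normal(0, SIGMA, N_TRAIN)
        c = np.polyfit(x_tr, y_tr, deg=degree)
        preds[i] = np.polyval(c, x0)
    return (preds.mean() - f(x0)) ** 2, preds.var()

b_simple, v_simple   = bias_var(1)   # simple model: high bias, low variance
b_complex, v_complex = bias_var(6)   # complex model: low bias, high variance
```

Sweeping the degree from low to high and plotting bias² + variance + σ² would trace out the familiar U-shaped total-error curve whose minimum is the optimal complexity.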

Key Insight: For a fixed amount of data, you generally cannot reduce bias and variance simultaneously: lowering one tends to raise the other. The art of machine learning is finding the right balance.