Expected Squared Test Error

$$\mathbb{E}_{D,\nu}\big[(y - \hat{f}_D(x))^2\big] = \text{Bias}^2 + \text{Variance} + \sigma^2$$

The expectation is taken over both:
- Different possible training datasets $D$ (sampling variability)
- Different realizations of the noise $\nu$
Note
- Model variance represents how much predictions vary with different training sets
- Noise: recall the data model $y = f(x) + \nu$, where
  - $f(x)$ is the true underlying function
  - $\nu$ is Gaussian noise with zero mean and variance $\sigma^2$
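As a small illustration of this data model, it can be simulated directly; the sine target and noise level below are my own illustrative choices, not part of the notes:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """True underlying function (a sine, chosen for illustration)."""
    return np.sin(2 * np.pi * x)

def sample_dataset(n=30, sigma=0.3):
    """Draw one training set D: inputs x and noisy targets y = f(x) + nu."""
    x = rng.uniform(0.0, 1.0, n)
    nu = rng.normal(0.0, sigma, n)  # Gaussian noise: zero mean, variance sigma^2
    return x, f(x) + nu
```

Each call to `sample_dataset` is one draw of $D$; the noise $\nu$ is fresh on every call, which is exactly the two sources of randomness the expectation averages over.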
Calculating the Individual Terms
- Bias² = $\big(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\big)^2$
  - How far off your average prediction is from the truth
  - High bias = consistently wrong in the same direction (underfitting)
  - Like a rifle that always shoots left of the target
  - $f(x)$ is the true function
- Variance = $\mathbb{E}_D\big[(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)])^2\big]$
  - High variance = predictions change wildly with different training sets
  - Like a rifle with shaky aim - shots scattered everywhere
- Noise = $\sigma^2$
  - Irreducible error in the data itself
  - Measurement errors, random fluctuations
  - Can't be eliminated no matter how good your model is
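These three terms can be estimated empirically by refitting a model on many simulated training sets and looking at the spread of predictions at one test point. The sine target, degree-3 polynomial model, and noise level here are illustrative assumptions, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.3  # noise standard deviation (assumed)

def f(x):
    """True underlying function (illustrative choice)."""
    return np.sin(2 * np.pi * x)

def fit_and_predict(x_test, n=30, degree=3):
    """Fit a polynomial to one freshly drawn dataset D, predict at x_test."""
    x = rng.uniform(0.0, 1.0, n)
    y = f(x) + rng.normal(0.0, sigma, n)
    return np.polyval(np.polyfit(x, y, degree), x_test)

x0 = 0.25
preds = np.array([fit_and_predict(x0) for _ in range(2000)])  # many draws of D

bias_sq = (preds.mean() - f(x0)) ** 2  # (E_D[f_hat(x0)] - f(x0))^2
variance = preds.var()                 # E_D[(f_hat(x0) - E_D[f_hat(x0)])^2]
noise = sigma ** 2                     # irreducible sigma^2
```

Summing `bias_sq + variance + noise` recovers the expected squared test error at `x0`, matching the decomposition above.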
The Tradeoff
- Simple models: High bias, low variance
- Complex models: Low bias, high variance
- Goal: Find optimal complexity that minimizes total error
Key Insight: For a fixed amount of training data, you generally cannot reduce bias and variance at the same time - lowering one tends to raise the other. The art of machine learning is finding the right balance.
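One way to see the tradeoff is to sweep model complexity under the same kind of illustrative setup (sine target, polynomial models of varying degree, Gaussian noise - all assumptions for the sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 0.3  # noise standard deviation (assumed)

def f(x):
    """True underlying function (illustrative choice)."""
    return np.sin(2 * np.pi * x)

def bias_variance_at(x0, degree, n=30, reps=500):
    """Estimate Bias^2 and Variance at x0 for a polynomial of given degree."""
    preds = np.empty(reps)
    for i in range(reps):
        x = rng.uniform(0.0, 1.0, n)
        y = f(x) + rng.normal(0.0, sigma, n)
        preds[i] = np.polyval(np.polyfit(x, y, degree), x0)
    return (preds.mean() - f(x0)) ** 2, preds.var()

# Simple model (degree 1): high bias, low variance.
# Complex model (degree 9): low bias, higher variance.
for degree in (1, 3, 9):
    b2, v = bias_variance_at(0.25, degree)
    print(f"degree {degree}: bias^2 = {b2:.4f}, variance = {v:.4f}")
```

The printed numbers show bias² falling and variance rising as the degree grows; the degree minimizing their sum (plus $\sigma^2$) is the optimal complexity the notes describe.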