Expected Squared Test Error

  \mathbb{E}_{D,\nu}\big[(y - \hat{f}_D(x))^2\big] = \text{Bias}^2 + \text{Variance} + \text{Noise}

The expectation is taken over both:

  1. Different possible training datasets D (sampling variability)
  2. Different realizations of the noise ν
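This double expectation can be estimated numerically by averaging over many freshly drawn training sets and noise realizations. The sketch below assumes a particular true function (a sine), noise level, and model (a degree-3 polynomial fit) purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Assumed true underlying function, for illustration only
    return np.sin(2 * np.pi * x)

SIGMA = 0.3        # assumed noise standard deviation
N_TRAIN = 20       # points per training set
N_DATASETS = 500   # number of training sets D to average over
x_test = 0.35      # a fixed test input

preds = []
for _ in range(N_DATASETS):
    # Draw a fresh training set D: inputs and noisy targets y = f(x) + nu
    x_tr = rng.uniform(0, 1, N_TRAIN)
    y_tr = f(x_tr) + rng.normal(0, SIGMA, N_TRAIN)
    # Fit an illustrative model: degree-3 polynomial least squares
    coeffs = np.polyfit(x_tr, y_tr, deg=3)
    preds.append(np.polyval(coeffs, x_test))
preds = np.array(preds)

# Fresh noisy test targets at x_test, one per dataset (noise realizations)
y_test = f(x_test) + rng.normal(0, SIGMA, N_DATASETS)
expected_sq_error = np.mean((y_test - preds) ** 2)
```

The average is over both sources of randomness at once: each iteration uses a new dataset D and a new noise draw ν.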

Note

  • Model variance \mathrm{Var}_D[\hat{f}_D(x)] represents how much predictions vary across different training sets
  • Noise: recall the data model y = f(x) + \nu, where:
    • f(x) is the true underlying function
    • \nu \sim \mathcal{N}(0, \sigma^2) is Gaussian noise with zero mean and variance \sigma^2

Calculating the Individual Terms

  1. Bias² = \big(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\big)^2
    • How far your average prediction is from the truth
    • High bias = consistently wrong in the same direction (underfitting)
    • Like a rifle that always shoots left of the target
    • f(x) is the true function; \mathbb{E}_D[\hat{f}_D(x)] is the prediction averaged over training sets
  2. Variance = \mathbb{E}_D\big[(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)])^2\big]
    • High variance = predictions change wildly with different training sets
    • Like a rifle with shaky aim - shots scattered everywhere
  3. Noise = \sigma^2
    • Irreducible error in the data itself
    • Measurement errors, random fluctuations
    • Can’t be eliminated no matter how good the model is
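The three terms can be estimated with a small Monte Carlo sketch. The sine target, noise level σ = 0.3, and a deliberately underfit linear model are all assumptions chosen so the bias term is clearly visible:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Assumed true function
    return np.sin(2 * np.pi * x)

SIGMA = 0.3
N_TRAIN, N_DATASETS = 20, 2000
x0 = 0.35  # test point at which the terms are evaluated

preds = np.empty(N_DATASETS)
for i in range(N_DATASETS):
    x_tr = rng.uniform(0, 1, N_TRAIN)
    y_tr = f(x_tr) + rng.normal(0, SIGMA, N_TRAIN)
    # Degree-1 fit: too simple for a sine, so it underfits on purpose
    coeffs = np.polyfit(x_tr, y_tr, deg=1)
    preds[i] = np.polyval(coeffs, x0)

avg_pred = preds.mean()                 # E_D[f_hat_D(x0)]
bias_sq  = (avg_pred - f(x0)) ** 2      # Bias^2: average prediction vs truth
variance = preds.var()                  # Variance: spread across training sets
noise    = SIGMA ** 2                   # irreducible Noise

total = bias_sq + variance + noise      # estimated expected squared error
```

Because the linear model is systematically too simple, bias² dominates here; swapping in a higher-degree fit would shrink bias² and grow the variance term instead.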

The Tradeoff

  • Simple models: High bias, low variance
  • Complex models: Low bias, high variance
  • Goal: Find optimal complexity that minimizes total error
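The bullets above can be sketched directly, again assuming a sine target: comparing a simple (degree-1) and a more complex (degree-6) polynomial fit at one test point, the simple model shows higher bias² while the complex one shows higher variance:

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    # Assumed true function
    return np.sin(2 * np.pi * x)

SIGMA, N_TRAIN, N_DATASETS = 0.3, 15, 500
x0 = 0.35

def bias_var(degree):
    """Estimate (Bias^2, Variance) at x0 for a polynomial fit of given degree."""
    preds = np.empty(N_DATASETS)
    for i in range(N_DATASETS):
        x_tr = rng.uniform(0, 1, N_TRAIN)
        y_tr = f(x_tr) + rng.normal(0, SIGMA, N_TRAIN)
        c = np.polyfit(x_tr, y_tr, deg=degree)
        preds[i] = np.polyval(c, x0)
    return (preds.mean() - f(x0)) ** 2, preds.var()

b_simple, v_simple   = bias_var(1)   # simple model: high bias, low variance
b_complex, v_complex = bias_var(6)   # complex model: low bias, high variance
```

Sweeping the degree from low to high and plotting bias² + variance + σ² would trace out the familiar U-shaped total-error curve whose minimum is the optimal complexity.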

Key Insight: For a fixed amount of data, you generally cannot reduce bias and variance simultaneously: lowering one tends to raise the other. The art of machine learning is finding the right balance.