Predictive vs Bayesian Predictive Distribution


Mathematical Definition for a single prediction $t$

1. Bayesian Predictive Distribution

The Bayesian approach¹ (left plot) shows wider uncertainty bands because it accounts for parameter uncertainty on top of data noise. The uncertainty also varies with the input: it is larger where there is less training data.

$$p(t \mid x, X, T) = \int p(t \mid x, w)\, p(w \mid X, T)\, dw = \mathcal{N}(t \mid m(x), s^{2}(x))$$

Data-dependent mean: $m (x)$
Input-dependent variance: $s^{2} (x)$

Note: The weights $w$ do not appear among the conditioning variables on the left-hand side, $p(t \mid x, X, T)$, because they have been integrated out (marginalized) against the posterior $p(w \mid X, T)$.
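The integral above has a closed form in the standard Bayesian linear regression setting. The sketch below is one way to compute $m(x)$ and $s^{2}(x)$ under assumptions the notes do not state: a polynomial basis, a zero-mean Gaussian prior on $w$ with precision `alpha`, and a known noise precision `beta`. The function names, the toy data, and the values of `alpha`, `beta`, and `degree` are all illustrative.

```python
import numpy as np

def poly_features(x, degree):
    """Design matrix with columns 1, x, x^2, ..., x^degree (assumed basis)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.vander(x, degree + 1, increasing=True)

def bayes_predictive(x_new, X, T, alpha=2.0, beta=25.0, degree=3):
    """Return the predictive mean m(x) and variance s^2(x) at each x_new."""
    Phi = poly_features(X, degree)
    # Posterior p(w | X, T) = N(w | m_N, S_N), assuming the prior N(w | 0, I / alpha)
    S_N = np.linalg.inv(alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi)
    m_N = beta * S_N @ Phi.T @ T
    phi = poly_features(x_new, degree)
    mean = phi @ m_N
    # Noise variance plus the parameter-uncertainty term phi(x)^T S_N phi(x)
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", phi, S_N, phi)
    return mean, var

# Toy data: all training inputs lie in [0, 0.5]
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 0.5, size=10)
T = np.sin(2 * np.pi * X) + rng.normal(0.0, 0.2, size=10)

# Query one point inside the data range and one far outside it
mean, var = bayes_predictive(np.array([0.25, 1.0]), X, T)
```

With this setup the variance at $x = 1.0$, far from the training inputs, comes out larger than at $x = 0.25$, and both exceed the noise floor `1 / beta`.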

2. Predictive Distribution (Maximum Likelihood)

The maximum-likelihood predictive distribution plugs in a single “best” set of parameters estimated from the training data.

$$p(t \mid x, w_{ML}, β_{ML}) = \mathcal{N}(t \mid y(x, w_{ML}), β_{ML}^{-1})$$

Fixed mean: $y (x, w_{M L})$
Fixed variance: $β_{M L}^{- 1}$ (constant for all inputs)
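As a rough illustration, the plug-in distribution can be computed as follows. The sketch assumes a polynomial model $y(x, w)$ with Gaussian noise, in which case $w_{ML}$ is the least-squares solution and $β_{ML}^{-1}$ is the mean squared residual on the training set. The function names, the toy data, and `degree` are illustrative, not taken from the notes.

```python
import numpy as np

def fit_ml(X, T, degree=3):
    """Maximum-likelihood fit of a polynomial model with Gaussian noise."""
    Phi = np.vander(np.asarray(X, dtype=float), degree + 1, increasing=True)
    # Under Gaussian noise, w_ML is the ordinary least-squares solution
    w_ml, *_ = np.linalg.lstsq(Phi, T, rcond=None)
    # 1 / beta_ML is the mean squared residual over the training set
    beta_ml_inv = np.mean((T - Phi @ w_ml) ** 2)
    return w_ml, beta_ml_inv

def ml_predictive(x_new, w_ml, beta_ml_inv):
    """Return the mean y(x, w_ML) and the constant variance 1 / beta_ML."""
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    phi = np.vander(x_new, len(w_ml), increasing=True)
    mean = phi @ w_ml
    var = np.full(mean.shape, beta_ml_inv)  # same value for every input
    return mean, var

# Toy data (hypothetical)
rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 20)
T = np.sin(2 * np.pi * X) + rng.normal(0.0, 0.2, size=20)

w_ml, beta_ml_inv = fit_ml(X, T)
mean_ml, var_ml = ml_predictive(np.array([0.1, 0.5, 2.0]), w_ml, beta_ml_inv)
```

Note that `var_ml` is identical at every query point, including $x = 2.0$, which lies well outside the training range.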

Summary of key differences

Predictive Distribution (Maximum Likelihood):

Uses fixed parameters $w_{M L}$ and $β_{M L}$ (point estimates)
Uncertainty comes only from data noise (aleatoric uncertainty)
Constant variance across all inputs

Bayesian Predictive Distribution:

Integrates over all possible parameters weighted by their posterior probability
Uncertainty comes from both data noise AND parameter uncertainty (aleatoric + epistemic)
Input-dependent variance: uncertainty varies with the input $x$ and grows where training data is sparse
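The two sources of uncertainty can be made explicit for the standard linear-basis-function model. This is a sketch under assumptions the notes do not spell out: a Gaussian prior on $w$, a known noise precision $β$, a basis vector $\phi(x)$, and a posterior covariance $S_N$ of the weights.

$$s^{2}(x) = \underbrace{β^{-1}}_{\text{data noise (aleatoric)}} + \underbrace{\phi(x)^{T} S_N\, \phi(x)}_{\text{parameter uncertainty (epistemic)}}$$

The first term is the same for every input, which is all the maximum-likelihood predictive distribution retains (with $β_{ML}$ in place of $β$). The second term depends on $x$ and shrinks as more training data is observed near $x$.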

¹ Help with the notation: $p(t \mid x, X, T)$ is the probability distribution of a new, unseen output $t$, conditioned on the new input $x$ and on all the evidence from the training data, $X$ and $T$.
