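The Nadaraya-Watson kernel regression estimator predicts an output as a kernel-weighted average of the training outputs. For training pairs $(x_i, y_i)$, $i = 1, \dots, n$, it is defined by the following formula:

$$\hat{y}(x) = \sum_{i=1}^{n} \frac{K_h(x, x_i)}{\sum_{j=1}^{n} K_h(x, x_j)} \, y_i$$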

The key parameter to choose is the bandwidth $h$ of the kernel function (often a Gaussian kernel). It controls the width of the kernel and thus the smoothness of the resulting function.
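
To make the role of the bandwidth concrete, one common form of the Gaussian kernel can be written as follows. The symbols $x$ (query point), $x_i$ (training input), and $h$ (bandwidth) are notation assumed here, and the normalizing constant is omitted because it cancels when the weights are normalized:

$$K_h(x, x_i) = \exp\left(-\frac{\lVert x - x_i \rVert^2}{2h^2}\right)$$

Under this form, as $h \to 0$ the weight concentrates on the training point nearest to $x$, so the estimate behaves like a nearest-neighbor lookup. As $h \to \infty$ all weights become nearly equal, and the estimate tends to the plain mean of the training outputs.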

  • $\hat{y}(x)$: the final predicted output value for the new, unseen input point $x$.
  • $x$: the query point, which is the new input for which you want to make a prediction.
  • $K_h(x, x_i)$: the kernel function. It measures the similarity or “closeness” between a training point $x_i$ and the new query point $x$. The result is a scalar value that is typically large when the points are close and small when they are far apart. A common choice is the Gaussian kernel.
  • $\frac{K_h(x, x_i)}{\sum_{j=1}^{n} K_h(x, x_j)}$: this entire fraction acts as a normalized weight. The numerator is the similarity of a single training point $x_i$ to the query point, and the denominator is the sum of similarities over all $n$ training points, which ensures that the weights sum to 1. The weight assigned to a training output $y_i$ is directly proportional to the similarity between its corresponding input $x_i$ and the query point $x$.
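
The pieces described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: it assumes a Gaussian kernel, and the function names (`gaussian_kernel`, `nw_weights`, `nw_predict`), the toy data, and the bandwidth values are all hypothetical choices made for the example.

```python
import numpy as np


def gaussian_kernel(x, x_train, h):
    """Similarity between the query x and every row of x_train (assumed Gaussian)."""
    sq_dist = np.sum((x_train - x) ** 2, axis=1)  # squared Euclidean distances
    return np.exp(-sq_dist / (2.0 * h ** 2))


def nw_weights(x, x_train, h):
    """Normalized weights: each similarity divided by the sum of all similarities."""
    k = gaussian_kernel(x, x_train, h)
    # Note: for a very small h relative to the distances, every similarity can
    # underflow to 0 and this division would fail; the sketch does not guard that.
    return k / k.sum()


def nw_predict(x, x_train, y_train, h):
    """Weighted average of the training outputs."""
    return float(nw_weights(x, x_train, h) @ y_train)


# Toy one-dimensional data, made up for illustration only.
x_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 4.0, 9.0])
x_query = np.array([1.2])

w = nw_weights(x_query, x_train, h=0.5)
y_hat = nw_predict(x_query, x_train, y_train, h=0.5)

print("weights:", w, "sum:", w.sum())
print("prediction at x = 1.2:", y_hat)
```

The weights sum to 1, and the largest one belongs to the training input closest to the query (here `x = 1.0`). Because the prediction is a weighted average with non-negative weights, it always lies between the smallest and largest training outputs. Rerunning `nw_predict` with a much larger `h` pulls the result toward the mean of `y_train`, while a much smaller `h` pulls it toward the output of the nearest training point.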