Gradient Descent

The learning rule for gradient descent is

$$w_{k + 1} = w_{k} - η \nabla E (w_{k})$$

Where:

$w_{k}$ is the parameter vector at iteration $k$
$w_{k + 1}$ is the updated parameter vector for the next iteration
$η$ is the learning rate (a positive scalar value)
$\nabla E (w)$ is the gradient of the error function $E$ with respect to the parameters $w$, evaluated at the current iterate $w_{k}$
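
As a small worked illustration of the rule, assume a hypothetical one-dimensional error function $E(w) = w^{2}$ (not taken from these notes), so that $\nabla E (w) = 2w$. Starting from an assumed $w_{0} = 1$ with $η = 0.1$, the first two updates are:

$$w_{1} = w_{0} - η \cdot 2 w_{0} = 1 - 0.1 \cdot 2 = 0.8$$

$$w_{2} = w_{1} - η \cdot 2 w_{1} = 0.8 - 0.1 \cdot 1.6 = 0.64$$

Each step shrinks $w$ by a factor of $1 - 2η = 0.8$, so the iterates approach the minimizer $w = 0$.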

This iterative update rule moves the parameters in the direction of steepest descent of the error function (the negative gradient), with the step size controlled by the learning rate $η$. The process continues until the gradient becomes approximately zero ($\nabla E (w_{k}) \approx 0$) or some other stopping criterion, such as a maximum number of iterations, is met.
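
The update rule and stopping criterion can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name `gradient_descent`, the parameters `tol` and `max_iter`, and the quadratic error function $E(w) = \|w - t\|^{2}$ with its `target` vector are all assumptions made for the example.

```python
import numpy as np


def gradient_descent(grad, w0, eta=0.1, tol=1e-8, max_iter=10_000):
    """Iterate w_{k+1} = w_k - eta * grad(w_k).

    `grad` is a callable returning the gradient of the error function.
    `tol` and `max_iter` are illustrative stopping criteria (assumptions).
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        g = grad(w)
        # Stop once the gradient is approximately zero.
        if np.linalg.norm(g) < tol:
            break
        # Step against the gradient, scaled by the learning rate.
        w = w - eta * g
    return w


# Hypothetical error function E(w) = ||w - target||^2, whose gradient is 2 (w - target).
target = np.array([3.0, -1.0])
w_min = gradient_descent(lambda w: 2.0 * (w - target), w0=[0.0, 0.0])
```

For this particular error function, a learning rate of $η = 0.1$ shrinks the distance to `target` by a factor of 0.8 per step, so `w_min` ends up close to `target`. A learning rate that is too large for a given error function can make the iterates diverge instead.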

Ashu's Online Notes
