Gradient Descent Algorithm

Use-case

Gradient descent is used to find the parameter values that minimize the cost function of a regression, so that the fitted model gives a fairly accurate approximation of the data set.

$$θ_{j} := θ_{j} - α \frac{\partial}{\partial θ_{j}} J(θ_{0}, θ_{1}) \quad \text{(for } j = 0 \text{ and } j = 1\text{)}$$

Theory

Hypothesis Function

$$h_{θ}(x) = θ_{1} x + θ_{0}$$

Where,

$(θ_{0}, θ_{1}) \to$ Parameters of the hypothesis function (intercept and slope), estimated through regression
$x \to$ Input variable, taken from the domain of the data

Example

Cost/Loss Function: Mean Squared Error

$$J(θ_{0}, θ_{1}) = \frac{1}{m} \sum_{i=1}^{m} \left[ h_{θ}(x^{(i)}) - y^{(i)} \right]^{2}$$

Where, $m \to$ Number of data points, and $(x^{(i)}, y^{(i)}) \to$ The $i$-th data point.
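As a rough illustration, the hypothesis and the mean squared error cost can be written in a few lines of Python. This is a minimal sketch: the function names `hypothesis` and `mse_cost` and the toy data set (points generated from $y = 2x + 1$) are assumptions made for the example and are not part of the notes.

```python
def hypothesis(theta0, theta1, x):
    """Linear hypothesis h_theta(x) = theta1 * x + theta0."""
    return theta1 * x + theta0


def mse_cost(theta0, theta1, xs, ys):
    """Mean squared error J(theta0, theta1) averaged over the m data points."""
    m = len(xs)
    squared_errors = (
        (hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)
    )
    return sum(squared_errors) / m


# Hypothetical toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

perfect_cost = mse_cost(1.0, 2.0, xs, ys)  # the generating parameters: cost 0.0
poor_cost = mse_cost(0.0, 0.0, xs, ys)     # predicting 0 everywhere: cost 21.0
```

A lower cost means the line fits the data better, which is why the guess $(θ_{0}, θ_{1}) = (1, 2)$ scores 0 here while $(0, 0)$ scores 21.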

Computing the cost function for a range of guess values produces a surface that can be visualized, and the point with the lowest cost can then be identified visually. However, it is not feasible to evaluate the cost function for every possible guess value of $θ_{0}, θ_{1}$. The gradient descent algorithm is used instead to find a path to a minimum of the cost function.
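To make the feasibility problem concrete, the sketch below evaluates the cost on a coarse grid of guesses. The toy data, the grid range, and the step of 0.5 are all assumptions chosen for illustration. Even this small grid needs 441 cost evaluations, the count grows exponentially with the number of parameters, and the answer can never be more precise than the grid spacing.

```python
# Hypothetical toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]


def mse_cost(theta0, theta1):
    """Mean squared error of the line theta1 * x + theta0 on the toy data."""
    m = len(xs)
    return sum((theta1 * x + theta0 - y) ** 2 for x, y in zip(xs, ys)) / m


# Candidate guesses -5.0, -4.5, ..., 5.0 for each parameter (21 values each).
candidates = [i * 0.5 for i in range(-10, 11)]

# Brute force: one cost evaluation per (theta0, theta1) pair.
grid = {(t0, t1): mse_cost(t0, t1) for t0 in candidates for t1 in candidates}

# The pair with the lowest cost on the grid.
best = min(grid, key=grid.get)
```

Here `best` lands on the generating parameters only because they happen to lie exactly on the grid; in general a grid search returns the nearest grid point at best.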

$$θ_{j} := θ_{j} - α \frac{\partial}{\partial θ_{j}} J(θ_{0}, θ_{1}) \quad \text{(for } j = 0 \text{ and } j = 1\text{)}$$

Each update moves $θ_{j}$ by a step scaled by $α$ in the direction that decreases the cost most steeply: the gradient points in the direction of steepest ascent, so subtracting it moves downhill. Where $α \to$ Step size, also called the learning rate.

$$θ_{0} := θ_{0} + \Delta θ_{0}, \quad \Delta θ_{0} = -α \frac{\partial}{\partial θ_{0}} J(θ_{0}, θ_{1})$$

$$θ_{1} := θ_{1} + \Delta θ_{1}, \quad \Delta θ_{1} = -α \frac{\partial}{\partial θ_{1}} J(θ_{0}, θ_{1})$$
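For the linear hypothesis and the mean squared error cost with the $\frac{1}{m}$ factor used in these notes, the two partial derivatives can be worked out with the chain rule. The derivation below is added as a worked step and assumes exactly those two definitions:

$$\frac{\partial}{\partial θ_{0}} J(θ_{0}, θ_{1}) = \frac{2}{m} \sum_{i=1}^{m} \left[ h_{θ}(x^{(i)}) - y^{(i)} \right]$$

$$\frac{\partial}{\partial θ_{1}} J(θ_{0}, θ_{1}) = \frac{2}{m} \sum_{i=1}^{m} \left[ h_{θ}(x^{(i)}) - y^{(i)} \right] x^{(i)}$$

The extra factor $x^{(i)}$ in the second expression comes from differentiating $θ_{1} x^{(i)}$ with respect to $θ_{1}$; differentiating $θ_{0}$ with respect to itself contributes a factor of 1.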

Repeat both updates simultaneously until convergence, that is, until $θ_{0}$ and $θ_{1}$ stop changing appreciably.
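The whole procedure can be sketched in Python as follows. This is one possible implementation, not a definitive one: the function name `gradient_descent`, the starting guess of zero, the default learning rate, the stopping tolerance, and the toy data are all assumptions made for the example. The gradients are the partial derivatives of the mean squared error for a linear hypothesis.

```python
def gradient_descent(xs, ys, alpha=0.05, tolerance=1e-9, max_iters=100_000):
    """Fit h(x) = theta1 * x + theta0 by gradient descent on the MSE cost."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0  # assumed initial guess

    for _ in range(max_iters):
        # Prediction errors h(x_i) - y_i under the current parameters.
        errors = [theta1 * x + theta0 - y for x, y in zip(xs, ys)]

        # Partial derivatives of J = (1/m) * sum(errors^2).
        grad0 = 2.0 / m * sum(errors)
        grad1 = 2.0 / m * sum(e * x for e, x in zip(errors, xs))

        # Compute both steps before applying either (simultaneous update).
        step0 = -alpha * grad0
        step1 = -alpha * grad1
        theta0 += step0
        theta1 += step1

        # Convergence: stop once both steps are negligibly small.
        if abs(step0) < tolerance and abs(step1) < tolerance:
            break

    return theta0, theta1


# Hypothetical toy data generated from y = 2x + 1.
fitted_theta0, fitted_theta1 = gradient_descent(
    [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
)
```

On this toy data the fitted values approach $θ_{0} = 1$ and $θ_{1} = 2$. The learning rate matters: if $α$ is too large the updates overshoot and diverge, and if it is too small convergence takes many more iterations.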

Ashu's Online Notes
