Why is the gradient descent algorithm required in linear regression? List the differences between stochastic gradient descent and batch gradient descent, with a suitable cost function.
Solution
Gradient Descent Algorithm in Linear Regression:
Linear regression is a method for predicting the value of a dependent variable (y) from the value of an independent variable (x). The two variables are assumed to be linearly related, so we look for a linear function of the feature (x) that predicts the response (y) as accurately as possible.
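The goal of linear regression is to find the best-fit line, the one that minimizes the sum of squared differences between the actual and predicted values. This is where the gradient descent algorithm comes in: it is an iterative optimization algorithm that searches for a minimum of a function, here the error function. Ordinary least squares also has a closed-form solution, so gradient descent is not strictly required, but it is often preferred when the number of features or training examples makes the closed-form computation expensive.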
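Gradient descent measures the local gradient of the error function with respect to the parameter vector θ and moves the parameters in the direction of the negative gradient, that is, downhill. When the gradient reaches zero, the algorithm has reached a stationary point; for the convex squared-error cost of linear regression, that point is a global minimum.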
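In symbols, one step of the procedure described above can be written as the update below. The learning rate α (step size) and the name J for the error function are notation introduced here for illustration; the text above does not name them.

```latex
\theta \leftarrow \theta - \alpha \, \nabla_{\theta} J(\theta)
```

The update stops changing θ once the gradient of J at θ is zero, which matches the stopping condition described above.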
Differences between Stochastic Gradient Descent and Batch Gradient Descent:
- Batch Gradient Descent: In batch gradient descent, all the training data is used to take a single step. We average the gradients over all training examples and use that mean gradient to update the parameters, so one epoch produces exactly one update. With a suitable learning rate, batch gradient descent converges to the global minimum for convex error surfaces and to a local minimum for non-convex surfaces.
- Stochastic Gradient Descent: In stochastic gradient descent (SGD), on the other hand, only one training example is used to take a single step. We pick one example, compute the gradient of the model's error on that example alone, and use it to update the parameters. Repeating this for every training example makes up one epoch, so an epoch produces as many updates as there are examples. Each update is cheap but noisy, so the cost fluctuates instead of decreasing smoothly. On massive datasets, SGD often makes faster progress than batch gradient descent because it starts improving the parameters before it has seen all the data.
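The two update schemes above can be sketched side by side in Python with NumPy. This is a minimal illustration, not a reference implementation: the function names (`mse`, `batch_gd`, `sgd`), the learning rates, the epoch counts, and the synthetic data generated from y = 4 + 3x are all assumptions made for the example.

```python
import numpy as np

def mse(X, y, theta):
    """Mean squared error of the linear model X @ theta."""
    residual = X @ theta - y
    return float(np.mean(residual ** 2))

def batch_gd(X, y, lr=0.1, epochs=500):
    """One update per epoch, using the mean gradient over all examples."""
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(epochs):
        grad = (2.0 / m) * X.T @ (X @ theta - y)  # gradient of the MSE
        theta -= lr * grad
    return theta

def sgd(X, y, lr=0.01, epochs=50, seed=0):
    """One update per training example, visited in shuffled order."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(epochs):
        for i in rng.permutation(m):
            xi, yi = X[i], y[i]
            grad = 2.0 * xi * (xi @ theta - yi)  # gradient on one example
            theta -= lr * grad
    return theta

# Hypothetical synthetic data: y = 4 + 3x + small Gaussian noise.
rng = np.random.default_rng(42)
x = rng.uniform(0, 2, size=200)
y = 4.0 + 3.0 * x + rng.normal(0, 0.1, size=200)
X = np.column_stack([np.ones_like(x), x])  # first column is the intercept term

theta_batch = batch_gd(X, y)
theta_sgd = sgd(X, y)
print("batch:", theta_batch, "MSE:", mse(X, y, theta_batch))
print("sgd:  ", theta_sgd, "MSE:", mse(X, y, theta_sgd))
```

With these settings both estimates should land near the intercept 4 and slope 3 used to generate the data. Note the difference in update counts: `batch_gd` performs one update per epoch, while `sgd` performs 200 per epoch, one for each example.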
Suitable Cost Function:
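The cost function for both SGD and batch gradient descent is the mean squared error (MSE). It measures the average squared difference between the actual and predicted values. Batch gradient descent uses the gradient of the MSE over the whole training set, while SGD uses the gradient of the squared error on a single example as a noisy estimate of it. MSE is suitable because it is convex and differentiable for linear regression, so gradient descent has a single global minimum to find, and because squaring penalizes large errors more heavily than small ones.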
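As a sketch of how the same cost function leads to the two update rules, assume a linear hypothesis h_θ(x) = θᵀx, m training examples (x⁽ⁱ⁾, y⁽ⁱ⁾), and a learning rate α. These symbols are conventional notation chosen here, not taken from the text above.

```latex
\begin{aligned}
J(\theta) &= \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2,
\qquad h_\theta(x) = \theta^{\top} x \\[4pt]
\text{Batch:}\quad \theta &\leftarrow \theta - \alpha \, \frac{2}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)} \\[4pt]
\text{SGD:}\quad \theta &\leftarrow \theta - \alpha \cdot 2 \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}
\quad \text{for a single example } i
\end{aligned}
```

Some texts scale J by 1/2 so that the factor 2 disappears from the gradients; that choice only rescales the learning rate.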