Gradient Descent: How Machine Learning Models Learn


With a learning rate of 0.1 and momentum of 0.9, gradient descent typically reaches the minimum of a smooth landscape in about 50 steps. Higher learning rates converge faster but risk overshooting.

Formula

θₜ₊₁ = θₜ - α × ∇f(θₜ)
Momentum update: vₜ = β × vₜ₋₁ + ∇f(θₜ); θₜ₊₁ = θₜ - α × vₜ
Adam: mₜ = β₁mₜ₋₁ + (1-β₁)gₜ; vₜ = β₂vₜ₋₁ + (1-β₂)gₜ²
Adam update: θₜ₊₁ = θₜ - α × m̂ₜ/(√v̂ₜ + ε), where m̂ₜ = mₜ/(1-β₁ᵗ) and v̂ₜ = vₜ/(1-β₂ᵗ)

The Engine of Machine Learning

Gradient descent is the algorithm that makes machine learning work. Neural networks, logistic regression, and even support vector machines (in their modern, SGD-trained linear form) all learn by following gradients downhill through a loss landscape. The idea dates back to Cauchy in 1847: to minimize a function, take small steps in the direction of steepest descent. Simple in principle, gradient descent powers everything from GPT to AlphaFold.

The Loss Landscape

Imagine the loss function as a mountainous terrain where altitude represents error. Gradient descent starts at a random position and slides downhill, following the steepest slope at each point. The gradient — a vector of partial derivatives — points uphill, so we move in the opposite direction. The learning rate determines how far we step each time. The simulation above visualizes this journey across different landscape topographies.
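
To make the update rule concrete, here is a minimal vanilla gradient descent loop in Python. The bowl-shaped loss, the starting point, and the hyperparameters are illustrative choices, not the simulator's internals:

import numpy as np

def f(theta):
    # Illustrative loss: an elongated bowl with its minimum at the origin.
    return theta[0]**2 + 10 * theta[1]**2

def grad_f(theta):
    # Gradient of f: the vector of partial derivatives, derived by hand.
    return np.array([2 * theta[0], 20 * theta[1]])

theta = np.array([4.0, 3.0])  # starting position on the landscape
alpha = 0.05                  # learning rate (step size)

for step in range(100):
    theta = theta - alpha * grad_f(theta)  # step opposite the gradient

print(theta)  # approaches the minimum at (0, 0)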

The Learning Rate Dilemma

The learning rate is the single most important hyperparameter in deep learning. Too large, and the optimizer bounces wildly across the landscape, possibly diverging entirely. Too small, and training crawls, and the optimizer may settle into the first sharp local minimum it reaches. Modern optimizers like Adam adapt the learning rate per parameter, but the initial learning rate still matters enormously; most practitioners use learning rate schedules that start large and decay over time.
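
A common schedule is exponential decay: start with a large rate and shrink it by a constant factor each step. A minimal sketch, where the initial rate and decay factor are arbitrary examples, not recommended values:

def exponential_lr(step, lr0=0.1, decay=0.97):
    # Large early steps explore the landscape; small late steps settle in.
    # lr0 and decay are illustrative; in practice they are tuned per problem.
    return lr0 * decay ** step

Plugging this into the loop above just means replacing the fixed alpha with exponential_lr(step).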

Beyond Vanilla Gradient Descent

Modern optimization has moved far beyond simple gradient descent. Momentum accumulates velocity from past gradients, helping the optimizer barrel through shallow local minima. Nesterov acceleration looks ahead before computing the gradient. RMSProp and Adam maintain per-parameter learning rates. Stochastic mini-batching adds noise that helps escape saddle points. The simulation lets you explore how momentum and learning rate interact to navigate complex landscapes.
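
The momentum and Adam updates in the formula section map directly to code. Here is a sketch of one step of each, using the LR=0.1, momentum=0.9 defaults mentioned above and textbook Adam constants; the function names and calling convention are ours, not any library's:

import numpy as np

def momentum_step(theta, v, grad, alpha=0.1, beta=0.9):
    # v is an exponentially weighted accumulation of past gradients;
    # it lets the optimizer coast through shallow dips.
    v = beta * v + grad
    return theta - alpha * v, v

def adam_step(theta, m, v, grad, t, alpha=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    # m and v estimate the first and second moments of the gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction offsets the zero initialization of m and v (t starts at 1).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return theta - alpha * m_hat / (np.sqrt(v_hat) + eps), m, v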

FAQ

What is gradient descent?

Gradient descent is an optimization algorithm that finds the minimum of a function by iteratively moving in the direction of steepest decrease (negative gradient). Each step updates the parameters: θ = θ - α∇f(θ), where α is the learning rate and ∇f is the gradient.

Why is the learning rate so important?

The learning rate controls step size. Too small: convergence is painfully slow. Too large: the optimizer overshoots the minimum and may diverge entirely. Finding the right learning rate is one of the most critical hyperparameter choices in deep learning.

What is the difference between gradient descent, SGD, and Adam?

Gradient descent uses all data per step. Stochastic gradient descent (SGD) uses one random sample. Mini-batch SGD uses a small batch. Adam combines momentum and adaptive learning rates per parameter. Adam is the default optimizer for most deep learning today.
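
The difference is just how much data feeds each gradient estimate. A toy least-squares sketch, where the dataset, model, and hyperparameters are placeholders:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))   # toy inputs
y = X @ rng.normal(size=5)       # toy targets
w = np.zeros(5)

def grad(w, Xb, yb):
    # Gradient of mean squared error over the batch (Xb, yb).
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

batch_size = 32  # 1 gives SGD, len(X) gives full-batch gradient descent
for step in range(500):
    idx = rng.integers(0, len(X), size=batch_size)  # sample a mini-batch
    w -= 0.01 * grad(w, X[idx], y[idx])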

Can gradient descent get stuck in local minima?

Yes. In non-convex landscapes, gradient descent can converge to local minima or saddle points instead of the global minimum. Momentum, learning rate schedules, and stochastic noise help escape poor local optima.
