Gradient Descent: How Neural Networks Learn

With learning rate 0.05 and momentum 0.9 on a quadratic bowl, gradient descent reaches near-zero loss (about 0.002 in the simulation) in roughly 50 steps. Momentum accelerates convergence by building velocity along consistent gradient directions.

Formulas

Parameter update: θ_{t+1} = θ_t − η × ∇L(θ_t)
Momentum update: v_t = β × v_{t-1} + ∇L(θ_t); θ_{t+1} = θ_t − η × v_t
Gradient: ∇L = [∂L/∂θ₁, ∂L/∂θ₂, ..., ∂L/∂θₙ]
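
To make the update rules concrete, here is a minimal NumPy sketch that applies the momentum update to a quadratic bowl L(θ) = ½‖θ‖², whose gradient is simply θ. The learning rate (0.05) and momentum coefficient (0.9) match the simulation's settings; the starting point is an illustrative assumption, and the final loss depends on the bowl's curvature, so it won't match the simulation's readout exactly.

```python
import numpy as np

def grad(theta):
    return theta  # gradient of L(theta) = 0.5 * ||theta||^2

eta, beta = 0.05, 0.9          # the simulation's settings
theta = np.array([3.0, -2.0])  # arbitrary starting point (assumption)
v = np.zeros_like(theta)

for step in range(50):
    v = beta * v + grad(theta)  # v_t = beta * v_{t-1} + grad L(theta_t)
    theta = theta - eta * v     # theta_{t+1} = theta_t - eta * v_t

print("loss after 50 steps:", 0.5 * float(theta @ theta))
```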

Descending the Loss Landscape

At the heart of machine learning is an optimization problem: find the parameters that minimize a loss function measuring how poorly the model fits the data. Gradient descent solves this by repeatedly computing the gradient — the direction of steepest ascent — and stepping in the opposite direction. The loss landscape of a neural network is a high-dimensional surface with hills, valleys, saddle points, and flat plateaus, and gradient descent must navigate all of these to find good solutions.

Learning Rate: The Step Size Dilemma

The learning rate is arguably the most important hyperparameter in deep learning. Too large and the optimizer bounces around or diverges entirely — the loss increases instead of decreasing. Too small and training takes impractically long, potentially getting stuck in sharp local minima. The simulation lets you see this tradeoff directly: watch how different learning rates produce smooth convergence, oscillation, or catastrophic divergence on the same loss surface.
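
You can reproduce the three regimes in a few lines. On the 1-D quadratic L(θ) = ½θ² (gradient θ), each vanilla step multiplies θ by 1 − η, so gradient descent converges only for η < 2; the specific rates below are illustrative assumptions chosen to land in each regime.

```python
# Three learning-rate regimes on L(theta) = 0.5 * theta**2, grad = theta:
# eta well below 2 converges smoothly, eta just under 2 oscillates its
# way down, and eta above 2 diverges.
for eta in (0.05, 1.9, 2.1):
    theta = 3.0
    for _ in range(30):
        theta -= eta * theta  # contraction factor is (1 - eta) per step
    print(f"eta = {eta}: |theta| after 30 steps = {abs(theta):.3g}")
```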

Momentum and Acceleration

Vanilla gradient descent struggles with ravine-shaped loss surfaces — it zigzags across the narrow dimension instead of rolling down the long axis. Momentum solves this by accumulating velocity: if the gradient consistently points in one direction, the optimizer accelerates; if it oscillates, the momentum terms cancel out. Nesterov momentum improves further by evaluating the gradient at the 'lookahead' position, giving better convergence on convex problems.
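
A small experiment illustrates the ravine effect. On the ill-conditioned quadratic L(x, y) = ½(x² + 100y²) below, the steep y direction caps vanilla gradient descent's stable learning rate at η < 2/100, so progress along the shallow x axis crawls, while momentum accumulates speed along it. The curvatures, learning rate, and momentum coefficient are illustrative assumptions.

```python
import numpy as np

def grad(p):
    # gradient of the ravine L(x, y) = 0.5 * (x**2 + 100 * y**2)
    return np.array([p[0], 100.0 * p[1]])

eta, beta, steps = 0.015, 0.9, 100
start = np.array([10.0, 1.0])

# Vanilla gradient descent: eta must stay below 2/100 for stability,
# so the shallow x coordinate shrinks by only 0.985 per step.
p = start.copy()
for _ in range(steps):
    p = p - eta * grad(p)

# Momentum: velocity builds along the consistent x direction.
q, v = start.copy(), np.zeros(2)
for _ in range(steps):
    v = beta * v + grad(q)
    q = q - eta * v

print("vanilla  distance from minimum:", np.linalg.norm(p))
print("momentum distance from minimum:", np.linalg.norm(q))
```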

Modern Optimizers

Stochastic gradient descent (SGD) with momentum remains the optimizer of choice for many computer vision tasks, but adaptive methods dominate elsewhere. Adam (Adaptive Moment Estimation) maintains per-parameter learning rates by tracking first and second moments of the gradient. This makes it robust to sparse gradients and different loss surface geometries. The simulation's noise parameter mimics the stochastic aspect — mini-batch gradient estimates are inherently noisy, and this noise can actually help escape sharp local minima.
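
The Adam update itself is short. The sketch below follows the standard formulation: exponential moving averages of the gradient and of its square, bias-corrected, then a per-parameter step. β₁ = 0.9, β₂ = 0.999, and ε = 1e-8 are the commonly cited defaults, while the learning rate, test function, and step count are illustrative assumptions.

```python
import numpy as np

def grad(theta):
    return theta  # gradient of the quadratic bowl 0.5 * ||theta||^2

eta, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8
theta = np.array([3.0, -2.0])
m = np.zeros_like(theta)  # first moment (mean of gradients)
s = np.zeros_like(theta)  # second moment (mean of squared gradients)

for t in range(1, 201):
    g = grad(theta)
    m = beta1 * m + (1 - beta1) * g
    s = beta2 * s + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)  # bias correction for zero init
    s_hat = s / (1 - beta2 ** t)
    theta -= eta * m_hat / (np.sqrt(s_hat) + eps)  # per-parameter step

print("final loss:", 0.5 * float(theta @ theta))
```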

FAQ

What is gradient descent?

Gradient descent is an optimization algorithm that iteratively adjusts parameters to minimize a loss function. At each step, it computes the gradient (slope) of the loss with respect to each parameter and moves in the opposite direction — downhill. The learning rate controls the step size. It is the fundamental training algorithm for virtually all neural networks.

What is the learning rate?

The learning rate controls how large each parameter update is. Too small, and training takes forever; too large, and the optimizer overshoots the minimum and may diverge. Finding the right learning rate is one of the most important hyperparameter choices in deep learning. Modern techniques like learning rate warmup and cosine annealing adjust it dynamically.
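
As a sketch of those dynamic schedules, here is one common formulation of linear warmup followed by cosine annealing; the schedule shape is standard, but eta_max, eta_min, warmup_steps, and total_steps are illustrative assumptions.

```python
import math

def lr_at(step, total_steps=1000, warmup_steps=50,
          eta_max=0.1, eta_min=0.001):
    if step < warmup_steps:
        # linear warmup from near zero up to eta_max
        return eta_max * (step + 1) / warmup_steps
    # cosine decay from eta_max down to eta_min
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * progress))

for step in (0, 25, 50, 500, 999):
    print(step, round(lr_at(step), 4))
```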

What is momentum in gradient descent?

Momentum adds a fraction of the previous update to the current one, like a ball rolling downhill that accumulates speed. It helps gradient descent move faster through flat regions and dampen oscillations in narrow valleys. The momentum coefficient (typically 0.9) controls how much past gradients influence the current step.

What is the difference between GD, SGD, and Adam?

Batch GD computes gradients over the entire dataset — accurate but slow. SGD (stochastic gradient descent) uses random mini-batches, adding noise that can help escape local minima. Adam combines momentum with per-parameter adaptive learning rates, making it robust to different loss surface geometries and the default choice for many deep learning tasks.
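
The batch-versus-stochastic distinction fits in a few lines. The sketch below runs mini-batch SGD on a linear least-squares problem: each step estimates the full-batch gradient from a random subset, which is cheaper but noisy. The synthetic data, batch size, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))          # 1000 examples, 5 features
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(5)
eta, batch_size = 0.1, 32

for _ in range(500):
    idx = rng.integers(0, len(X), size=batch_size)  # random mini-batch
    xb, yb = X[idx], y[idx]
    g = xb.T @ (xb @ w - yb) / batch_size  # noisy gradient estimate
    w -= eta * g

print("error vs true weights:", np.linalg.norm(w - true_w))
```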
