Neural Networks: How Layers of Neurons Transform Data

simulator intermediate ~12 min

A network with 2 inputs, two hidden layers of 4 ReLU neurons each, and 1 output has 37 trainable parameters. With an input signal of 1.0, the forward pass produces an output of approximately 0.73.

Formula

Neuron output: y = σ(W·x + b)
ReLU: σ(z) = max(0, z)
Parameter count: Σ (n_in × n_out + n_out) per layer
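
For the 2 → 4 → 4 → 1 network described above, applying this formula layer by layer gives (2×4 + 4) + (4×4 + 4) + (4×1 + 1) = 12 + 20 + 5 = 37 trainable parameters.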

Layers of Computation

A neural network transforms input data through successive layers of neurons. Each neuron computes a weighted sum of its inputs, adds a bias term, and applies a nonlinear activation function. The first layer extracts simple features; deeper layers combine these into increasingly abstract representations. A face recognition network might detect edges in layer 1, eyes and noses in layer 3, and complete faces in layer 5. This hierarchical feature learning is what makes deep networks so powerful.
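
As a concrete sketch of that per-neuron computation (the weights and inputs below are made-up illustrative numbers, not the simulation's), a single fully connected ReLU layer in NumPy looks like this:

```python
import numpy as np

def relu(z):
    # ReLU activation: pass positive values through, zero out negatives
    return np.maximum(0, z)

def dense_layer(x, W, b):
    # One fully connected layer: weighted sum of inputs, plus bias, then activation
    return relu(W @ x + b)

# Illustrative numbers only: a layer with 2 inputs and 4 neurons
x = np.array([1.0, 0.5])              # input vector
W = np.array([[ 0.2, -0.4],
              [ 0.7,  0.1],
              [-0.3,  0.9],
              [ 0.5, -0.6]])          # 4x2 weight matrix, one row per neuron
b = np.array([0.1, -0.2, 0.0, 0.3])   # one bias per neuron

print(dense_layer(x, W, b))           # 4 activation values, one per neuron
```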

Activation Functions: The Source of Power

Without activation functions, a neural network would just be a series of matrix multiplications — equivalent to a single linear transformation no matter how many layers. Nonlinear activations like ReLU, sigmoid, and tanh give networks the ability to model curved decision boundaries and complex functions. The simulation lets you switch between activations and see how they change the network's behavior — sigmoid squashes everything to [0,1], tanh to [-1,1], while ReLU passes positive values unchanged and zeros out negatives.
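
The three activations mentioned above are one-liners; this sketch simply evaluates them on a few sample values so the squashing behavior is visible:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)        # [0, inf): negatives become 0

def sigmoid(z):
    return 1 / (1 + np.exp(-z))    # (0, 1): smooth squashing

def tanh(z):
    return np.tanh(z)              # (-1, 1): zero-centered squashing

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))      # [0.   0.   0.   0.5  2. ]
print(sigmoid(z))   # roughly [0.12 0.38 0.50 0.62 0.88]
print(tanh(z))      # roughly [-0.96 -0.46 0.00 0.46 0.96]
```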

The Forward Pass

The forward pass is the computation that transforms input to output. Data flows from the input layer through hidden layers to the output. At each layer, the operation is: multiply inputs by weights, add biases, apply activation. This simulation visualizes the process — watch activation values flow through connections, with line thickness representing weight magnitude and color representing positive (cyan) or negative (red) values.
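
A minimal forward-pass sketch for a 2 → 4 → 4 → 1 network like the one in this simulation; the weights here are random placeholders, so the output will not reproduce the 0.73 shown above:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def forward(x, layers):
    # layers is a list of (W, b, activation) triples; apply each in turn
    for W, b, act in layers:
        x = act(W @ x + b)
    return x

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(4, 2)), np.zeros(4), relu),         # hidden layer 1: 2 -> 4
    (rng.normal(size=(4, 4)), np.zeros(4), relu),         # hidden layer 2: 4 -> 4
    (rng.normal(size=(1, 4)), np.zeros(1), lambda z: z),  # linear output: 4 -> 1
]

print(forward(np.array([1.0, 1.0]), layers))  # a single output value
```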

Counting Parameters

A network's capacity is largely determined by its parameter count — the total number of weights and biases. Each connection between neurons is one weight; each neuron has one bias. A fully connected layer with n inputs and m outputs has n×m + m parameters. Modern language models have billions of parameters, but even small networks with a few hundred parameters can learn surprisingly complex functions, as this simulation demonstrates.
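
Counting parameters is simple enough to write out directly; this helper (a hypothetical utility, not part of any particular library) applies the n×m + m rule to each consecutive pair of layer sizes:

```python
def count_parameters(sizes):
    # sizes lists the width of every layer, inputs first, e.g. [2, 4, 4, 1]
    # each fully connected layer contributes n_in * n_out weights + n_out biases
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

print(count_parameters([2, 4, 4, 1]))  # 37 for the network in this article
```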

FAQ

What is a neural network?

A neural network is a computational model inspired by biological neurons. It consists of layers of interconnected nodes — each node computes a weighted sum of its inputs, adds a bias, and passes the result through a nonlinear activation function. By stacking layers, networks can learn to approximate virtually any function, from image classification to language generation.

What is an activation function?

An activation function introduces nonlinearity into the network. Without it, stacking linear layers would just produce another linear function. Common activations include ReLU (max(0, x) — simple and effective), sigmoid (squashes to 0-1 — used for probabilities), and tanh (squashes to -1 to 1). The choice affects training dynamics and representational capacity.

How many layers does a neural network need?

The universal approximation theorem shows that a single hidden layer with enough neurons can approximate any continuous function. However, deep networks (many layers) can often represent the same functions with exponentially fewer neurons. Modern networks (GPT, ResNet) use dozens to hundreds of layers, leveraging depth for efficient hierarchical feature learning.

What is the vanishing gradient problem?

In deep networks, gradients used for training must propagate backward through every layer. With sigmoid or tanh activations, each layer multiplies gradients by values well below 1 (the sigmoid's derivative never exceeds 0.25), causing them to shrink exponentially — early layers barely learn. ReLU activations, skip connections (ResNet), and normalization layers (BatchNorm) were key innovations that largely mitigated this and enabled training of very deep networks.
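
A rough numerical illustration of the shrinkage (toy numbers, ignoring the weight terms that also appear in backpropagation): the sigmoid's derivative peaks at 0.25, so even in the best case a gradient passed through many sigmoid layers decays geometrically, whereas ReLU contributes a factor of 1 wherever the unit is active.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_grad(z):
    # derivative of the sigmoid; its maximum value is 0.25, at z = 0
    s = sigmoid(z)
    return s * (1 - s)

grad = 1.0
for depth in range(1, 21):
    grad *= sigmoid_grad(0.0)   # best case for sigmoid: factor of 0.25 per layer
    if depth in (5, 10, 20):
        print(f"after {depth} sigmoid layers: {grad:.2e}")
# after 5 layers:  ~9.8e-04; after 10: ~9.5e-07; after 20: ~9.1e-13
# with ReLU on the active path the per-layer factor is 1, so the gradient survives
```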
