Neural Networks: How Layers of Neurons Transform Data

simulator intermediate ~12 min

A network with 2 inputs, two hidden layers of 4 ReLU neurons each, and 1 output has 37 trainable parameters. With an input signal of 1.0, the forward pass produces an output of approximately 0.73.

Formula

Neuron output: y = σ(W·x + b)
ReLU: σ(z) = max(0, z)
Parameter count: Σ (n_in × n_out + n_out) per layer
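
For the 2 → 4 → 4 → 1 network described above, applying this formula layer by layer gives (2×4 + 4) + (4×4 + 4) + (4×1 + 1) = 12 + 20 + 5 = 37 trainable parameters.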

Layers of Computation

A neural network transforms input data through successive layers of neurons. Each neuron computes a weighted sum of its inputs, adds a bias term, and applies a nonlinear activation function. The first layer extracts simple features; deeper layers combine these into increasingly abstract representations. A face recognition network might detect edges in layer 1, eyes and noses in layer 3, and complete faces in layer 5. This hierarchical feature learning is what makes deep networks so powerful.
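
As a concrete sketch of that per-neuron computation (the weights and inputs below are made-up illustrative numbers, not the simulation's), a single fully connected ReLU layer in NumPy looks like this:

```python
import numpy as np

def relu(z):
    # ReLU activation: pass positive values through, zero out negatives
    return np.maximum(0, z)

def dense_layer(x, W, b):
    # One fully connected layer: weighted sum of inputs, plus bias, then activation
    return relu(W @ x + b)

# Illustrative numbers only: a layer with 2 inputs and 4 neurons
x = np.array([1.0, 0.5])              # input vector
W = np.array([[ 0.2, -0.4],
              [ 0.7,  0.1],
              [-0.3,  0.9],
              [ 0.5, -0.6]])          # 4x2 weight matrix, one row per neuron
b = np.array([0.1, -0.2, 0.0, 0.3])   # one bias per neuron

print(dense_layer(x, W, b))           # 4 activation values, one per neuron
```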

Activation Functions: The Source of Power

Without activation functions, a neural network would just be a series of matrix multiplications — equivalent to a single linear transformation no matter how many layers. Nonlinear activations like ReLU, sigmoid, and tanh give networks the ability to model curved decision boundaries and complex functions. The simulation lets you switch between activations and see how they change the network's behavior — sigmoid squashes everything to [0,1], tanh to [-1,1], while ReLU passes positive values unchanged and zeros out negatives.
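
The three activations mentioned above are one-liners; this sketch simply evaluates them on a few sample values so the squashing behavior is visible:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)        # [0, inf): negatives become 0

def sigmoid(z):
    return 1 / (1 + np.exp(-z))    # (0, 1): smooth squashing

def tanh(z):
    return np.tanh(z)              # (-1, 1): zero-centered squashing

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))      # [0.   0.   0.   0.5  2. ]
print(sigmoid(z))   # roughly [0.12 0.38 0.50 0.62 0.88]
print(tanh(z))      # roughly [-0.96 -0.46 0.00 0.46 0.96]
```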

The Forward Pass

The forward pass is the computation that transforms input to output. Data flows from the input layer through hidden layers to the output. At each layer, the operation is: multiply inputs by weights, add biases, apply activation. This simulation visualizes the process — watch activation values flow through connections, with line thickness representing weight magnitude and color representing positive (cyan) or negative (red) values.
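
A minimal forward-pass sketch for a 2 → 4 → 4 → 1 network like the one in this simulation; the weights here are random placeholders, so the output will not reproduce the 0.73 shown above:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def forward(x, layers):
    # layers is a list of (W, b, activation) triples; apply each in turn
    for W, b, act in layers:
        x = act(W @ x + b)
    return x

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(4, 2)), np.zeros(4), relu),         # hidden layer 1: 2 -> 4
    (rng.normal(size=(4, 4)), np.zeros(4), relu),         # hidden layer 2: 4 -> 4
    (rng.normal(size=(1, 4)), np.zeros(1), lambda z: z),  # linear output: 4 -> 1
]

print(forward(np.array([1.0, 1.0]), layers))  # a single output value
```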

Counting Parameters

A network's capacity is largely determined by its parameter count — the total number of weights and biases. Each connection between neurons is one weight; each neuron has one bias. A fully connected layer with n inputs and m outputs has n×m + m parameters. Modern language models have billions of parameters, but even small networks with a few hundred parameters can learn surprisingly complex functions, as this simulation demonstrates.
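
Counting parameters is simple enough to write out directly; this helper (a hypothetical utility, not part of any particular library) applies the n×m + m rule to each consecutive pair of layer sizes:

```python
def count_parameters(sizes):
    # sizes lists the width of every layer, inputs first, e.g. [2, 4, 4, 1]
    # each fully connected layer contributes n_in * n_out weights + n_out biases
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

print(count_parameters([2, 4, 4, 1]))  # 37 for the network in this article
```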

FAQ

What is a neural network?

A neural network is a computational model inspired by biological neurons. It consists of layers of interconnected nodes — each node computes a weighted sum of its inputs, adds a bias, and passes the result through a nonlinear activation function. By stacking layers, networks can learn to approximate virtually any function, from image classification to language generation.

What is an activation function?

An activation function introduces nonlinearity into the network. Without it, stacking linear layers would just produce another linear function. Common activations include ReLU (max(0, x) — simple and effective), sigmoid (squashes to 0-1 — used for probabilities), and tanh (squashes to -1 to 1). The choice affects training dynamics and representational capacity.

How many layers does a neural network need?

The universal approximation theorem shows that a single hidden layer with enough neurons can approximate any continuous function. However, deep networks (many layers) can often represent the same functions with exponentially fewer neurons. Modern networks (GPT, ResNet) use dozens to hundreds of layers, leveraging depth for efficient hierarchical feature learning.

What is the vanishing gradient problem?

In deep networks, gradients used for training must propagate backward through every layer. With sigmoid or tanh activations, each layer multiplies gradients by values well below 1 (the sigmoid's derivative never exceeds 0.25), causing them to shrink exponentially — early layers barely learn. ReLU activations, skip connections (ResNet), and normalization layers (BatchNorm) were key innovations that largely mitigated this and enabled training of very deep networks.
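
A rough numerical illustration of the shrinkage (toy numbers, ignoring the weight terms that also appear in backpropagation): the sigmoid's derivative peaks at 0.25, so even in the best case a gradient passed through many sigmoid layers decays geometrically, whereas ReLU contributes a factor of 1 wherever the unit is active.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_grad(z):
    # derivative of the sigmoid; its maximum value is 0.25, at z = 0
    s = sigmoid(z)
    return s * (1 - s)

grad = 1.0
for depth in range(1, 21):
    grad *= sigmoid_grad(0.0)   # best case for sigmoid: factor of 0.25 per layer
    if depth in (5, 10, 20):
        print(f"after {depth} sigmoid layers: {grad:.2e}")
# after 5 layers:  ~9.8e-04; after 10: ~9.5e-07; after 20: ~9.1e-13
# with ReLU on the active path the per-layer factor is 1, so the gradient survives
```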
