Linear Regression: Fitting Lines to Data with Least Squares

Simulator · Beginner · ~7 min

[Interactive simulation: least squares fit on sampled data; R² ≈ 0.82, a good linear fit]

With 50 data points and moderate noise, the least squares regression line captures approximately 82% of the variance. The estimated slope closely approximates the true underlying relationship.

Formulas

β₁ = Σ(xᵢ - x̄)(yᵢ - ȳ) / Σ(xᵢ - x̄)²
β₀ = ȳ - β₁x̄
R² = 1 - Σ(yᵢ - ŷᵢ)² / Σ(yᵢ - ȳ)²

The Foundation of Predictive Modeling

Linear regression is the workhorse of statistics and the gateway to machine learning. Sir Francis Galton introduced the concept in 1886 while studying how children's heights 'regressed toward mediocrity' compared to their parents'. Today, linear regression underlies everything from economic forecasting to A/B testing to medical research. If you understand one statistical model, it should be this one.

The Least Squares Method

The idea is elegant: find the line that minimizes the total squared distance between each data point and the line. Why squared? Because squaring penalizes large errors more than small ones, produces a unique solution with a closed-form formula, and connects beautifully to the mathematics of calculus and linear algebra. The resulting formulas for slope and intercept can be computed by hand or by any computer in microseconds.
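To make those formulas concrete, here is a minimal NumPy sketch of the closed-form solution. The simulated dataset (a noisy line with intercept 1 and slope 2) is illustrative, not the simulator's actual data:

```python
import numpy as np

def least_squares(x, y):
    """Closed-form slope and intercept from the formulas above."""
    x_bar, y_bar = x.mean(), y.mean()
    beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Simulated noisy line: true intercept 1, true slope 2 (illustrative values).
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=50)
y = 1 + 2 * x + rng.normal(scale=1.5, size=50)

beta0, beta1 = least_squares(x, y)
print(f"intercept ≈ {beta0:.2f}, slope ≈ {beta1:.2f}")
```

With 50 points and moderate noise, the estimates land close to the true values, which is exactly the behavior the simulation above demonstrates.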

Reading the Regression Output

R² tells you how well the line fits: 0.82 means 82% of the data's variation is explained by the linear relationship. RMSE (root mean squared error) tells you the typical prediction error in the same units as your data. The slope tells you the rate of change: for every 1-unit increase in x, y changes by β₁ units on average. Together, these numbers summarize both the strength of the fit and its practical meaning.
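As a sketch of how these diagnostics are computed (using np.polyfit for the fit and made-up data, so the exact numbers will differ from the simulation's):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=50)
y = 1 + 2 * x + rng.normal(scale=1.5, size=50)

# np.polyfit returns coefficients highest-degree first: [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

ss_res = np.sum((y - y_hat) ** 2)           # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)        # total variation
r_squared = 1 - ss_res / ss_tot             # share of variance explained
rmse = np.sqrt(np.mean((y - y_hat) ** 2))   # typical error, in y's units

print(f"slope ≈ {slope:.2f}: a 1-unit rise in x moves y by ≈ {slope:.2f} on average")
print(f"R² ≈ {r_squared:.2f}, RMSE ≈ {rmse:.2f}")
```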

Beyond Simple Regression

Simple linear regression with one predictor is just the beginning. Multiple regression adds more predictors (y = β₀ + β₁x₁ + β₂x₂ + ...). Polynomial regression fits curves by adding squared and cubed terms. Regularized regression (Ridge, Lasso) prevents overfitting. All of these extensions build on the same least squares foundation you can explore in the simulation above.
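For a flavor of these extensions, here is a scikit-learn sketch; the data, penalty strengths (alpha), and polynomial degree are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))                # two predictors
y = 1 + 2 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=100)

multiple = LinearRegression().fit(X, y)              # y = β₀ + β₁x₁ + β₂x₂
poly = make_pipeline(PolynomialFeatures(degree=2),   # adds squared and cross terms
                     LinearRegression()).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)                   # L2 penalty shrinks coefficients
lasso = Lasso(alpha=0.1).fit(X, y)                   # L1 penalty can zero them out

print(multiple.coef_, ridge.coef_, lasso.coef_)
```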

FAQ

What is linear regression?

Linear regression finds the straight line that best fits a set of data points. It minimizes the sum of squared vertical distances (residuals) between the observed data and the predicted values. The result is the 'least squares' line: ŷ = β₀ + β₁x.

What does R² (R-squared) tell you?

R² measures the proportion of variance in the dependent variable explained by the model. R² = 1 means a perfect fit; R² = 0 means the model explains nothing. For simple linear regression with an intercept it ranges from 0 to 1, but it can be negative for misspecified models, for example a model fit without an intercept or one evaluated on data it was not fit to.
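A quick way to see a negative R² is to score a model that does worse than simply predicting the mean. The wrong-signed model below is contrived for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = -2 * x + rng.normal(scale=0.5, size=100)   # true slope is negative

y_hat = 2 * x                                  # misspecified: wrong sign
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
print(1 - ss_res / ss_tot)                     # far below 0: worse than the mean
```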

When does linear regression fail?

Linear regression fails when the relationship is nonlinear, when outliers dominate the fit, when residuals are heteroscedastic (non-constant variance), or when predictors are highly correlated (multicollinearity). Always plot your data before fitting a line.
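One standard check is a residual plot. In this sketch the data are deliberately quadratic, so the straight-line fit leaves a telltale pattern:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = 0.5 * x**2 + rng.normal(scale=2.0, size=x.size)   # truly quadratic

slope, intercept = np.polyfit(x, y, 1)                # fit a line anyway
residuals = y - (slope * x + intercept)

plt.scatter(x, residuals)
plt.axhline(0, color="red")
plt.xlabel("x")
plt.ylabel("residual")
plt.show()   # a U-shaped residual pattern reveals the nonlinearity a line misses
```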

What is the difference between correlation and regression?

Correlation (r) measures the strength and direction of a linear relationship symmetrically. Regression goes further: it estimates a predictive model (ŷ = β₀ + β₁x) with a specific direction — predicting y from x. The relationship is: R² = r² for simple linear regression.
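You can verify R² = r² numerically; this NumPy sketch uses arbitrary simulated data:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=200)
y = 3 * x + rng.normal(size=200)

r = np.corrcoef(x, y)[0, 1]     # Pearson correlation coefficient

slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(np.isclose(r**2, r2))     # True: r² equals R² in simple linear regression
```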
