
Deconstructing Backpropagation from Scratch

PyTorch's autograd is sixty thousand lines of C++ and Python. The algorithm it implements is ten lines — built here entirely from scratch in pure NumPy.

Part 1

The Math of Reverse-Mode Autodiff

Breaking down the chain rule, computation graphs, why reverse mode is the only viable choice when parameters outnumber loss outputs by ten orders of magnitude, and the role of broadcasting.
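
As a rough illustration of these ideas, the sketch below runs one reverse sweep by hand on a tiny graph with a scalar loss. The layer shapes, the squared-sum loss, and the helper name `unbroadcast` are assumptions made for this example, not code from the series. A single backward sweep yields the gradient for every parameter at once, whereas forward mode would need one sweep per parameter. The bias is broadcast across the batch in the forward pass, so its gradient has to be summed back over the batch axis.

```python
import numpy as np

def unbroadcast(grad, shape):
    """Sum grad over the axes broadcasting added or stretched, so it matches shape.

    Hypothetical helper for this sketch.
    """
    # Collapse leading axes that broadcasting prepended.
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)
    # Collapse axes that were stretched from size 1.
    for axis, size in enumerate(shape):
        if size == 1 and grad.shape[axis] != 1:
            grad = grad.sum(axis=axis, keepdims=True)
    return grad

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))   # batch of 4 inputs
W = rng.normal(size=(3, 2))   # weights
b = rng.normal(size=(2,))     # bias, broadcast across the batch

# Forward pass: scalar loss L = sum((x @ W + b) ** 2)
z = x @ W + b
loss = (z ** 2).sum()

# Reverse pass: one sweep from the scalar loss back to every parameter.
dz = 2.0 * z                    # dL/dz, shape (4, 2)
dW = x.T @ dz                   # dL/dW, shape (3, 2)
db = unbroadcast(dz, b.shape)   # dL/db, shape (2,)

print(dW.shape, db.shape)
```

Each gradient ends up with the same shape as the parameter it belongs to, which is the invariant that the broadcasting step exists to restore.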

Part 2

Pure NumPy Implementation

Building the Tensor class, elementary ops with closures, a numerically stable log-softmax, and the ten-line backward engine, all in 245 lines of NumPy.
View Code on GitHub
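
To give a feel for the closure-based design, here is a minimal sketch under simplifying assumptions: only `+` and `*`, operands of identical shape (no broadcasting), and attribute names such as `_parents` and `_backward` chosen for illustration. It is not the 245-line implementation from the repository. Each op stores a closure that knows how to push its output gradient to its inputs, and `backward` replays those closures in reverse topological order.

```python
import numpy as np

class Tensor:
    def __init__(self, data, parents=()):
        self.data = np.asarray(data, dtype=np.float64)
        self.grad = np.zeros_like(self.data)
        self._parents = parents
        self._backward = lambda: None   # leaves have nothing to propagate

    def __add__(self, other):
        out = Tensor(self.data + other.data, (self, other))
        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Tensor(self.data * other.data, (self, other))
        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Build a topological order with a depth-first walk.
        order, seen = [], set()
        def visit(t):
            if id(t) not in seen:
                seen.add(id(t))
                for p in t._parents:
                    visit(p)
                order.append(t)
        visit(self)
        # Seed the output gradient, then run the closures from output to inputs.
        self.grad = np.ones_like(self.data)
        for t in reversed(order):
            t._backward()

a, b = Tensor(2.0), Tensor(3.0)
c = a * b + a          # a is used twice, so its gradient accumulates
c.backward()
print(float(a.grad), float(b.grad))   # 4.0 2.0
```

Gradients are accumulated with `+=` rather than assigned, so a tensor that feeds into several ops receives the sum of all its downstream contributions.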

Part 3

Cross-Checking Against PyTorch

Gradients agree with PyTorch's to within 3.73e-17 for every parameter of an MLP trained on two-moons, and final accuracy is the same. The surprising finding: the NumPy implementation trains 4x faster than PyTorch on small-batch problems.
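
A gradient cross-check of this kind boils down to computing the same gradient two ways and reporting the largest absolute difference. The sketch below swaps in a central finite-difference estimate as the reference instead of PyTorch, so that it runs with NumPy alone. The function names, the random logits, and the step size are assumptions for this example. Finite differences are far less precise than a comparison between two analytic implementations, so the agreement here will be nowhere near the 3.73e-17 figure.

```python
import numpy as np

def log_softmax(z):
    """Row-wise log-softmax; subtracting the row max avoids overflow in exp."""
    shifted = z - z.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def nll(z, y):
    """Mean negative log-likelihood of integer labels y under logits z."""
    return -log_softmax(z)[np.arange(len(y)), y].mean()

def nll_grad(z, y):
    """Analytic gradient of nll with respect to z: (softmax(z) - onehot(y)) / n."""
    g = np.exp(log_softmax(z))
    g[np.arange(len(y)), y] -= 1.0
    return g / len(y)

def numeric_grad(f, z, eps=1e-6):
    """Central-difference estimate of df/dz, one entry at a time."""
    g = np.zeros_like(z)
    for idx in np.ndindex(z.shape):
        step = np.zeros_like(z)
        step[idx] = eps
        g[idx] = (f(z + step) - f(z - step)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
z = rng.normal(size=(5, 2))        # 5 samples, 2 classes
y = rng.integers(0, 2, size=5)     # integer class labels

analytic = nll_grad(z, y)
numeric = numeric_grad(lambda v: nll(v, y), z)
max_abs_diff = np.abs(analytic - numeric).max()
print(f"max |analytic - numeric| = {max_abs_diff:.2e}")
```

Replacing `numeric_grad` with gradients from another autodiff library gives the same style of check, with the other library's gradients as the reference.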