Deconstructing Optimizers

Five optimizers (SGD, Momentum, Adam, AdamW, Lion) as one-line modifications of the same skeleton — implemented from scratch and benchmarked head-to-head.

Part 1

The Family Tree

How each optimizer relates to its predecessor — bias correction in Adam, decoupled weight decay in AdamW, sign-of-momentum in Lion.
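As a rough illustration of the shared skeleton, the sketch below writes all five update rules for a single scalar weight. The function name `step`, the `state` dictionary, and the default hyperparameters are assumptions made for this example, not the project's actual code; the formulas follow the commonly published forms of each optimizer.

```python
import math


def step(name, w, g, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, wd=0.0):
    """Apply one update to scalar weight w given gradient g (illustrative only).

    Every branch only decides what `update` is; the final line
    w - lr * update is the same for all five optimizers.
    """
    t = state["t"] = state.get("t", 0) + 1
    m = state.get("m", 0.0)  # first-moment / momentum buffer
    v = state.get("v", 0.0)  # second-moment buffer (Adam family only)

    if name == "sgd":
        update = g
    elif name == "momentum":
        m = beta1 * m + g
        update = m
    elif name in ("adam", "adamw"):
        if name == "adam":
            g = g + wd * w  # coupled L2 penalty: decay passes through the moments
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction
        v_hat = v / (1 - beta2 ** t)
        update = m_hat / (math.sqrt(v_hat) + eps)
        if name == "adamw":
            update += wd * w  # decoupled decay: added outside the adaptive scaling
    elif name == "lion":
        blended = beta1 * m + (1 - beta1) * g
        update = (blended > 0) - (blended < 0)  # sign of the interpolated momentum
        update += wd * w  # decoupled decay, as in AdamW
        m = beta2 * m + (1 - beta2) * g  # buffer is refreshed after the update is formed
    else:
        raise ValueError(f"unknown optimizer: {name}")

    state["m"], state["v"] = m, v
    return w - lr * update
```

On the very first step, bias correction makes Adam's update approximately sign(g), which is why Adam and Lion move the weight by the same amount there. The Adam and AdamW branches differ only in where `wd * w` enters.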

Part 2

Fifteen Lines Each

Minimal PyTorch implementations using only in-place ops. No torch.optim inheritance — just .step() and .zero_grad().
View Code on GitHub
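To give a sense of what such an implementation can look like, here is a minimal sketch of Lion written with in-place tensor ops and no `torch.optim` base class. The class layout, attribute names, and default hyperparameters are assumptions for illustration; the repository's code may differ.

```python
import torch


class Lion:
    """Illustrative Lion optimizer: plain class, in-place updates only."""

    def __init__(self, params, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
        self.params = list(params)
        self.lr, self.beta1, self.beta2, self.wd = lr, beta1, beta2, wd
        # One momentum buffer per parameter, initialised to zero.
        self.m = [torch.zeros_like(p) for p in self.params]

    @torch.no_grad()
    def step(self):
        for p, m in zip(self.params, self.m):
            if p.grad is None:
                continue
            g = p.grad
            # Decoupled weight decay, applied directly to the weights.
            p.mul_(1 - self.lr * self.wd)
            # Sign of the interpolation between momentum and gradient.
            # m.mul(...) allocates a temporary, so m itself is untouched here.
            update = m.mul(self.beta1).add_(g, alpha=1 - self.beta1).sign_()
            p.add_(update, alpha=-self.lr)
            # Refresh the momentum buffer with the second coefficient.
            m.mul_(self.beta2).add_(g, alpha=1 - self.beta2)

    def zero_grad(self):
        for p in self.params:
            p.grad = None
```

Usage follows the familiar loop: call `loss.backward()`, then `opt.step()`, then `opt.zero_grad()`.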

Part 3

Head-to-Head

Momentum wins on Rosenbrock by a factor of 10⁶. Adam wins on Beale by a factor of 10. On real MLP training, the Adam-family methods converge to within 0.2 points of one another.
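For readers unfamiliar with the two test surfaces, the sketch below defines the standard Rosenbrock and Beale functions and a bare gradient-descent loop over them. The helper names, the finite-difference gradient, and the starting point, learning rate, and step count in the usage note are assumptions for illustration; they are not the benchmark's actual settings and will not reproduce the numbers above.

```python
def rosenbrock(x, y, a=1.0, b=100.0):
    """Rosenbrock function; global minimum 0 at (a, a**2)."""
    return (a - x) ** 2 + b * (y - x * x) ** 2


def beale(x, y):
    """Beale function; global minimum 0 at (3, 0.5)."""
    return (
        (1.5 - x + x * y) ** 2
        + (2.25 - x + x * y ** 2) ** 2
        + (2.625 - x + x * y ** 3) ** 2
    )


def numeric_grad(f, x, y, h=1e-6):
    """Central finite-difference gradient (hypothetical helper for this sketch)."""
    gx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    gy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return gx, gy


def run_gd(f, x, y, lr, steps):
    """Plain gradient descent; returns the final point and its loss."""
    for _ in range(steps):
        gx, gy = numeric_grad(f, x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y, f(x, y)
```

For example, `run_gd(rosenbrock, -1.0, 1.0, lr=5e-4, steps=100)` runs plain gradient descent from an arbitrary starting point; swapping the update line for another optimizer's rule gives a like-for-like comparison on the same surface.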