Deconstructing Optimizers
Five optimizers (SGD, Momentum, Adam, AdamW, Lion) as one-line modifications of the same skeleton — implemented from scratch and benchmarked head-to-head.
Part 1
The Family Tree
How each optimizer relates to its predecessor — bias correction in Adam, decoupled weight decay in AdamW, sign-of-momentum in Lion.
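As a rough illustration of the "same skeleton" idea, the five update rules can be written for a single scalar weight in plain Python. This is a minimal sketch, not the code from this series: the function names, the `state` dictionary, and the default hyperparameters are all assumptions chosen for readability.

```python
import math

# Each function maps (weight, gradient, state) -> new weight.
# `state` is a plain dict holding running statistics; names are illustrative.

def sgd_step(w, g, state, lr=0.1):
    return w - lr * g

def momentum_step(w, g, state, lr=0.1, beta=0.9):
    # SGD plus a running sum of past gradients.
    state["m"] = beta * state.get("m", 0.0) + g
    return w - lr * state["m"]

def adam_step(w, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Momentum plus a per-weight scale, both bias-corrected by 1 - beta**t.
    t = state["t"] = state.get("t", 0) + 1
    state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * g
    state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * g * g
    m_hat = state["m"] / (1 - b1 ** t)
    v_hat = state["v"] / (1 - b2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)

def adamw_step(w, g, state, lr=0.1, wd=0.01, **kw):
    # Adam with weight decay applied to the weight directly,
    # instead of being folded into the gradient.
    return adam_step(w * (1 - lr * wd), g, state, lr=lr, **kw)

def lion_step(w, g, state, lr=0.01, b1=0.9, b2=0.99, wd=0.0):
    # Step along the sign of an interpolated momentum; no second moment.
    m = state.get("m", 0.0)
    c = b1 * m + (1 - b1) * g
    sign = math.copysign(1.0, c) if c != 0 else 0.0
    state["m"] = b2 * m + (1 - b2) * g
    return w - lr * (sign + wd * w)
```

Read top to bottom, each function changes one thing relative to the one before it, which is the relationship this part traces.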
Part 2
Fifteen Lines Each
Minimal PyTorch implementations using only in-place ops. No torch.optim inheritance — just .step() and .zero_grad().
View Code on GitHub
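For a sense of what such an implementation can look like, here is a hedged sketch of an AdamW-style optimizer exposing only `.step()` and `.zero_grad()`. It is not the repository's code: the class name `SketchAdamW`, its defaults, and the toy quadratic used to exercise it are assumptions made for illustration.

```python
import torch

class SketchAdamW:
    """Illustrative AdamW-style optimizer; does not inherit from torch.optim."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                 weight_decay=0.01):
        self.params = [p for p in params if p.requires_grad]
        self.lr, self.betas, self.eps, self.wd = lr, betas, eps, weight_decay
        self.t = 0
        self.m = [torch.zeros_like(p) for p in self.params]  # first moment
        self.v = [torch.zeros_like(p) for p in self.params]  # second moment

    @torch.no_grad()
    def step(self):
        self.t += 1
        b1, b2 = self.betas
        bc1 = 1 - b1 ** self.t  # bias-correction terms
        bc2 = 1 - b2 ** self.t
        for p, m, v in zip(self.params, self.m, self.v):
            if p.grad is None:
                continue
            g = p.grad
            p.mul_(1 - self.lr * self.wd)            # decoupled weight decay
            m.mul_(b1).add_(g, alpha=1 - b1)         # update first moment
            v.mul_(b2).addcmul_(g, g, value=1 - b2)  # update second moment
            # One temporary tensor for the denominator; the rest is in-place.
            denom = (v / bc2).sqrt_().add_(self.eps)
            p.addcdiv_(m, denom, value=-self.lr / bc1)

    def zero_grad(self):
        for p in self.params:
            p.grad = None

# Toy usage on f(w) = sum(w**2), an assumed test problem.
w = torch.tensor([1.0, -2.0], requires_grad=True)
opt = SketchAdamW([w], lr=0.1, weight_decay=0.0)
initial_loss = (w ** 2).sum().item()
for _ in range(200):
    opt.zero_grad()
    loss = (w ** 2).sum()
    loss.backward()
    opt.step()
final_loss = (w ** 2).sum().item()
```

Keeping the moments in parallel lists rather than a per-parameter state dictionary is one way to keep such a class short; other layouts would work equally well.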
Part 3
Head-to-Head
Momentum wins on Rosenbrock by a factor of 10⁶. Adam wins on Beale by a factor of 10. On real MLP training, the Adam-family methods converge to within 0.2 points of one another.
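For reference, the two test surfaces named above are usually defined as follows. This sketch uses the common textbook forms; the exact variant, starting point, and step budget behind the numbers on this page are not stated here, so treat those details as unknown.

```python
def rosenbrock(x, y, a=1.0, b=100.0):
    # Narrow curved valley; global minimum of 0 at (a, a**2), i.e. (1, 1) by default.
    return (a - x) ** 2 + b * (y - x * x) ** 2

def beale(x, y):
    # Flat plateaus with steep walls; global minimum of 0 at (3, 0.5).
    return ((1.5 - x + x * y) ** 2
            + (2.25 - x + x * y ** 2) ** 2
            + (2.625 - x + x * y ** 3) ** 2)
```

Both functions have a known minimum value of zero, so the final function value reached by an optimizer doubles as its distance from the optimum.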