Live Project Feb 2026

State Space Models & Mamba

A deep dive deconstructing State Space Models (SSMs). Rooted in control theory, SSMs scale linearly with sequence length, where Transformer self-attention scales quadratically, and can in principle carry unbounded context in a fixed-size state. This series covers the fundamental math, writing a 1D State Space layer in PyTorch, and benchmarking it against baseline Transformers.

Development Status Completed
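To make the linear-scaling claim in the SSM entry above concrete, here is a minimal sketch of a scalar (1D-state) SSM in plain Python. It is not the series' PyTorch layer: the function names, the zero-order-hold discretization, and the parameter values are assumptions chosen for illustration. It shows that the same discretized system can be run either as a recurrence or as a causal convolution.

```python
import math

def discretize(a, b, dt):
    """Zero-order-hold discretization of x'(t) = a*x(t) + b*u(t) (assumes a != 0)."""
    a_bar = math.exp(a * dt)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar

def ssm_recurrent(u, a_bar, b_bar, c):
    """Run the SSM like an RNN: one pass over the sequence, constant-size state."""
    x, ys = 0.0, []
    for u_k in u:
        x = a_bar * x + b_bar * u_k
        ys.append(c * x)
    return ys

def ssm_convolutional(u, a_bar, b_bar, c):
    """Same input-output map as a causal convolution with K[j] = c * a_bar**j * b_bar."""
    kernel = [c * a_bar**j * b_bar for j in range(len(u))]
    return [sum(kernel[j] * u[k - j] for j in range(k + 1)) for k in range(len(u))]

# Illustrative values only.
a_bar, b_bar = discretize(a=-0.5, b=1.0, dt=0.1)
u = [1.0, 0.0, 2.0, -1.0, 0.5]
y_rec = ssm_recurrent(u, a_bar, b_bar, c=1.0)
y_conv = ssm_convolutional(u, a_bar, b_bar, c=1.0)
```

The recurrent form is the one that gives constant memory per step at inference time; the convolutional form is what allows parallel training over the whole sequence.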
Live Project Feb 2026

Kolmogorov-Arnold Networks (KANs)

An exploration of Kolmogorov-Arnold Networks (KANs), which replace fixed activations on the nodes with learnable B-splines on the edges. This series covers the mathematics of the Kolmogorov-Arnold representation theorem, a pure PyTorch implementation, and scaling benchmarks of KANs versus MLPs on symbolic regression.

Development Status Completed
Live Project Mar 2026

Liquid Neural Networks (LNNs)

A deep dive deconstructing Liquid Neural Networks (LNNs). The series moves from discrete recurrent networks to continuous-time differential equations, builds Liquid Time-Constant models in PyTorch, and benchmarks LNNs against traditional LSTMs on noisy time-series tasks.

Development Status Completed
Live Project Mar 2026

Spiking Neural Networks (SNNs)

A deep dive deconstructing Spiking Neural Networks (SNNs). From biological action potentials and the Leaky Integrate-and-Fire model, to surrogate-gradient training in pure PyTorch, to benchmarking against standard ANNs—achieving the same MNIST accuracy at 51.7% lower energy cost.

Development Status Completed
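As a rough illustration of the Leaky Integrate-and-Fire model and the surrogate-gradient idea named in the SNN entry above, here is a small stdlib-only sketch. The decay factor `beta`, the threshold, the reset-by-subtraction rule, and the fast-sigmoid surrogate with slope `k` are assumptions for illustration, not necessarily the series' choices.

```python
def lif_simulate(currents, beta=0.9, threshold=1.0):
    """Discrete-time LIF neuron: leak, integrate, spike, then reset by subtraction."""
    v, spikes = 0.0, []
    for i_t in currents:
        v = beta * v + i_t                    # leaky integration of input current
        spike = 1 if v >= threshold else 0    # hard, non-differentiable threshold
        v -= spike * threshold                # soft reset
        spikes.append(spike)
    return spikes

def surrogate_grad(v, threshold=1.0, k=10.0):
    """Derivative of a fast sigmoid, used in place of the spike's true gradient."""
    return 1.0 / (1.0 + k * abs(v - threshold)) ** 2

weak = lif_simulate([0.05] * 50)    # membrane settles below threshold
strong = lif_simulate([0.3] * 50)   # membrane repeatedly crosses threshold
```

The threshold function has zero gradient almost everywhere, which is why training substitutes a smooth surrogate such as the one above during the backward pass.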
Live Project Mar 2026

Diffusion Models

A deep dive deconstructing Diffusion Models. Building a custom U-Net to learn the reverse denoising process that undoes a fixed forward noise corruption. We implement a Denoising Diffusion Probabilistic Model (DDPM) entirely from scratch in PyTorch and generate convincing digits from isotropic Gaussian noise.

Development Status Completed
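The forward corruption in the Diffusion Models entry above has a closed form, which the following sketch illustrates without PyTorch. The linear beta schedule (1e-4 to 0.02 over 1000 steps) is the one commonly used with DDPM and is an assumption here, as are the function names.

```python
import math
import random

def make_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta_t = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta_t
        alpha_bars.append(prod)
    return alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    ab = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, eps)]
    return xt, eps

alpha_bars = make_alpha_bars()
x0 = [1.0, -1.0, 0.5]
xt, eps = q_sample(x0, 500, alpha_bars, random.Random(0))
```

In the usual DDPM setup the network is trained to predict `eps` from `xt` and `t`; nothing in the forward process itself is learned.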
Live Project Mar 2026

Predictive Coding Networks (PCNs)

A deep dive deconstructing Predictive Coding Networks. PCNs replace global backpropagation with local energy minimization and Hebbian weight updates. We implement a biologically plausible PCN entirely from scratch in PyTorch, benchmarking it against standard MLPs on regression and MNIST classification.

Development Status Completed
Live Project Mar 2026

Echo State Networks (ESNs)

An exploration of Reservoir Computing through Echo State Networks (ESNs). We bypass Backpropagation Through Time (BPTT) completely by generating a large, sparse, frozen recurrent reservoir and training only the final linear readout via closed-form Ridge Regression, so training finishes in milliseconds.

Development Status Completed
Live Project Mar 2026

RWKV: Linear Attention RNN

A deep dive deconstructing RWKV (Receptance Weighted Key Value), an architecture that trains in parallel like a Transformer but runs inference sequentially like an RNN. From linear attention theory and the WKV operator, to a pure PyTorch implementation, to benchmarks showing 3.2× faster inference and 5.3× less memory than Transformers.

Development Status Completed
Live Project Apr 2026

Hopfield Networks from Scratch

A deep dive deconstructing Hopfield Networks. From classical associative memory and Hebbian learning, to modern continuous Hopfield networks with exponential storage capacity, to proving that the modern update rule is mathematically identical to Transformer attention.

Development Status Completed
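The correspondence claimed in the Hopfield entry above can be seen in a few lines. This is a sketch of the modern continuous Hopfield update, new state = softmax(beta * X q) applied to the stored patterns X, written in plain Python. The patterns, the query, and `beta` are illustrative assumptions. Read as attention, the query is Q, the stored patterns act as both K and V, and `beta` plays the role of the 1/sqrt(d) scale.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def hopfield_update(patterns, query, beta=4.0):
    """One retrieval step: similarity scores, softmax, weighted sum of stored patterns."""
    scores = [beta * sum(p_i * q_i for p_i, q_i in zip(p, query)) for p in patterns]
    weights = softmax(scores)
    return [sum(w * p[d] for w, p in zip(weights, patterns)) for d in range(len(query))]

patterns = [[1, 1, -1, -1], [1, -1, 1, -1], [-1, -1, -1, 1]]
noisy = [1, 1, -1, 1]  # first pattern with its last bit flipped
retrieved = hopfield_update(patterns, noisy)
recovered_signs = [1 if v > 0 else -1 for v in retrieved]
```

A single update is enough here to pull the corrupted query back onto the first stored pattern.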
Live Project Apr 2026

Neural ODEs from Scratch

A deep dive deconstructing Neural ODEs. From the mathematics of continuous-depth networks and ODE solvers, to building Euler and RK4 integrators from scratch in PyTorch, to classifying spirals with 19.3× fewer parameters than a discrete ResNet.

Development Status Completed
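For the Euler and RK4 integrators mentioned in the Neural ODE entry above, a minimal scalar sketch looks like this. It uses plain Python floats rather than PyTorch tensors, and the test problem dy/dt = -y is an assumption picked because its exact solution is known.

```python
import math

def euler_step(f, t, y, h):
    """First-order explicit Euler step."""
    return y + h * f(t, y)

def rk4_step(f, t, y, h):
    """Classical fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(step, f, y0, t0, t1, n_steps):
    """Apply a fixed-step integrator n_steps times from t0 to t1."""
    h, t, y = (t1 - t0) / n_steps, t0, y0
    for _ in range(n_steps):
        y = step(f, t, y, h)
        t += h
    return y

def decay(t, y):
    return -y  # exact solution: y(t) = y0 * exp(-t)

exact = math.exp(-1.0)
euler_err = abs(integrate(euler_step, decay, 1.0, 0.0, 1.0, 10) - exact)
rk4_err = abs(integrate(rk4_step, decay, 1.0, 0.0, 1.0, 10) - exact)
```

In a Neural ODE, `f` would be a small network and `y` a hidden-state tensor; the step functions keep the same shape.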
Live Project Apr 2026

Normalizing Flows from Scratch

A deep dive deconstructing Normalizing Flows. From the change of variables formula and Jacobian determinants, to building Planar Flows and RealNVP with affine coupling layers from scratch in PyTorch, to exact likelihood density estimation on 2D distributions.

Development Status Completed
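The change-of-variables idea behind the Normalizing Flows entry above can be sketched with one 2D affine coupling layer. The scale and shift functions below are fixed stand-ins for learned networks, and all names are hypothetical. The sketch shows why coupling layers give an exact log-likelihood: the Jacobian is triangular, so its log-determinant is just the scale term.

```python
import math

def s_net(x1):
    return math.tanh(x1)   # stand-in for a learned scale network

def t_net(x1):
    return 0.5 * x1        # stand-in for a learned shift network

def coupling_forward(x1, x2):
    """y1 = x1, y2 = x2 * exp(s(x1)) + t(x1); returns log|det J| as well."""
    s, t = s_net(x1), t_net(x1)
    return x1, x2 * math.exp(s) + t, s

def coupling_inverse(y1, y2):
    """Exact inverse; needs only s and t evaluated at the untouched half."""
    s, t = s_net(y1), t_net(y1)
    return y1, (y2 - t) * math.exp(-s)

def log_prob(x1, x2):
    """log p_X(x) = log p_Z(f(x)) + log|det df/dx| with a standard normal base."""
    z1, z2, log_det = coupling_forward(x1, x2)
    base = -0.5 * (z1 ** 2 + z2 ** 2) - math.log(2 * math.pi)
    return base + log_det

y1, y2, log_det = coupling_forward(0.3, -1.2)
x1_back, x2_back = coupling_inverse(y1, y2)
```

Stacking several such layers while alternating which coordinate is left untouched is the usual way to build up an expressive flow.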
Live Project Apr 2026

MoE from Scratch

A deep dive deconstructing Mixture of Experts. From the mathematics of conditional computation and Top-K gating, to building expert networks with load-balancing loss from scratch in PyTorch, to showing that 8 experts with only 2 active per input match dense model accuracy at 54% sparsity.

Development Status Completed
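A small sketch of the Top-K gating described in the MoE entry above, using 8 toy experts with 2 active per input. The experts here are trivial scalar functions and the router logits are made up; the load-balancing loss from the series is not reproduced.

```python
import math

def top_k_gate(logits, k=2):
    """Keep the k largest router logits, softmax over them, zero out the rest."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return [exps[i] / z if i in exps else 0.0 for i in range(len(logits))]

def moe_forward(x, experts, router_logits, k=2):
    """Evaluate only the selected experts and mix their outputs by gate weight."""
    gates = top_k_gate(router_logits, k)
    return sum(g * experts[i](x) for i, g in enumerate(gates) if g > 0.0)

# Hypothetical experts: expert w multiplies its input by w.
experts = [lambda x, w=w: w * x for w in range(8)]
logits = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]
gates = top_k_gate(logits)
y = moe_forward(3.0, experts, logits)
```

Only the two selected experts are ever called, which is where the compute saving of conditional computation comes from.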
Live Project May 2026

LoRA from Scratch

Low-rank adaptation in 50 lines of PyTorch. LoRA rests on the hypothesis that the weight delta learned during fine-tuning has low intrinsic rank, so it factorises the delta as ΔW = BA with rank r ≪ d. On a moons-rotation adaptation task, rank-2 LoRA matches full fine-tuning with 4.59% of the trainable parameters. The technique that brought fine-tuning of multi-billion-parameter models within reach of consumer hardware.

Development Status Completed
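To illustrate the BA factorisation in the LoRA entry above, here is a dependency-free sketch of a LoRA forward pass and the parameter arithmetic. The layer shapes, the `alpha / r` scaling, and the zero initialisation of B are common conventions assumed here; the numbers are not the project's and do not reproduce its 4.59% figure.

```python
def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def lora_forward(W, A, B, x, alpha=1.0):
    """y = W x + (alpha / r) * B (A x); W stays frozen, only A and B would train."""
    r = len(A)  # A is r x d_in, B is d_out x r
    delta = matvec(B, matvec(A, x))
    return [wx + (alpha / r) * d for wx, d in zip(matvec(W, x), delta)]

def param_counts(d_out, d_in, r):
    """Trainable parameters: full fine-tuning vs. a rank-r adapter."""
    return d_out * d_in, r * (d_in + d_out)

W = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]]   # frozen 2 x 3 base weight
A = [[0.5, -0.5, 1.0]]                    # rank-1 down-projection
B_zero = [[0.0], [0.0]]                   # zero init: the adapter starts as a no-op
x = [1.0, 2.0, 3.0]

y_base = matvec(W, x)
y_init = lora_forward(W, A, B_zero, x)
full, lora = param_counts(d_out=64, d_in=64, r=2)
```

With B initialised to zero the adapted layer reproduces the base layer exactly, so fine-tuning starts from the pretrained behaviour.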
Live Project May 2026

CLIP from Scratch

Contrastive image-text pretraining in 80 lines of PyTorch. Two encoders, a shared embedding space, and a single symmetric InfoNCE loss, with no labels and no classifier head. A 75K-parameter from-scratch CLIP reaches 100% zero-shot accuracy on 16-way colored-shape classification after 14 seconds of training, with the embedding space organising itself by (color, shape) without supervision on either concept.

Development Status Completed
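The symmetric InfoNCE loss named in the CLIP entry above can be sketched without any framework. The temperature of 0.07 and the toy 2D embeddings are assumptions for illustration. The loss is the average of an image-to-text and a text-to-image cross-entropy, where the matching pair sits on the diagonal of the similarity matrix.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[i]
    return total / len(logits)

def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over cosine similarities of paired embeddings."""
    img = [normalize(v) for v in image_emb]
    txt = [normalize(v) for v in text_emb]
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txt] for i in img]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))

images = [[1.0, 0.0], [0.0, 1.0]]
aligned = clip_loss(images, [[2.0, 0.0], [0.0, 3.0]])      # each text matches its image
misaligned = clip_loss(images, [[0.0, 3.0], [2.0, 0.0]])   # pairs swapped
```

Zero-shot classification then amounts to embedding one text prompt per class and picking the class whose embedding is most similar to the image embedding.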
Live Project May 2026

NeRF Kernel from Scratch

The single trick that made NeRF work: Fourier feature encoding of input coordinates. Later neural-field methods such as SIREN (sinusoidal activations) and Instant-NGP (multiresolution hash encoding) attack the same spectral-bias problem by other means. Same MLP, same training, same parameter budget: the encoded variant fits a 64×64 image at +10.64 dB PSNR over the unencoded one. The architecture was never the bottleneck; spectral bias was.

Development Status Completed
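The encoding at the heart of the NeRF entry above is short enough to write out. This sketch follows the NeRF-style positional encoding with frequencies 2^k * pi; the number of bands and the sample coordinates are assumptions, and the project's own encoding may differ (for example, random Gaussian frequencies).

```python
import math

def fourier_features(coords, num_bands=4):
    """Map each coordinate p to [sin(2^k pi p), cos(2^k pi p)] for k = 0..num_bands-1."""
    feats = []
    for p in coords:
        for k in range(num_bands):
            freq = (2.0 ** k) * math.pi
            feats.extend([math.sin(freq * p), math.cos(freq * p)])
    return feats

# A 2D pixel coordinate in [0, 1] becomes a 16-dimensional input for the MLP.
encoded = fourier_features([0.25, 0.5])
```

Feeding `encoded` instead of the raw coordinates lets an otherwise identical MLP represent high-frequency detail, which is how the encoding counters spectral bias.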
Live Project May 2026

DiT from Scratch

The Diffusion Transformer (Peebles & Xie, 2023), the architecture that replaced the U-Net in Stable Diffusion 3 and Sora. Built from scratch in 250 lines of PyTorch with adaLN-Zero conditioning, trained class-conditionally on 16×16 colored shapes. The honest result: undertrained at toy scale, exactly as the inductive-bias tradeoff predicts.

Development Status Completed
Live Project May 2026

Neural ODE on Biological Dynamics

Neural ODEs applied to the FitzHugh-Nagumo neuron model, the canonical 2D simplification of Hodgkin-Huxley excitability. The capstone of the series: phase-portrait recovery as a more honest diagnostic than trajectory MSE. The Neural ODE recovers the true vector field with cosine similarity 0.94–0.96; a parameter-matched discrete RNN fits the data equally well but does not learn the physics.

Development Status Completed
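The phase-portrait diagnostic in the FitzHugh-Nagumo entry above can be sketched as a mean cosine similarity between two vector fields on a grid. The parameter values (a=0.7, b=0.8, eps=0.08, input current 0.5) are commonly used ones assumed here, and `perturbed_field` is a hypothetical stand-in for a learned model, not the project's Neural ODE.

```python
import math

def fhn_field(v, w, a=0.7, b=0.8, eps=0.08, current=0.5):
    """FitzHugh-Nagumo vector field: fast voltage v, slow recovery variable w."""
    dv = v - v ** 3 / 3.0 - w + current
    dw = eps * (v + a - b * w)
    return dv, dw

def perturbed_field(v, w):
    """Hypothetical 'learned' field: the true field plus a small systematic error."""
    dv, dw = fhn_field(v, w)
    return dv + 0.05 * w, dw - 0.01 * v

def mean_cosine_similarity(field_a, field_b, grid):
    """Average directional agreement of two fields, skipping near-zero vectors."""
    total, count = 0.0, 0
    for v, w in grid:
        ax, ay = field_a(v, w)
        bx, by = field_b(v, w)
        norm = math.hypot(ax, ay) * math.hypot(bx, by)
        if norm < 1e-12:
            continue
        total += (ax * bx + ay * by) / norm
        count += 1
    return total / count

grid = [(-2 + 0.5 * i, -1 + 0.5 * j) for i in range(9) for j in range(7)]
score = mean_cosine_similarity(fhn_field, perturbed_field, grid)
```

Unlike trajectory MSE, this score is evaluated across the whole state space, including regions the training trajectories never visited.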