Live Project Feb 2026

State Space Models & Mamba

A deep dive deconstructing State Space Models (SSMs). Rooted in control theory, SSMs offer linear scaling with sequence length and, in principle, unbounded context, in contrast to the quadratic attention cost of traditional Transformers. This series covers the fundamental math, writing a 1D State Space layer in PyTorch, and scaling it against baseline Transformers.
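
The recurrence at the heart of an SSM layer fits in a few lines. A minimal NumPy sketch (not the project's PyTorch layer; the A, B, C names follow control-theory convention and the values here are arbitrary):

```python
import numpy as np

# Discrete-time linear state space layer: x[k+1] = A x[k] + B u[k], y[k] = C x[k].
# A toy sequential scan, not the hardware-efficient parallel scan used by Mamba.
rng = np.random.default_rng(0)
N, L = 4, 16                      # state size, sequence length
A = np.eye(N) * 0.9               # stable state transition
B = rng.normal(size=(N, 1))
C = rng.normal(size=(1, N))

def ssm_scan(u):
    """Run the linear recurrence over a 1D input sequence u."""
    x = np.zeros(N)
    ys = []
    for uk in u:
        x = A @ x + (B * uk).ravel()
        ys.append(float(C @ x))
    return np.array(ys)

u = rng.normal(size=L)
y = ssm_scan(u)
```

Because the map is linear with a zero initial state, doubling the input doubles the output; that linearity is what makes the parallel-scan and convolutional training modes possible.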

Development Status Completed
Live Project Feb 2026

Kolmogorov-Arnold Networks (KANs)

An exploration of Kolmogorov-Arnold Networks (KANs), replacing fixed node activations with learnable B-splines on the edges. This series covers the mathematics of the Kolmogorov-Arnold representation theorem, a pure PyTorch implementation, and benchmarks scaling KANs versus MLPs for symbolic regression.
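
The core idea of a single KAN edge can be sketched directly. Below is an illustrative NumPy version using degree-1 B-splines (hat functions) on a uniform grid; the project uses higher-order splines, and all names and grid choices here are assumptions:

```python
import numpy as np

# One KAN edge: instead of a fixed activation, the edge carries a learnable
# spline phi(x) = sum_i c_i * B_i(x). Here B_i are degree-1 B-spline (hat)
# basis functions on a uniform grid; the coefficients c are the trainable
# parameters of the edge.
grid = np.linspace(-2.0, 2.0, 9)            # knot positions

def hat_basis(x, grid):
    """Evaluate all hat-function basis values at each point of x."""
    h = grid[1] - grid[0]
    d = np.abs(np.asarray(x)[..., None] - grid) / h
    return np.clip(1.0 - d, 0.0, None)

def kan_edge(x, coef):
    return hat_basis(x, grid) @ coef

coef = grid ** 2                            # set phi to match x^2 at the knots
x = np.linspace(-1.5, 1.5, 50)
phi = kan_edge(x, coef)
```

With this basis, the edge interpolates its coefficients exactly at the knots and linearly in between, which is the degree-1 analogue of the smooth splines KANs learn in practice.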

Development Status Completed
Live Project Mar 2026

Liquid Neural Networks (LNNs)

A deep dive deconstructing Liquid Neural Networks (LNNs). Moving from discrete recurrent networks to continuous-time differential equations, building Liquid Time-Constant models in PyTorch, and benchmarking LNNs against traditional LSTMs for noisy time-series tasks.
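
The defining trick of a Liquid Time-Constant cell is that the state's decay rate depends on the input. A single-neuron Euler-step sketch in NumPy (the wiring, constants, and target value are illustrative, not the paper's full model):

```python
import numpy as np

# One Euler step of a Liquid Time-Constant (LTC) cell: the effective time
# constant 1/(1/tau + f(u)) shrinks when the input drives the gate f harder.
def ltc_step(x, u, dt, w, b, tau):
    f = np.tanh(w * u + b)                    # input-driven gate
    # dx/dt = -(1/tau + f) * x + f * A   with target value A = 1.0 here
    dxdt = -(1.0 / tau + f) * x + f * 1.0
    return x + dt * dxdt

x = 0.0
for _ in range(200):                          # integrate to steady state
    x = ltc_step(x, u=1.0, dt=0.05, w=2.0, b=0.0, tau=1.0)
```

Setting dx/dt = 0 gives the fixed point x* = f / (1/tau + f), which the Euler iteration converges to; stronger inputs pull the state toward the target faster, giving the "liquid" adaptive dynamics.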

Development Status Completed
Live Project Mar 2026

Spiking Neural Networks (SNNs)

A deep dive deconstructing Spiking Neural Networks (SNNs). From biological action potentials and the Leaky Integrate-and-Fire model, to surrogate-gradient training in pure PyTorch, to benchmarking against standard ANNs—achieving the same MNIST accuracy at 51.7% lower energy cost.
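
The Leaky Integrate-and-Fire dynamics mentioned above are simple enough to sketch directly. A NumPy toy version (decay factor, threshold, and input current are illustrative; the project's PyTorch version adds surrogate gradients for training):

```python
import numpy as np

# Leaky Integrate-and-Fire neuron: the membrane potential leaks each step,
# integrates the input current, and emits a spike (then hard-resets) when it
# crosses the threshold.
def lif_run(current, beta=0.9, threshold=1.0):
    v, spikes = 0.0, []
    for i in current:
        v = beta * v + i               # leak + integrate
        if v >= threshold:
            spikes.append(1)
            v = 0.0                    # hard reset after a spike
        else:
            spikes.append(0)
    return np.array(spikes)

spikes = lif_run(np.full(50, 0.3))     # constant sub-threshold drive
```

A constant input below threshold still produces regular spiking, because charge accumulates over several steps before each reset; the spike train, not the membrane value, is what downstream layers see.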

Development Status Completed
Live Project Mar 2026

Diffusion Models

A deep dive deconstructing Diffusion Models. Building a custom U-Net to reverse a fixed forward noise-corruption process. We implement a Denoising Diffusion Probabilistic Model (DDPM) entirely from scratch in PyTorch, successfully generating convincing digits from isotropic Gaussian noise.
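
The forward corruption process has a convenient closed form: q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 − ᾱ_t) I), so any noise level is reachable in one step. A NumPy sketch with an illustrative linear beta schedule (T and the beta range are assumptions):

```python
import numpy as np

# Closed-form DDPM forward process under a linear beta schedule.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)     # cumulative signal retention

def q_sample(x0, t, noise):
    """Jump straight to noise level t: sqrt(abar_t) x0 + sqrt(1 - abar_t) eps."""
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise

rng = np.random.default_rng(0)
x0 = rng.normal(size=(8, 8))
xT = q_sample(x0, T - 1, rng.normal(size=(8, 8)))
```

By the final step almost no signal survives (ᾱ_T ≈ e^{−Σβ}), which is why sampling can start from pure isotropic Gaussian noise.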

Development Status Completed
Live Project Mar 2026

Predictive Coding Networks (PCNs)

A deep dive deconstructing Predictive Coding Networks. Replacing global backpropagation with local energy minimization and Hebbian weight updates. We implement a biologically plausible PCN entirely from scratch in PyTorch, benchmarking it against standard MLPs on regression and MNIST classification.
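
The "local energy minimization" step can be made concrete with a tiny linear example. The sketch below (sizes, learning rate, and the two-layer setup are illustrative) runs inference by gradient descent on activities only, using purely local prediction errors:

```python
import numpy as np

# Predictive coding inference: layer activities descend the energy
# E = 0.5*||x1 - W1 x2||^2 + 0.5*||x0 - x1||^2 using only local errors.
rng = np.random.default_rng(1)
W1 = rng.normal(size=(5, 3)) * 0.3   # predicts layer-1 activity from layer 2
x2 = rng.normal(size=3)              # clamped input layer
x0 = rng.normal(size=5)              # clamped target layer
x1 = rng.normal(size=5)              # free activity, to be inferred

def energy(x1):
    return 0.5 * np.sum((x1 - W1 @ x2) ** 2) + 0.5 * np.sum((x0 - x1) ** 2)

e_before = energy(x1)
for _ in range(50):
    eps1 = x1 - W1 @ x2              # local error: top-down prediction
    eps0 = x0 - x1                   # local error: bottom-up target
    x1 = x1 - 0.1 * (eps1 - eps0)    # gradient of E w.r.t. x1
e_after = energy(x1)
```

The settled activity is exactly the compromise between the two predictions, (W1 x2 + x0) / 2; weight updates (Hebbian, not shown) would then use the residual errors, with no global backward pass.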

Development Status Completed
Live Project Mar 2026

Echo State Networks (ESNs)

An exploration of Reservoir Computing through Echo State Networks (ESNs). We bypass Backpropagation Through Time (BPTT) completely by generating a massive, sparse, frozen recurrent structure and training only the final linear readout via closed-form Ridge Regression, achieving convergence in milliseconds.
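
The whole training procedure really is one linear solve. A NumPy sketch (reservoir size, spectral radius, washout length, and the next-step-prediction task are all illustrative choices):

```python
import numpy as np

# Echo State Network: a random recurrent reservoir is frozen; only the linear
# readout is trained, in closed form, by ridge regression.
rng = np.random.default_rng(0)
n_res, washout = 200, 50
W_in = rng.uniform(-0.5, 0.5, size=n_res)
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius 0.9

u = np.sin(np.arange(400) * 0.1)                  # input: sine wave
target = np.roll(u, -1)                           # task: predict the next value

states = np.zeros((len(u), n_res))
x = np.zeros(n_res)
for t, ut in enumerate(u):                        # drive the frozen reservoir
    x = np.tanh(W_in * ut + W @ x)
    states[t] = x

X, y = states[washout:-1], target[washout:-1]     # drop transient, align targets
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y)
mse = np.mean((X @ W_out - y) ** 2)
```

No gradients flow into W or W_in; the spectral-radius rescaling keeps the reservoir's echoes fading, and the ridge term keeps the closed-form solve well conditioned.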

Development Status Completed
Live Project Mar 2026

RWKV: Linear Attention RNN

A deep dive deconstructing RWKV (Receptance Weighted Key Value)—an architecture that trains in parallel like a Transformer but infers sequentially like an RNN. From linear attention theory and the WKV operator, to a pure PyTorch implementation, to benchmarks showing 3.2× faster inference and 5.3× less memory than Transformers.
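
The WKV operator in its recurrent (inference-time) form is a decaying weighted average. A scalar-channel NumPy sketch (the decay w and current-token bonus u are illustrative constants; real RWKV learns them per channel):

```python
import numpy as np

# Recurrent WKV: a running exp(k)-weighted average of past values v, decayed
# by exp(-w) per step, with a bonus u applied only to the current token.
def wkv_recurrent(k, v, w=0.5, u=0.3):
    num, den = 0.0, 0.0
    out = []
    for kt, vt in zip(k, v):
        out.append((num + np.exp(u + kt) * vt) / (den + np.exp(u + kt)))
        # fold the current token into the decayed running state
        num = np.exp(-w) * (num + np.exp(kt) * vt)
        den = np.exp(-w) * (den + np.exp(kt))
    return np.array(out)

rng = np.random.default_rng(0)
k, v = rng.normal(size=32), rng.normal(size=32)
y = wkv_recurrent(k, v)
```

State per channel is just (num, den), so memory is constant in sequence length; each output is a convex combination of the values seen so far, which is the attention-like behaviour the linear-attention view makes precise. (Production implementations track a running max of the exponents for numerical stability, omitted here.)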

Development Status Completed
Live Project Apr 2026

Hopfield Networks from Scratch

A deep dive deconstructing Hopfield Networks. From classical associative memory and Hebbian learning, to modern continuous Hopfield networks with exponential storage capacity, to proving that the modern update rule is mathematically identical to Transformer attention.
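
The classical half of that story fits in a few lines. A NumPy sketch of Hebbian storage and sign-update retrieval (network size, pattern count, and corruption level are illustrative):

```python
import numpy as np

# Classical Hopfield network: store bipolar patterns with the Hebbian
# outer-product rule, then retrieve by iterating sign updates.
rng = np.random.default_rng(0)
N, P = 64, 3
patterns = rng.choice([-1, 1], size=(P, N))
W = (patterns.T @ patterns) / N
np.fill_diagonal(W, 0.0)                       # no self-connections

def recall(state, steps=10):
    for _ in range(steps):
        state = np.sign(W @ state)
        state[state == 0] = 1                  # break ties consistently
    return state

# corrupt a stored pattern, then let the dynamics clean it up
noisy = patterns[0].copy()
flip = rng.choice(N, size=8, replace=False)
noisy[flip] *= -1
recovered = recall(noisy)
```

Well below the classical capacity limit (~0.138 N patterns), stored patterns are attractors, so the corrupted input falls back into the original; the modern continuous variant replaces the sign update with a softmax, which is where the attention equivalence appears.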

Development Status Completed
Live Project Apr 2026

Neural ODEs from Scratch

A deep dive deconstructing Neural ODEs. From the mathematics of continuous-depth networks and ODE solvers, to building Euler and RK4 integrators from scratch in PyTorch, to classifying spirals with 19.3x fewer parameters than a discrete ResNet.
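
The two integrators named above can be sketched side by side. On dx/dt = x (exact solution e^t), RK4 reaches e far more accurately than Euler at the same step count:

```python
import numpy as np

# Fixed-step ODE solvers as used inside a Neural ODE block.
def euler(f, x, t0, t1, steps):
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):             # first-order: one slope sample per step
        x = x + h * f(t, x)
        t += h
    return x

def rk4(f, x, t0, t1, steps):
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):             # fourth-order: four slope samples per step
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h / 2 * k1)
        k3 = f(t + h / 2, x + h / 2 * k2)
        k4 = f(t + h, x + h * k3)
        x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x

f = lambda t, x: x                     # dx/dt = x, so x(1) = e
x_euler = euler(f, 1.0, 0.0, 1.0, 20)
x_rk4 = rk4(f, 1.0, 0.0, 1.0, 20)
```

In a Neural ODE, f would be a small network; the solver's step count then plays the role that layer depth plays in a discrete ResNet.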

Development Status Completed
Live Project Apr 2026

Normalizing Flows from Scratch

A deep dive deconstructing Normalizing Flows. From the change of variables formula and Jacobian determinants, to building Planar Flows and RealNVP with affine coupling layers from scratch in PyTorch, to exact likelihood density estimation on 2D distributions.
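
A planar flow makes the change-of-variables machinery concrete: f(z) = z + u·tanh(w·z + b), with log|det J| = log|1 + u·ψ(z)| where ψ(z) = (1 − tanh²(w·z + b))·w. A 2D NumPy sketch with illustrative parameters, checked against a finite-difference Jacobian:

```python
import numpy as np

# Planar flow forward pass with its analytic log|det Jacobian|.
w = np.array([1.0, 0.5])
u = np.array([0.4, -0.2])   # chosen small enough that the map stays invertible
b = 0.1

def planar_forward(z):
    a = np.tanh(w @ z + b)
    psi = (1.0 - a ** 2) * w
    logdet = np.log(np.abs(1.0 + u @ psi))
    return z + u * a, logdet

z = np.array([0.3, -0.7])
f, logdet = planar_forward(z)

# numerical check: central-difference Jacobian determinant
eps = 1e-6
J = np.zeros((2, 2))
for j in range(2):
    dz = np.zeros(2); dz[j] = eps
    J[:, j] = (planar_forward(z + dz)[0] - planar_forward(z - dz)[0]) / (2 * eps)
num_logdet = np.log(np.abs(np.linalg.det(J)))
```

The rank-one structure is the point: the exact log-determinant costs O(d) instead of O(d³), which is what makes exact likelihood training tractable; RealNVP's coupling layers achieve the same cheapness with a triangular Jacobian.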

Development Status Completed
Live Project Apr 2026

MoE from Scratch

A deep dive deconstructing Mixture of Experts. From the mathematics of conditional computation and Top-K gating, to building expert networks with load-balancing loss from scratch in PyTorch, to showing that 8 experts with only 2 active per input match dense model accuracy at 54% sparsity.
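
The Top-K gating path can be sketched in isolation. A NumPy toy with tiny linear experts (expert count, dimensions, and K mirror the 8-expert/Top-2 setup above; everything else is illustrative, and the load-balancing loss is omitted):

```python
import numpy as np

# Top-K gating: softmax-score the router logits, keep only the top-k experts,
# renormalize their weights, and combine just those experts' outputs.
rng = np.random.default_rng(0)
n_experts, d, k = 8, 4, 2
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
W_gate = rng.normal(size=(n_experts, d))

def moe_forward(x):
    logits = W_gate @ x
    top = np.argsort(logits)[-k:]                 # indices of the top-k experts
    gate = np.exp(logits[top] - logits[top].max())
    gate = gate / gate.sum()                      # renormalize over the top-k
    y = sum(g * (experts[i] @ x) for g, i in zip(gate, top))
    return y, gate, top

x = rng.normal(size=d)
y, gate, top = moe_forward(x)
```

Only k of the n_experts expert matrices ever touch the input, which is the conditional-computation saving: parameter count grows with the expert pool while per-token compute stays roughly constant.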

Development Status Completed