NeRF is a SIREN that learned to render. The geometry is not in the architecture — it is in the loss. The pivotal innovation behind every neural-field method since NeRF is a single trick: Fourier feature encoding of the input coordinates.
The Spectral Bias of ReLU MLPs
Tancik et al. (2020) showed that the Neural Tangent Kernel of a ReLU MLP decays with frequency. High-frequency components of the target function converge much more slowly than low-frequency ones — a ReLU MLP cannot fit sharp edges, fine textures, or periodic structure from coordinate inputs at realistic budgets. This is what doomed coordinate-based scene representations before NeRF.
Fourier Feature Encoding
The fix: lift the coordinate $\mathbf{p}$ into a higher-dimensional Fourier space:
The MLP now operates on $\gamma(\mathbf{p}) \in \mathbb{R}^{2 L \, \dim(\mathbf{p})}$ instead of $\mathbf{p}$. The MLP no longer manufactures high frequencies — they are already in the input.
Why Exponentially-Spaced Frequencies?
Natural signals have approximately self-similar spectra at different scales. Linear-spaced frequencies waste capacity on one scale. Exponential spacing covers many decades with few bins. $L = 6$ covers $1, 2, 4, 8, 16, 32$ cycles per unit input.
Why Both Sin AND Cos?
$\sin(\omega p)$ and $\cos(\omega p)$ together span every phase at frequency $\omega$. With only $\sin$ the MLP would be confined to a fixed phase the encoder cannot adjust (the encoder has no trainable parameters).
NeRF in Three Equations
Let $\mathbf{r}(t) = \mathbf{o} + t \mathbf{d}$ be a ray. Sample $N$ points and predict color + density:
Volume render via the alpha-compositing integral:
Per-pixel MSE loss between $C(\mathbf{r})$ and ground truth. Train end-to-end.
What This Series Demonstrates
Implementing full 3D NeRF (multi-view poses, ray sampling, volume rendering) is roughly 500-800 lines of code with hours of GPU training. What we demonstrate is the kernel: a coordinate MLP plus Fourier features, fitting a high-frequency 2D image. This captures the same numerical phenomenon (spectral bias defeated by Fourier features) at a scale that trains in 6 seconds.