Deconstructing Autoencoders from Scratch

Part 3: What the Bottleneck Learns

Introduction

In Part 1, we derived the mathematics of compression---the information bottleneck, MSE reconstruction loss, and the manifold hypothesis. In Part 2, we built four autoencoder variants from scratch in pure PyTorch: Vanilla, Denoising, Sparse, and Convolutional.

Today, we put all four to the test. We train on a 5,000-image MNIST subset, measure reconstruction quality, visualize the 2D latent space, and demonstrate denoising. The goal is not state-of-the-art performance but understanding: what does the bottleneck actually learn?

Experimental Setup

To make the experiment reproducible on any standard CPU in under 2 minutes, we configured:

  - Dataset: a 5,000-image subset of the MNIST training set
  - Training: 30 epochs per configuration
  - Latent dimensions: 2 (for visualization) and 32
  - Denoising corruption: Gaussian noise with $\sigma = 0.3$
  - Sparsity penalty: L1 with $\lambda = 10^{-3}$
  - Hardware: CPU only, no GPU required

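For concreteness, the setup can be captured in a single configuration dict. The dataset size, epoch count, latent dimensions, noise level, and L1 weight come from the text; the batch size and learning rate are illustrative assumptions, not values stated in this series:

```python
# Training configuration for the experiments in this post.
# Values marked "assumed" are illustrative guesses; the rest are
# quoted directly from the text.
CONFIG = {
    "train_subset": 5_000,   # MNIST training images (from the text)
    "epochs": 30,            # from the text
    "latent_dims": [2, 32],  # from the text
    "noise_sigma": 0.3,      # denoising corruption level (from the text)
    "l1_lambda": 1e-3,       # sparse penalty weight (from the text)
    "batch_size": 128,       # assumed
    "lr": 1e-3,              # assumed (Adam default)
    "device": "cpu",         # CPU-only, per the text
}
```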
Training Results

All five configurations converged smoothly over 30 epochs:

Variant             Latent Dim   Final Loss
Vanilla             2            0.0401
Vanilla             32           0.0125
Denoising           32           0.0191
Sparse (MSE + L1)   32           0.0130
Convolutional       32           0.0083
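The "Final Loss" column is per-pixel MSE on held-out data. A minimal sketch of how such a number can be computed, assuming the autoencoder's forward pass returns the reconstruction (as the Part 2 variants do):

```python
import torch

def avg_mse(model, loader):
    """Average per-pixel MSE reconstruction error over a data loader.

    Assumes `model(x)` returns the reconstruction of `x`, matching the
    autoencoder variants built in Part 2.
    """
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for x, _ in loader:
            x_hat = model(x)
            total += torch.nn.functional.mse_loss(
                x_hat, x, reduction="sum"
            ).item()
            n += x.numel()
    return total / n
```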

Analysis of the Loss Numbers

Several patterns emerge from the results:

  1. Bottleneck width matters: Vanilla $d=2$ (0.0401) vs. vanilla $d=32$ (0.0125) --- a 16$\times$ wider bottleneck reduces reconstruction error by 3.2$\times$. This confirms that MNIST digits have more than 2 intrinsic dimensions of variation, but 32 dimensions capture most of the structure.
  2. Denoising costs extra: The denoising autoencoder (0.0191) has higher loss than vanilla (0.0125) because it trains on corrupted inputs. The network must simultaneously denoise and reconstruct, which is a harder task. However, this extra difficulty produces more robust representations.
  3. Sparsity is nearly free: Sparse (0.0130) vs. vanilla (0.0125) --- the L1 penalty ($\lambda = 10^{-3}$) barely increases reconstruction error while enforcing a structured, sparse latent code.
  4. Convolutions win: The convolutional autoencoder (0.0083) achieves the lowest loss by a significant margin---34% lower than vanilla $d=32$. Preserving spatial structure via Conv2d/ConvTranspose2d gives the network a massive advantage on image data.
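Point 3 above hinges on how the L1 penalty enters the objective. As a sketch (assuming, as in Part 2, that the penalty is a mean absolute value over the latent code), the sparse loss is just the reconstruction MSE plus a small weighted term:

```python
import torch
import torch.nn.functional as F

def sparse_loss(x, x_hat, z, l1_lambda=1e-3):
    """MSE reconstruction loss plus an L1 penalty on the latent code z.

    lambda = 1e-3 is the value quoted in the analysis; averaging (rather
    than summing) the penalty is an assumption for this sketch.
    """
    recon = F.mse_loss(x_hat, x)
    sparsity = z.abs().mean()
    return recon + l1_lambda * sparsity
```

Because the penalty is scaled by $\lambda = 10^{-3}$, it nudges latent activations toward zero without meaningfully competing with the reconstruction term, which is why the final loss barely moves.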

2D Latent Space Visualization

The most revealing visualization is the 2D latent space of the vanilla autoencoder trained with $d=2$. By encoding all 10,000 MNIST test images and plotting them as a scatter plot colored by digit class, we can directly observe the structure the bottleneck has learned.
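Producing this plot amounts to running every test image through the encoder and scattering the resulting 2-D codes. A minimal sketch, assuming the model exposes its encoder as a callable module (names are illustrative):

```python
import torch

@torch.no_grad()
def collect_latents(encoder, loader):
    """Encode every batch and stack the 2-D latent codes with their
    labels, ready for a scatter plot colored by digit class."""
    zs, ys = [], []
    for x, y in loader:
        zs.append(encoder(x))
        ys.append(y)
    return torch.cat(zs), torch.cat(ys)

# Plotting sketch (matplotlib assumed available):
#   z, y = collect_latents(model.encoder, test_loader)
#   plt.scatter(z[:, 0], z[:, 1], c=y, cmap="tab10", s=2)
```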

What the Scatter Plot Reveals

Denoising Quality

The denoising autoencoder demonstrates a striking capability: given heavily corrupted inputs (Gaussian noise, $\sigma = 0.3$), it recovers clean, recognizable digits.
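The corruption itself is simple. A sketch of the noise step, with $\sigma = 0.3$ as in the experiment (clamping back to the valid pixel range is an assumption of this sketch):

```python
import torch

def corrupt(x, sigma=0.3):
    """Add Gaussian noise (sigma = 0.3, as in the experiment) and clamp
    the result back into the valid [0, 1] pixel range."""
    return (x + sigma * torch.randn_like(x)).clamp(0.0, 1.0)

# Denoising training pass: feed the corrupted input, but compare the
# reconstruction against the *clean* target.
#   x_hat = model(corrupt(x))
#   loss = F.mse_loss(x_hat, x)
```

The key detail is the loss target: the network sees `corrupt(x)` but is penalized against the clean `x`, which is what forces it to learn denoising rather than identity.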

Why Denoising Works

The bottleneck acts as a noise filter. Noise is high-frequency, random, and uncorrelated---it cannot be efficiently encoded in a 32-dimensional latent space. The true signal (digit structure) is low-frequency, patterned, and compressible. By forcing the noisy input through the bottleneck, only the compressible signal survives.

This is precisely the mechanism described by Vincent et al. (2008): the denoising autoencoder learns to project corrupted inputs back onto the data manifold. The reconstruction is not a denoised version of the specific noisy input---it is the nearest point on the learned manifold.

Reconstruction Comparison

Comparing reconstructions across all four variants reveals the qualitative differences in what each architecture captures.

Observations

Connections to Modern Architectures

Autoencoders may be decades old, but their core ideas permeate modern deep learning:

  1. Variational Autoencoders (VAEs): Replace the deterministic bottleneck with a probabilistic one---the encoder outputs a mean and variance, and the latent code is sampled from a Gaussian. This enables generation of new samples by sampling from the latent space.
  2. Diffusion Models: The denoising autoencoder's principle---learn to reverse corruption---is the foundation of denoising diffusion probabilistic models (DDPMs). Modern diffusion models like Stable Diffusion apply this iteratively across multiple noise levels.
  3. Sparse Autoencoders in Mechanistic Interpretability: Recent work by Anthropic and others uses sparse autoencoders to decompose the internal representations of large language models into interpretable features. The same L1-penalized bottleneck we implemented is being used to understand what Transformer neurons encode.
  4. Latent Diffusion: Stable Diffusion uses a convolutional autoencoder (specifically, a VQ-VAE) to compress images into a latent space, then runs the diffusion process in that compressed space. The autoencoder's job is exactly what we studied: compress images into a compact, meaningful representation.
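The probabilistic bottleneck in point 1 is implemented via the reparameterization trick: instead of emitting a code directly, the encoder emits a mean and log-variance, and the latent is sampled in a way that keeps gradients flowing. A minimal sketch (not part of the code built in this series):

```python
import torch

def reparameterize(mu, logvar):
    """The VAE reparameterization trick: sample z = mu + sigma * eps
    with eps ~ N(0, I), so gradients flow through mu and logvar."""
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + eps * std
```

Replacing the deterministic bottleneck with this sampling step (plus a KL term in the loss) is essentially the only change needed to turn the vanilla autoencoder from Part 2 into a generative model.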

Conclusion

We derived the math, built the code, and validated the results. Four autoencoder variants, trained from scratch on 5,000 MNIST images, demonstrate the fundamental principle: forcing data through a bottleneck reveals its structure.

The convolutional autoencoder achieved the best reconstruction (MSE = 0.0083) by preserving spatial structure. The 2D latent space scatter showed unsupervised digit clustering. The denoising autoencoder successfully stripped heavy Gaussian noise from corrupted inputs. And the sparse autoencoder maintained reconstruction quality while enforcing a structured latent code.

These are not just historical curiosities. The information bottleneck, the manifold hypothesis, and the reconstruction objective are alive in every modern generative model. Understanding autoencoders from first principles is understanding the foundation of representation learning.

Thank you for following this 3-part "Build in Public" series on Autoencoders. The full code, training logs, and visualizations are live on the GitHub repo. Run the 2-minute training script yourself to watch the bottleneck organize digits in real time!