Deconstructing Autoencoders: Part 3 - What the Bottleneck Learns

Introduction

In Part 1, we derived the mathematics of compression---the information bottleneck, MSE reconstruction loss, and the manifold hypothesis. In Part 2, we built four autoencoder variants from scratch in pure PyTorch: Vanilla, Denoising, Sparse, and Convolutional.

Today, we put all four to the test. We train on a 5,000-image MNIST subset, measure reconstruction quality, visualize the 2D latent space, and demonstrate denoising. The goal is not state-of-the-art performance but understanding: what does the bottleneck actually learn?

Experimental Setup

To make the experiment reproducible on any standard CPU in under 2 minutes, we configured:

Dataset: 5,000 randomly sampled MNIST training images
Epochs: 30
Optimizer: Adam (lr = $10^{-3}$)
Batch size: 128
Latent dimensions: $d=2$ (visualization) and $d=32$ (quality comparison)

Training Results

All five configurations converged smoothly over 30 epochs:

Variant	Latent Dim	Final Loss
Vanilla	2	0.0401
Vanilla	32	0.0125
Denoising	32	0.0191
Sparse (MSE+L1)	32	0.0130
Convolutional	32	0.0083

Analysis of the Loss Numbers

Several patterns emerge from the results:

Bottleneck width matters: Vanilla $d=2$ (0.0401) vs. vanilla $d=32$ (0.0125) --- a 16$\times$ wider bottleneck reduces reconstruction error by 3.2$\times$. This confirms that MNIST digits have more than 2 intrinsic dimensions of variation, but 32 dimensions capture most of the structure.
Denoising costs extra: The denoising autoencoder (0.0191) has higher loss than vanilla (0.0125) because it trains on corrupted inputs. The network must simultaneously denoise and reconstruct, which is a harder task. However, this extra difficulty produces more robust representations.
Sparsity is nearly free: Sparse (0.0130) vs. vanilla (0.0125) --- the L1 penalty ($\lambda = 10^{-3}$) barely increases reconstruction error while enforcing a structured, sparse latent code.
Convolutions win: The convolutional autoencoder (0.0083) achieves the lowest loss by a significant margin---34% lower than vanilla $d=32$. Preserving spatial structure via Conv2d/ConvTranspose2d gives the network a massive advantage on image data.

2D Latent Space Visualization

The most revealing visualization is the 2D latent space of the vanilla autoencoder trained with $d=2$. By encoding all 10,000 MNIST test images and plotting them as a scatter plot colored by digit class, we can directly observe the structure the bottleneck has learned.

What the Scatter Plot Reveals

Unsupervised clustering: The network was never given digit labels, yet it organizes the 10 classes into clearly separated regions. The bottleneck pressure alone forces semantically similar inputs to cluster together.
Similarity structure: Visually similar digits are placed near each other in latent space. The digits 4 and 9 (which share a vertical stroke and a loop/angle at the top) cluster near each other, as do 3 and 5 (similar upper curves). The digit 1 (a simple vertical stroke) occupies its own distinct region.
Continuity: Traversing a straight line between two digit clusters in latent space produces smooth interpolations---the decoder generates images that gradually morph from one digit to another. This confirms that the latent space is smooth, not fragmented.
Manifold structure: The fact that 10,000 images organize into a structured 2D landscape confirms the manifold hypothesis: the "true" dimensionality of handwritten digits is far lower than 784.

Denoising Quality

The denoising autoencoder demonstrates a striking capability: given heavily corrupted inputs (Gaussian noise, $\sigma = 0.3$), it recovers clean, recognizable digits.

Why Denoising Works

The bottleneck acts as a noise filter. Noise is high-frequency, random, and uncorrelated---it cannot be efficiently encoded in a 32-dimensional latent space. The true signal (digit structure) is low-frequency, patterned, and compressible. By forcing the noisy input through the bottleneck, only the compressible signal survives.

This is precisely the mechanism described by Vincent et al. (2008): the denoising autoencoder learns to project corrupted inputs back onto the data manifold. The reconstruction is not a denoised version of the specific noisy input---it is the nearest point on the learned manifold.

Reconstruction Comparison

Comparing reconstructions across all four variants reveals the qualitative differences in what each architecture captures.

Observations

Vanilla: Produces slightly blurred reconstructions. Fine details (thin strokes, sharp corners) are softened. This is characteristic of MSE-trained autoencoders, which average over uncertainty in pixel values.
Denoising: Similar quality to vanilla despite training on corrupted inputs. The regularization from noise does not significantly hurt reconstruction quality at $d=32$, while making the representation more robust.
Sparse: Nearly identical to vanilla. The L1 penalty ($\lambda = 10^{-3}$) is mild enough that reconstruction quality is preserved. The difference is in the latent code: sparse activations rather than dense ones.
Convolutional: The sharpest reconstructions by far. Thin strokes remain thin, curves remain smooth, and fine details are better preserved. The Conv2d/ConvTranspose2d architecture encodes spatial relationships directly, avoiding the information loss from flattening.

Connections to Modern Architectures

Autoencoders may be decades old, but their core ideas permeate modern deep learning:

Variational Autoencoders (VAEs): Replace the deterministic bottleneck with a probabilistic one---the encoder outputs a mean and variance, and the latent code is sampled from a Gaussian. This enables generation of new samples by sampling from the latent space.
Diffusion Models: The denoising autoencoder's principle---learn to reverse corruption---is the foundation of denoising diffusion probabilistic models (DDPMs). Modern diffusion models like Stable Diffusion apply this iteratively across multiple noise levels.
Sparse Autoencoders in Mechanistic Interpretability: Recent work by Anthropic and others uses sparse autoencoders to decompose the internal representations of large language models into interpretable features. The same L1-penalized bottleneck we implemented is being used to understand what Transformer neurons encode.
Latent Diffusion: Stable Diffusion uses a convolutional autoencoder (specifically, a VQ-VAE) to compress images into a latent space, then runs the diffusion process in that compressed space. The autoencoder's job is exactly what we studied: compress images into a compact, meaningful representation.

Conclusion

We derived the math, built the code, and validated the results. Four autoencoder variants, trained from scratch on 5,000 MNIST images, demonstrate the fundamental principle: forcing data through a bottleneck reveals its structure.

The convolutional autoencoder achieved the best reconstruction (MSE = 0.0083) by preserving spatial structure. The 2D latent space scatter showed unsupervised digit clustering. The denoising autoencoder successfully stripped heavy Gaussian noise from corrupted inputs. And the sparse autoencoder maintained reconstruction quality while enforcing a structured latent code.

These are not just historical curiosities. The information bottleneck, the manifold hypothesis, and the reconstruction objective are alive in every modern generative model. Understanding autoencoders from first principles is understanding the foundation of representation learning.

Thank you for following this 3-part "Build in Public" series on Autoencoders. The full code, training logs, and visualizations are live on the GitHub repo. Run the 2-minute training script yourself to watch the bottleneck organize digits in real time!

Deconstructing Autoencoders from Scratch

Part 3: What the Bottleneck Learns