Deconstructing Autoencoders: Part 3 - What the Bottleneck Learns

Introduction

Part 1 covered the math; Part 2 built four autoencoder variants in PyTorch. Now we train all four on a 5,000-image MNIST subset, measure reconstruction loss, visualize the 2D latent space, and test denoising. The point is not SOTA numbers---it is understanding what the bottleneck actually learns.

Experimental Setup

Everything runs on a standard CPU in under 2 minutes with the following settings:

Dataset: 5,000 randomly sampled MNIST training images
Epochs: 30
Optimizer: Adam (lr = $10^{-3}$)
Batch size: 128
Latent dimensions: $d=2$ (visualization) and $d=32$ (quality comparison)
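
The settings above translate into a short training loop. What follows is a minimal sketch, not the code from Part 2: `TinyAE` is a hypothetical stand-in for the vanilla model, its layer widths are assumptions, and random tensors stand in for the 5,000 sampled MNIST images so that the block runs without downloading anything.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class TinyAE(nn.Module):
    """Hypothetical stand-in for the vanilla autoencoder from Part 2."""

    def __init__(self, latent_dim: int = 32):
        super().__init__()
        # Layer widths (784 -> 128 -> latent_dim) are assumptions, not the article's.
        self.encoder = nn.Sequential(
            nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, latent_dim)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, 784), nn.Sigmoid()
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


def train(model, images, epochs=30, batch_size=128, lr=1e-3):
    """Train with Adam on MSE reconstruction loss; return the mean loss per epoch."""
    loader = DataLoader(TensorDataset(images), batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    history = []
    for _ in range(epochs):
        total, count = 0.0, 0
        for (batch,) in loader:
            optimizer.zero_grad()
            loss = criterion(model(batch), batch)
            loss.backward()
            optimizer.step()
            # Weight each batch by its size so the epoch mean is exact.
            total += loss.item() * batch.size(0)
            count += batch.size(0)
        history.append(total / count)
    return history


if __name__ == "__main__":
    torch.manual_seed(0)
    # Placeholder data: swap in 5,000 flattened MNIST images scaled to [0, 1].
    images = torch.rand(5000, 784)
    losses = train(TinyAE(latent_dim=32), images)
    print(f"final epoch loss: {losses[-1]:.4f}")
```

Passing `latent_dim=2` instead of 32 gives the visualization configuration; everything else stays the same.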

Training Results

All five configurations (the four variants, with vanilla trained at both latent sizes) converged smoothly over 30 epochs:

Variant	Latent Dim	Final Loss
Vanilla	2	0.0401
Vanilla	32	0.0125
Denoising	32	0.0191
Sparse (MSE+L1)	32	0.0130
Convolutional	32	0.0083

Analysis

Bottleneck width matters: Vanilla $d=2$ (0.0401) vs. $d=32$ (0.0125). A 16$\times$ wider bottleneck cuts reconstruction error by a factor of 3.2. MNIST digits clearly have more than 2 intrinsic dimensions, but 32 captures most of the structure.
Denoising costs extra: 0.0191 vs. vanilla's 0.0125, about 53% higher. Reconstructing clean images from corrupted inputs is a harder task; the intended payoff is robustness to input noise, which the denoising section below tests.
Sparsity is nearly free: 0.0130 vs. 0.0125---the L1 penalty ($\lambda = 10^{-3}$) barely hurts reconstruction while enforcing a structured latent code.
Convolutions win: 0.0083, 34% lower than vanilla $d=32$. Preserving spatial structure via Conv2d/ConvTranspose2d is a large advantage on image data.
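
The comparisons above are simple ratios of the table entries. The sketch below recomputes them against the vanilla $d=32$ baseline; the dictionary layout and function names are illustrative choices, and the numbers are copied from the results table.

```python
# Final losses copied from the results table above.
final_loss = {
    ("vanilla", 2): 0.0401,
    ("vanilla", 32): 0.0125,
    ("denoising", 32): 0.0191,
    ("sparse", 32): 0.0130,
    ("convolutional", 32): 0.0083,
}

baseline = final_loss[("vanilla", 32)]


def ratio_to_baseline(key):
    """Loss of a configuration divided by the vanilla d=32 loss."""
    return final_loss[key] / baseline


def percent_change(key):
    """Signed percentage change relative to the vanilla d=32 loss."""
    return 100.0 * (final_loss[key] - baseline) / baseline


for key in final_loss:
    name, dim = key
    print(f"{name:>13} d={dim:<2}  ratio={ratio_to_baseline(key):.2f}  "
          f"change={percent_change(key):+.1f}%")
```

With these numbers the $d=2$ model sits at about 3.2$\times$ the baseline loss, the sparse variant about 4% above it, and the convolutional variant about 34% below it.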

2D Latent Space Visualization

We encode all 10,000 MNIST test images with the vanilla $d=2$ autoencoder and plot them as a scatter, colored by digit class.
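
One way to produce such a plot is sketched below, assuming matplotlib is available. The encoder here is an untrained, hypothetical stand-in and the data is random; in the real experiment you would pass the trained $d=2$ encoder and the flattened MNIST test images with their labels. The helper names `encode_all` and `plot_latent` are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import torch
from torch import nn


@torch.no_grad()
def encode_all(encoder, images, batch_size=512):
    """Run the encoder over all images in batches; return an (N, d) tensor of codes."""
    encoder.eval()
    codes = [encoder(images[i:i + batch_size]) for i in range(0, len(images), batch_size)]
    return torch.cat(codes)


def plot_latent(codes, labels, path="latent_2d.png"):
    """Scatter the 2D codes, one color per digit class, and save the figure to `path`."""
    xy = codes.detach().cpu().numpy()
    classes = labels.detach().cpu().numpy()
    fig, ax = plt.subplots(figsize=(6, 6))
    points = ax.scatter(xy[:, 0], xy[:, 1], c=classes, cmap="tab10", s=4)
    fig.colorbar(points, ax=ax, ticks=list(range(10)), label="digit")
    ax.set_xlabel("z1")
    ax.set_ylabel("z2")
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return path


if __name__ == "__main__":
    torch.manual_seed(0)
    # Hypothetical untrained encoder and random data, used only to exercise the code.
    encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 2))
    images = torch.rand(10000, 784)
    labels = torch.randint(0, 10, (10000,))
    codes = encode_all(encoder, images)
    print(codes.shape, plot_latent(codes, labels))
```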

What the Scatter Plot Shows

Unsupervised clustering: The network never saw digit labels, yet it separates the 10 classes into distinct regions. Compression pressure alone forces semantic grouping.
Similarity structure: Visually similar digits land near each other---4 and 9 (shared vertical stroke, loop/angle at top), 3 and 5 (similar upper curves). The digit 1, a simple vertical stroke, sits in its own region.
Continuity: Walking a straight line between two clusters in latent space produces smooth interpolations. The decoder generates images that gradually morph from one digit to another.
Manifold confirmation: 10,000 images organizing into a structured 2D landscape directly supports the manifold hypothesis---the intrinsic dimensionality of handwritten digits is far below 784.
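
The continuity claim is easy to probe: pick two latent codes, walk the straight line between them, and decode each point. The sketch below assumes a decoder that maps 2D codes to 784 pixel values; the decoder shown is an untrained, hypothetical stand-in, and the endpoints are arbitrary example values rather than codes from the trained model.

```python
import torch
from torch import nn


@torch.no_grad()
def interpolate(decoder, z_start, z_end, steps=10):
    """Decode `steps` evenly spaced points on the segment from z_start to z_end."""
    decoder.eval()
    # Mixing weights from 0 to 1, shaped (steps, 1) to broadcast over latent dims.
    t = torch.linspace(0.0, 1.0, steps).unsqueeze(1)
    z_path = (1.0 - t) * z_start + t * z_end
    return decoder(z_path).view(steps, 28, 28)


if __name__ == "__main__":
    torch.manual_seed(0)
    # Hypothetical untrained decoder; substitute the trained d=2 decoder.
    decoder = nn.Sequential(
        nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, 784), nn.Sigmoid()
    )
    # Example endpoints; in practice use the codes (or class means) of two digits.
    z_a = torch.tensor([-1.0, 0.5])
    z_b = torch.tensor([2.0, -1.5])
    frames = interpolate(decoder, z_a, z_b, steps=8)
    print(frames.shape)  # torch.Size([8, 28, 28])
```

The first and last frames are the decodings of the two endpoints; the frames in between are what the decoder produces for codes it may never have seen during training.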

Denoising Quality

Given heavily corrupted inputs (Gaussian noise, $\sigma = 0.3$), the denoising autoencoder recovers clean, recognizable digits.

Why Denoising Works

The bottleneck acts as a noise filter. Noise is high-frequency, random, and uncorrelated---it cannot be efficiently packed into 32 dimensions. Digit structure is low-frequency, patterned, and compressible. Forcing noisy inputs through the bottleneck strips out whatever is not compressible.

Vincent et al. (2008) gave this a geometric reading: the denoising autoencoder learns to map corrupted inputs back toward the data manifold. The output is best read not as the specific noisy input with its noise subtracted, but as a nearby point on the learned manifold.
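
In code, the denoising objective differs from the vanilla one in a single place: the model is fed the corrupted batch but scored against the clean one. The sketch below is written under stated assumptions: clamping noisy pixels to $[0, 1]$ is my choice (the text only fixes $\sigma = 0.3$), and the model is an untrained, hypothetical stand-in.

```python
import torch
from torch import nn


def corrupt(x, sigma=0.3):
    """Add zero-mean Gaussian noise with std `sigma`, then clamp to [0, 1]."""
    # Clamping is an assumption; the article only specifies the noise level.
    return (x + sigma * torch.randn_like(x)).clamp(0.0, 1.0)


def denoising_step(model, optimizer, clean, sigma=0.3):
    """One update: the model sees the noisy batch but is scored against the clean one."""
    optimizer.zero_grad()
    recon = model(corrupt(clean, sigma))
    loss = nn.functional.mse_loss(recon, clean)
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    # Hypothetical d=32 model and random stand-in data.
    model = nn.Sequential(
        nn.Linear(784, 32), nn.ReLU(), nn.Linear(32, 784), nn.Sigmoid()
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    batch = torch.rand(128, 784)
    print(f"loss after one step: {denoising_step(model, optimizer, batch):.4f}")
```

Because fresh noise is drawn on every call, the model never sees the same corrupted image twice, so it cannot succeed by memorizing a particular noise pattern.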

Reconstruction Comparison

Comparing reconstructions across all four variants reveals the qualitative differences in what each architecture captures.

Observations

Vanilla: Slightly blurred. Thin strokes and sharp corners get softened---typical of MSE-trained autoencoders, which average over pixel uncertainty.
Denoising: Similar quality to vanilla despite training on corrupted inputs. At $d=32$, the noise regularization does not visibly hurt reconstruction.
Sparse: Nearly identical to vanilla. The L1 penalty at $\lambda = 10^{-3}$ is mild enough to preserve quality; the difference is in the latent code (sparse activations vs. dense).
Convolutional: Sharpest reconstructions. Thin strokes stay thin, curves stay smooth. Conv2d/ConvTranspose2d encodes spatial relationships directly, avoiding the information loss of flattening.
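
The sparse variant's difference lives in the latent code rather than in the pixels, so it helps to see both the loss and a way to measure sparsity. The sketch below is an assumption-laden illustration: averaging $|z|$ over batch and units (rather than summing) and the `tol` threshold are my choices, and the tensors are random stand-ins rather than outputs of the trained models.

```python
import torch
from torch import nn


def sparse_ae_loss(x, x_hat, z, lam=1e-3):
    """MSE reconstruction loss plus lam times the mean absolute latent activation."""
    # Averaging |z| over batch and units is an assumption; summing is also common.
    return nn.functional.mse_loss(x_hat, x) + lam * z.abs().mean()


def fraction_near_zero(z, tol=1e-2):
    """Share of latent activations whose magnitude is below `tol`."""
    return (z.abs() < tol).float().mean().item()


if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.rand(128, 784)
    x_hat = torch.rand(128, 784)    # stand-in for decoder output
    z_dense = torch.randn(128, 32)  # stand-in for a dense code
    # Zero out roughly 90% of units to mimic a sparse code.
    z_sparse = z_dense * (torch.rand(128, 32) < 0.1).float()
    print(f"dense  code: {fraction_near_zero(z_dense):.2f} of units near zero")
    print(f"sparse code: {fraction_near_zero(z_sparse):.2f} of units near zero")
    print(f"loss with sparse code: {sparse_ae_loss(x, x_hat, z_sparse).item():.4f}")
```

Running `fraction_near_zero` on the codes of the vanilla and sparse models for the same inputs is one way to check that near-identical reconstructions can come from very different latent codes.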

Where These Ideas Show Up Now

Variational Autoencoders (VAEs): Replace the deterministic bottleneck with a probabilistic one---the encoder outputs mean and variance, the latent code is sampled from a Gaussian. This enables generating new samples by sampling the latent space.
Diffusion Models: The denoising principle---learn to reverse corruption---is the foundation of DDPMs. Stable Diffusion applies it iteratively across multiple noise levels.
Sparse Autoencoders for Interpretability: Anthropic and others use sparse autoencoders to decompose LLM internal representations into interpretable features. The same L1 sparsity penalty is at work, usually with a hidden layer wider than the input, applied to understanding what Transformer activations encode.
Latent Diffusion: Stable Diffusion compresses images into a latent space via a convolutional autoencoder (a KL-regularized VAE), then runs diffusion in that compressed space. The autoencoder's job there is the one studied here: compress, then reconstruct.
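
The probabilistic bottleneck mentioned in the VAE item can be sketched in a few lines. This is an illustration of the standard reparameterization trick, not code from this series: the function name is hypothetical, and having the encoder predict log-variance instead of variance is a common convention that I am assuming here.

```python
import torch


def sample_latent(mu, log_var):
    """Draw z ~ N(mu, sigma^2) as mu + sigma * eps with eps ~ N(0, I)."""
    # Predicting log-variance rather than variance is a common convention, assumed here.
    std = torch.exp(0.5 * log_var)
    return mu + std * torch.randn_like(std)


if __name__ == "__main__":
    torch.manual_seed(0)
    # Stand-ins for the two encoder outputs on a batch of 4 inputs with d=32.
    mu = torch.zeros(4, 32)
    log_var = torch.zeros(4, 32)
    z = sample_latent(mu, log_var)
    print(z.shape)  # torch.Size([4, 32])
```

Writing the sample as a deterministic function of `mu`, `log_var`, and independent noise is what lets gradients flow back into the encoder.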

Conclusion

Four autoencoder variants, trained from scratch on 5,000 MNIST images, confirm the core principle: forcing data through a bottleneck reveals its structure.

The convolutional variant hit the lowest reconstruction loss (0.0083) by preserving spatial structure. The 2D latent space showed unsupervised digit clustering. The denoising variant stripped heavy Gaussian noise from corrupted inputs. The sparse variant maintained reconstruction quality (0.0130) while enforcing structured activations.

The information bottleneck, the manifold hypothesis, and the reconstruction objective are not historical curiosities; they underpin many modern generative models. Full code and training logs are on GitHub.
