
RNNs from Scratch

Part 3: Training and Analyzing Dynamics

Introduction

After training our RNN on sequence classification, we can analyze its internal dynamics. How do hidden states evolve over time? What temporal patterns does the network capture?

Training Results

Our 2-layer RNN converges on the sequence classification task (see the metrics in Figure 1). The training curves show characteristic RNN behavior: rapid initial learning followed by gradual refinement.

Training Loss and Accuracy Curves
Figure 1: Training and validation metrics across epochs.
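The training loop behind curves like these can be sketched in plain NumPy. This is a minimal, illustrative stand-in, not the article's actual model: the toy task (classify whether a 1-D sequence's sum is positive), the sizes (H=8, T=5), the learning rate, and the clipping threshold are all assumptions chosen for a small runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, T, lr = 1, 8, 5, 0.05          # illustrative sizes, not the article's
Wxh = rng.normal(0, 0.3, (H, D))
Whh = rng.normal(0, 0.3, (H, H))
bh = np.zeros(H)
w, b = rng.normal(0, 0.3, H), 0.0    # readout on the final hidden state

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses = []
for step in range(2000):
    x = rng.normal(size=(T, D))
    y = float(x.sum() > 0)           # toy label: is the sequence sum positive?

    # forward pass, caching hidden states for backprop through time
    hs = [np.zeros(H)]
    for t in range(T):
        hs.append(np.tanh(Wxh @ x[t] + Whh @ hs[-1] + bh))
    p = sigmoid(w @ hs[-1] + b)
    losses.append(-(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)))

    # backward pass (BPTT), accumulating gradients across time steps
    dlogit = p - y
    dw, db, dh = dlogit * hs[-1], dlogit, dlogit * w
    dWxh, dWhh, dbh = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(bh)
    for t in reversed(range(T)):
        dpre = dh * (1 - hs[t + 1] ** 2)     # through the tanh nonlinearity
        dWxh += np.outer(dpre, x[t])
        dWhh += np.outer(dpre, hs[t])
        dbh += dpre
        dh = Whh.T @ dpre                    # gradient to the previous step

    # clip the global gradient norm -- standard practice for RNN training
    gnorm = np.sqrt(sum(np.sum(g * g) for g in (dWxh, dWhh, dbh, dw)) + db * db)
    if gnorm > 5.0:
        scale = 5.0 / gnorm
        dWxh, dWhh, dbh, dw, db = (g * scale for g in (dWxh, dWhh, dbh, dw, db))

    for prm, grd in ((Wxh, dWxh), (Whh, dWhh), (bh, dbh), (w, dw)):
        prm -= lr * grd
    b -= lr * db
```

Plotting `losses` reproduces the qualitative shape of Figure 1: a steep early drop as the readout aligns with the integrated sum, then slow refinement.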

Visualizing Hidden State Dynamics

Hidden State Trajectories

Each hidden unit traces a trajectory over time:

Hidden States Visualizations
Figure 2: Trajectories of the hidden state activation values over time steps.
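Producing the data behind a plot like Figure 2 only requires recording the hidden state at every step of a forward pass. A minimal sketch, with illustrative weights and random inputs rather than the trained model:

```python
import numpy as np

rng = np.random.default_rng(1)
D, H, T = 4, 8, 30                    # illustrative sizes
Wxh = rng.normal(0, 0.3, (H, D))
Whh = rng.normal(0, 0.3, (H, H))
bh = np.zeros(H)

h = np.zeros(H)
trajectory = np.empty((T, H))
for t in range(T):
    h = np.tanh(Wxh @ rng.normal(size=D) + Whh @ h + bh)
    trajectory[t] = h                 # row t = all hidden activations at step t

# trajectory[:, i] is the curve traced by hidden unit i across time;
# the tanh nonlinearity keeps every value strictly inside (-1, 1).
```

Each column of `trajectory`, plotted against the step index, gives one curve of the figure.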

Temporal Integration

By examining the hidden state plots, we can see temporal integration at work: each unit's activation at step t depends on the entire input prefix, not just the current input, so trajectories drift gradually as evidence accumulates rather than resetting at every step.

The Vanishing Gradient Problem in Practice

Gradient Flow Analysis

During backpropagation, gradients must flow through all time steps:

$$ \frac{\partial L}{\partial W} = \sum_t \frac{\partial L}{\partial h_t} \cdot \frac{\partial h_t}{\partial W} $$

For long sequences, early time steps receive vanishingly small gradients. The culprit is the factor $\partial h_t / \partial h_k$ hidden inside $\partial h_t / \partial W$, which expands into a product of per-step Jacobians (for a tanh RNN):

$$ \frac{\partial h_t}{\partial h_k} = \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}} = \prod_{i=k+1}^{t} \mathrm{diag}(1 - h_i^2)\, W_{hh} $$

When the recurrent weights have spectral norm below 1, this product shrinks geometrically in $t - k$, and the tanh derivative terms only shrink it further.
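This decay can be measured directly by accumulating the product of per-step Jacobians $\mathrm{diag}(1 - h_i^2)\,W_{hh}$ of a tanh RNN. The sketch below uses small illustrative random weights, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(2)
D, H, T = 4, 16, 20                   # illustrative sizes
Wxh = rng.normal(0, 0.3, (H, D))
Whh = rng.normal(0, 0.1, (H, H))      # small-scale recurrent weights

h = np.zeros(H)
jacobians = []
for t in range(T):
    h = np.tanh(Wxh @ rng.normal(size=D) + Whh @ h)
    jacobians.append(np.diag(1 - h ** 2) @ Whh)   # one-step Jacobian

sens = np.eye(H)
norms = [np.linalg.norm(sens)]        # sensitivity of the final state to itself
for J in reversed(jacobians):
    sens = sens @ J                   # chain one more step back in time
    norms.append(np.linalg.norm(sens))  # ||d h_T / d h_t|| for earlier t
```

`norms` runs from the last time step back to the first; with these weights it collapses by orders of magnitude, which is the vanishing gradient in miniature.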

Empirical Observation

We observe that gradient norms measured at early time steps are orders of magnitude smaller than those at late time steps: the network effectively stops learning from distant context.

Why LSTMs and GRUs Were Invented

The vanishing gradient problem motivated gated architectures. LSTMs maintain a separate cell state updated by nearly additive gated writes, and GRUs interpolate between the old hidden state and a candidate state via an update gate; in both cases, gradients can flow through time without being repeatedly squashed by tanh Jacobians.
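The gated interpolation is easiest to see in a GRU cell. A minimal sketch (biases omitted for brevity; the weight shapes and random inputs are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wc, Uc):
    z = sigmoid(Wz @ x + Uz @ h)          # update gate: how much to rewrite
    r = sigmoid(Wr @ x + Ur @ h)          # reset gate: how much history to use
    c = np.tanh(Wc @ x + Uc @ (r * h))    # candidate state
    return (1 - z) * h + z * c            # gated, nearly additive interpolation

rng = np.random.default_rng(3)
D, H = 4, 8                               # illustrative sizes
params = [rng.normal(0, 0.3, (H, D)) if i % 2 == 0 else rng.normal(0, 0.3, (H, H))
          for i in range(6)]              # Wz, Uz, Wr, Ur, Wc, Uc

h = np.zeros(H)
for t in range(10):
    h = gru_step(rng.normal(size=D), h, *params)
```

The key line is the last one in `gru_step`: where the vanilla RNN overwrites its state through a tanh every step, the GRU carries a `(1 - z) * h` term straight through, so the gradient path back in time is scaled by the gate rather than by a product of tanh Jacobians.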

RNNs vs Transformers

Computational Complexity

An RNN processes a length-T sequence in O(T) inherently sequential steps, each costing O(H²) for the recurrent update. A Transformer's self-attention costs O(T²·d) per layer, but every time step can be computed in parallel, which is what makes Transformers fast to train on modern hardware.

Memory Characteristics

At inference, an RNN carries only a fixed-size hidden state regardless of how long the sequence grows. A Transformer must keep a key-value cache that grows linearly with context length.

Use Cases

RNNs remain attractive for streaming and low-latency settings with long or unbounded inputs, where constant per-step cost and memory matter. Transformers dominate when parallel training over full sequences and direct long-range attention are the priority, as in large-scale language modeling.

Conclusion

RNNs provide an elegant framework for sequence modeling through recurrence. While Transformers dominate many tasks, RNNs remain valuable for streaming applications and as building blocks for more complex architectures like LSTMs.