RNNs from Scratch

Part 2: Building the Architecture

Introduction

In Part 1, we explored the mathematics of recurrence. Now we implement RNNs in PyTorch, building from individual cells to complete sequence models.

The RNN Cell

Our RNNCell class implements the core recurrence:

rnn_cell.py
```python
# Core recurrence: combine the current input with the previous hidden state,
# then squash through tanh to keep activations bounded in (-1, 1)
h_new = torch.tanh(W_xh @ x + W_hh @ h_prev + b)
```
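Expanded into a complete module, the cell might look like the following sketch (the parameter names mirror the equation; the initialization scale and usage shapes are illustrative):

```python
import torch
import torch.nn as nn

class RNNCell(nn.Module):
    """A single vanilla RNN cell: h_new = tanh(W_xh x + W_hh h_prev + b)."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.W_xh = nn.Parameter(torch.randn(hidden_size, input_size) * 0.1)
        self.W_hh = nn.Parameter(torch.randn(hidden_size, hidden_size) * 0.1)
        self.b = nn.Parameter(torch.zeros(hidden_size))

    def forward(self, x, h_prev):
        # x: (batch, input_size), h_prev: (batch, hidden_size)
        # Transposes adapt the per-example equation to batched row vectors
        return torch.tanh(x @ self.W_xh.T + h_prev @ self.W_hh.T + self.b)

cell = RNNCell(input_size=8, hidden_size=16)
x = torch.randn(4, 8)
h = torch.zeros(4, 16)   # zero initial hidden state
h = cell(x, h)           # (4, 16)
```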

Key design choices: the tanh nonlinearity keeps hidden activations bounded in (-1, 1), and the same weight matrices W_xh and W_hh are shared across all time steps, so the parameter count is independent of sequence length.

Multi-Layer RNNs

Stacking RNN cells creates deeper representations:
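Each layer consumes the hidden states of the layer below it as its input sequence. A minimal sketch of the stacking loop (using `nn.RNNCell` for brevity; class and dimension choices are illustrative):

```python
import torch
import torch.nn as nn

class StackedRNN(nn.Module):
    """Stack of RNN cells; each layer's hidden states feed the next layer."""

    def __init__(self, input_size, hidden_size, num_layers):
        super().__init__()
        self.cells = nn.ModuleList(
            nn.RNNCell(input_size if l == 0 else hidden_size, hidden_size)
            for l in range(num_layers)
        )

    def forward(self, x):
        # x: (batch, seq_len, input_size), batch-first
        B, T, _ = x.shape
        h = [torch.zeros(B, cell.hidden_size) for cell in self.cells]
        outputs = []
        for t in range(T):
            inp = x[:, t]
            for l, cell in enumerate(self.cells):
                h[l] = cell(inp, h[l])
                inp = h[l]  # the deeper layer consumes the layer below
            outputs.append(h[-1])
        return torch.stack(outputs, dim=1)  # (batch, seq_len, hidden_size)

rnn = StackedRNN(input_size=8, hidden_size=16, num_layers=2)
out = rnn(torch.randn(4, 10, 8))  # (4, 10, 16)
```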

Architecture Variants

Sequence Classifier

Many-to-one architecture:

Architecture
Input:  (batch, seq_len, input_size)
Output: (batch, num_classes)

Uses the final hidden state for classification.
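A sketch of this shape contract, using PyTorch's built-in `nn.RNN` for brevity (class and dimension names are illustrative):

```python
import torch
import torch.nn as nn

class SequenceClassifier(nn.Module):
    """Many-to-one: run over the sequence, classify from the final hidden state."""

    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, seq_len, input_size)
        _, h_n = self.rnn(x)        # h_n: (num_layers, batch, hidden_size)
        return self.head(h_n[-1])   # (batch, num_classes)

model = SequenceClassifier(input_size=8, hidden_size=16, num_classes=3)
logits = model(torch.randn(4, 10, 8))  # (4, 3)
```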

Sequence Tagger

Many-to-many architecture:

Architecture
Input:  (batch, seq_len, input_size)
Output: (batch, seq_len, num_tags)

Outputs a prediction at each time step.
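The only difference from the classifier is that the output head is applied at every time step rather than to the final state alone (a sketch with illustrative names):

```python
import torch
import torch.nn as nn

class SequenceTagger(nn.Module):
    """Many-to-many: one tag prediction per time step."""

    def __init__(self, input_size, hidden_size, num_tags):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_tags)

    def forward(self, x):
        out, _ = self.rnn(x)   # out: (batch, seq_len, hidden_size)
        return self.head(out)  # (batch, seq_len, num_tags)

model = SequenceTagger(input_size=8, hidden_size=16, num_tags=5)
tags = model(torch.randn(4, 10, 8))  # (4, 10, 5)
```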

CharRNN

Character-level language model:
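One common shape for such a model embeds character ids, runs the RNN, and projects back to vocabulary logits for next-character prediction (a sketch; the vocabulary and dimension sizes are illustrative):

```python
import torch
import torch.nn as nn

class CharRNN(nn.Module):
    """Character-level language model: predict the next character at each step."""

    def __init__(self, vocab_size, embed_size, hidden_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.rnn = nn.RNN(embed_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, vocab_size)

    def forward(self, idx, h=None):
        # idx: (batch, seq_len) integer character ids
        x = self.embed(idx)
        out, h = self.rnn(x, h)
        return self.head(out), h  # logits: (batch, seq_len, vocab_size)

model = CharRNN(vocab_size=65, embed_size=32, hidden_size=64)
logits, h = model(torch.randint(0, 65, (4, 20)))
```

Returning the hidden state `h` lets generation continue one character at a time without re-processing the whole prefix.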

Bidirectional RNN

Processes sequences in both directions:

This captures context from both directions, crucial for tasks like named entity recognition.
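With `nn.RNN` this is a single flag; the forward and backward hidden states are concatenated, doubling the output feature dimension (a sketch with illustrative sizes):

```python
import torch
import torch.nn as nn

class BiRNN(nn.Module):
    """Runs the sequence left-to-right and right-to-left, concatenating both."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size,
                          batch_first=True, bidirectional=True)

    def forward(self, x):
        out, _ = self.rnn(x)
        return out  # (batch, seq_len, 2 * hidden_size)

model = BiRNN(input_size=8, hidden_size=16)
out = model(torch.randn(4, 10, 8))  # (4, 10, 32)
```

Note that bidirectional models see the future, so they suit tagging and classification but not autoregressive generation.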

Implementation Details

Batch-First vs Time-First

We use batch-first tensors (B, T, D) for compatibility with PyTorch conventions, but internally process time step by time step.
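Concretely, the time loop slices one step across the whole batch at a time:

```python
import torch

x = torch.randn(4, 10, 8)  # (batch, seq_len, input_size), batch-first
for t in range(x.size(1)):
    x_t = x[:, t, :]       # (batch, input_size): time step t for every example
```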

Dropout Regularization

Applied between RNN layers (not across time) to prevent overfitting:

rnn.py
```python
# Dropout regularizes between stacked layers (on each layer's output),
# never along the time dimension
if dropout > 0:
    self.dropout = nn.Dropout(dropout)
```
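In the stacked forward pass, dropout is then applied to a layer's output before it feeds the next layer, but not after the top layer. A sketch of that pattern (the cells, sizes, and dropout rate are illustrative):

```python
import torch
import torch.nn as nn

cells = nn.ModuleList([nn.RNNCell(8, 16), nn.RNNCell(16, 16)])
drop = nn.Dropout(0.5)

x_t = torch.randn(4, 8)                          # one time step of input
h = [torch.zeros(4, 16), torch.zeros(4, 16)]     # per-layer hidden states

inp = x_t
for l, cell in enumerate(cells):
    h[l] = cell(inp, h[l])
    inp = h[l]
    if l < len(cells) - 1:
        inp = drop(inp)  # between layers only, never after the final layer
```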

Hidden State Initialization

Hidden states are initialized to zero at the start of each sequence:

$$ h_0 = \mathbf{0} $$
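In code this is a single allocation per sequence:

```python
import torch

batch_size, hidden_size = 4, 16
h0 = torch.zeros(batch_size, hidden_size)  # fresh zero state for each new sequence
```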

Training Strategy

We train on synthetic sequence tasks, which make it easy to control sequence length and difficulty and to verify that the model learns the intended structure.
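As one illustrative example of such a task (not necessarily the one used here), a parity task generates binary sequences labeled by whether they contain an odd number of ones; solving it requires the hidden state to track information across the whole sequence:

```python
import torch

def make_parity_batch(batch_size=32, seq_len=10):
    """Binary sequences; label 1 iff the sequence contains an odd number of ones."""
    x = torch.randint(0, 2, (batch_size, seq_len, 1)).float()
    y = (x.sum(dim=(1, 2)) % 2).long()
    return x, y

x, y = make_parity_batch()  # x: (32, 10, 1), y: (32,)
```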

Conclusion

With our RNN implemented, we can now train it and visualize how hidden states evolve. Part 3 explores hidden state dynamics and analyzes what the network learns.