Deconstructing GNNs from Scratch

Part 3: Classifying Nodes in Graphs

Abstract

We evaluate the GCN and GAT architectures from Parts 1--2 on the classic Zachary's Karate Club benchmark. Using only 4 labeled nodes out of 34 (semi-supervised), both models achieve 100% node classification accuracy. We analyze training dynamics, convergence behavior, and what the graph visualizations reveal about how message passing propagates labels through the network.

The Benchmark: Zachary's Karate Club

Wayne Zachary's 1977 study documented the social network of a university karate club that split into two factions after a dispute between the instructor (Mr. Hi, node 0) and the club officer (node 33).

Property             Value
Nodes                34
Undirected edges     78
Classes              2 (Mr. Hi vs. Officer)
Node features        One-hot identity (34-dim)
Class 0 (Mr. Hi)     16 nodes
Class 1 (Officer)    18 nodes
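Since the dataset has no real node attributes, each node is given a one-hot identity feature, so the models can distinguish nodes only through graph structure. A minimal stdlib-only sketch of that feature matrix (the variable names are illustrative, not from a specific library):

```python
# One-hot identity features: node i gets a 34-dimensional vector
# with a single 1 at position i. The feature matrix is just the
# 34x34 identity, leaving the adjacency structure as the only
# usable signal.
NUM_NODES = 34  # Zachary's Karate Club

features = [[1.0 if j == i else 0.0 for j in range(NUM_NODES)]
            for i in range(NUM_NODES)]

# Sanity checks: every row sums to 1 and is zero off the diagonal.
assert all(sum(row) == 1.0 for row in features)
assert features[5][5] == 1.0 and features[5][6] == 0.0
```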

Semi-Supervised Setup

Following the standard semi-supervised protocol:

  1. Training: the loss is computed on only 4 labeled nodes (nodes 3, 6, 18, 20).
  2. Evaluation: accuracy is measured over all 34 nodes.

The challenge: can 4 labels, propagated through the graph structure, correctly classify all 34 nodes?
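In code, this split is just a boolean mask: the loss sees only the masked rows, while accuracy is computed over everything. A sketch using the 4 labeled node IDs from this post (the commented loss/accuracy lines are pseudocode placeholders, not a specific API):

```python
# Semi-supervised split: 4 of 34 nodes carry labels during training.
NUM_NODES = 34
labeled_nodes = [3, 6, 18, 20]  # the 4 training nodes used in this post

train_mask = [i in labeled_nodes for i in range(NUM_NODES)]

# During training (conceptually):
#   loss = cross_entropy(logits[train_mask], labels[train_mask])
# During evaluation, predictions are compared against all 34 labels.
assert sum(train_mask) == 4
```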

Experimental Configuration

Hyperparameter    GCN        GAT
Architecture      34-16-2    34-[4x8=32]-2
Activation        ReLU       ELU
Dropout           0.5        0.6
Learning rate     0.01       0.005
Epochs            200        200
Attention heads   --         4 (layer 1), 1 (output)
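The same configuration, expressed as plain dictionaries (the key names are illustrative, not tied to a particular framework):

```python
# Hyperparameters from the table above, as plain dicts.
gcn_cfg = {
    "hidden_dim": 16, "out_dim": 2, "activation": "relu",
    "dropout": 0.5, "lr": 0.01, "epochs": 200,
}
gat_cfg = {
    "hidden_dim": 8, "heads": 4, "out_dim": 2, "activation": "elu",
    "dropout": 0.6, "lr": 0.005, "epochs": 200,
}
# GAT's first-layer width: 4 heads x 8 dims = 32 concatenated features.
assert gat_cfg["heads"] * gat_cfg["hidden_dim"] == 32
```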

Results

Accuracy

Both models achieve perfect separation of the two factions using only 4 labeled training nodes. GCN: 100.00%. GAT: 100.00%. Zero misclassified nodes.

Training Dynamics

The convergence behavior reveals important differences:

GCN converges faster. At epoch 1, GCN already classifies 79.4% of nodes correctly. The symmetric normalization provides an immediate structural prior: nodes that share many neighbors with a labeled node receive a strong signal from the very first forward pass. By epoch 50, both loss and accuracy have essentially plateaued.

GAT starts slower but has lower initial loss. GAT's accuracy at epoch 1 is only 64.7%, but its loss (0.648) is lower than GCN's (0.700). This apparent contradiction arises because GAT's attention weights are initially near-uniform, producing well-calibrated but indecisive predictions. As attention sharpens, accuracy jumps rapidly.

GAT's higher final loss. GAT's loss at epoch 200 (0.061) is 20$\times$ higher than GCN's (0.003). This is not a sign of worse performance -- both achieve 100% accuracy. The difference stems from GAT's stronger dropout (0.6 vs. 0.5) and the stochasticity of attention dropout, which prevents the loss from collapsing to near-zero during training mode.

Why Does Semi-Supervised Learning Work So Well?

Label Propagation via Message Passing

A 2-layer GNN gives each node access to its 2-hop neighborhood. In the Karate Club graph, the average shortest path length is approximately 2.4, meaning most nodes are within 2--3 hops of a labeled node.

The mechanism:

  1. Layer 1: Each labeled node's class signal spreads to its immediate neighbors.
  2. Layer 2: Those neighbors propagate the signal one hop further, reaching 2-hop neighborhoods.

With 4 strategically placed labels (nodes 3, 6, 18, 20), the 2-hop coverage encompasses nearly the entire graph.
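This coverage argument is easy to verify mechanically: a breadth-first search from the labeled seeds tells you exactly which nodes lie within $k$ hops. A stdlib-only sketch, demonstrated on a small path graph rather than the full 78-edge karate club edge list:

```python
from collections import deque

def k_hop_coverage(adj, seeds, k):
    """Return the set of nodes within k hops of any seed node (BFS)."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        if dist[u] == k:        # don't expand past the hop budget
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

# Toy path graph 0-1-2-3-4-5: seeding at node 2 with k=2 reaches 0..4,
# but node 5 (3 hops away) stays outside a 2-layer GNN's receptive field.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
covered = k_hop_coverage(adj, [2], 2)
```

Running the same check on the actual karate club adjacency list with seeds {3, 6, 18, 20} would quantify the "nearly the entire graph" claim directly.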

Graph Structure as Implicit Regularization

Even without meaningful node features (we use one-hot identities), the adjacency matrix encodes community structure. The normalized propagation $\hat{\mathbf{D}}^{-1/2}\hat{\mathbf{A}}\hat{\mathbf{D}}^{-1/2}$ acts as a smoothing operator: connected nodes are pushed toward similar representations. Since the two factions are densely connected within groups and sparsely connected between groups, this smoothing naturally separates them.
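The smoothing effect is visible even on a three-node graph. A pure-Python sketch of $\hat{\mathbf{D}}^{-1/2}\hat{\mathbf{A}}\hat{\mathbf{D}}^{-1/2}$: after one application, a signal that starts on a single node has spread to its neighbors:

```python
import math

# Symmetric normalization on the path graph 1-0-2.
A = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
n = len(A)
# A_hat = A + I (self-loops), D_hat = degree matrix of A_hat.
A_hat = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
deg = [sum(row) for row in A_hat]
S = [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
     for i in range(n)]

x = [1.0, 0.0, 0.0]  # a "label signal" sitting only on node 0
y = [sum(S[i][j] * x[j] for j in range(n)) for i in range(n)]
# After one propagation step, nodes 1 and 2 carry nonzero signal:
# the operator has smoothed node 0's value onto its neighbors.
```

Stacking this operator (one per layer) is exactly what lets labels diffuse through densely connected factions while leaking only weakly across the sparse cut between them.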

The Homophily Assumption

Both GCN and GAT implicitly assume homophily -- that connected nodes tend to share labels. Zachary's Karate Club exhibits strong homophily (members befriend those in their faction), making it an ideal benchmark. On heterophilous graphs (where connected nodes tend to have different labels), standard GNNs struggle and require architectural modifications.
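Homophily can be measured directly as the fraction of edges whose endpoints share a label (often called edge homophily). A stdlib-only sketch on a toy two-community graph; the edge list and labels here are invented for illustration:

```python
# Edge homophily: fraction of edges connecting same-label nodes.
edges = [(0, 1), (1, 2), (0, 2),   # community A (label 0)
         (3, 4), (4, 5), (3, 5),   # community B (label 1)
         (2, 3)]                   # single cross-community edge
label = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

homophily = sum(label[u] == label[v] for u, v in edges) / len(edges)
# 6 of the 7 edges are intra-community, so homophily = 6/7.
```

A value near 1 (as in the karate club) means message passing reinforces the correct labels; values near 0 mean each aggregation step mixes in mostly wrong-class neighbors.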

Limitations and Extensions

Conclusion

Building GCN and GAT from scratch in pure PyTorch reveals that graph neural networks are, at their core, remarkably simple:

  1. Normalize the adjacency matrix (GCN) or compute attention (GAT)
  2. Multiply by projected features
  3. Repeat for $L$ layers
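The three steps above fit in a single short function. A minimal pure-Python sketch of the GCN variant (no bias, dropout, or training loop; weight matrices are supplied by the caller):

```python
import math

def gcn_forward(A, X, weights):
    """L-layer GCN sketch: normalize adjacency once, then for each
    layer project the features and propagate them over the graph."""
    n = len(A)
    # Step 1: normalize the adjacency matrix, S = D^{-1/2}(A + I)D^{-1/2}.
    A_hat = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    d = [sum(row) for row in A_hat]
    S = [[A_hat[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]
    H = X
    for layer, W in enumerate(weights):
        # Step 2a: project features, HW.
        HW = [[sum(H[i][k] * W[k][j] for k in range(len(W)))
               for j in range(len(W[0]))] for i in range(n)]
        # Step 2b: propagate, S(HW); ReLU on all but the output layer.
        H = [[sum(S[i][k] * HW[k][j] for k in range(n))
              for j in range(len(W[0]))] for i in range(n)]
        if layer < len(weights) - 1:
            H = [[max(0.0, v) for v in row] for row in H]
    return H  # Step 3: repeated for len(weights) layers

# Usage on a 3-node path graph with hand-picked toy weights.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]          # one-hot identity features
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]    # 3 -> 2
W2 = [[1.0, 0.0], [0.0, 1.0]]                  # 2 -> 2
out = gcn_forward(A, X, [W1, W2])              # 3 nodes x 2 logits
```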

On Zachary's Karate Club, this procedure achieves perfect accuracy with only 4 labeled nodes -- a testament to the power of exploiting graph structure in semi-supervised learning.

The key takeaways: