Neural ODEs on Biological Dynamics: Part 3 - Phase-Portrait Recovery

The Headline

Two models. Same architecture. Same parameter count. Same training data. Both fit the FitzHugh-Nagumo trajectories to MSE $\sim 0.02$. Only one of them learned the physics.

This part runs the experiment and presents the result: trajectory accuracy is a noisy proxy for whether a model has learned the underlying dynamical system. The proper diagnostic is phase-portrait recovery, which compares the model's learned vector field against the ground-truth flow, arrow by arrow. By that diagnostic, the Neural ODE recovers the field in both direction and magnitude. The discrete RNN recovers the direction, but its magnitude is only meaningful if you already know the training $\Delta t$.

Recap of the Experiment

Dataset. $40$ FHN trajectories, $80$ time-steps each at $\Delta t = 0.5$. Half spiking ($I = 0.5$), half quiescent ($I = 0.2$). Random initial conditions in the $(v, w)$ plane.

Models. Two models with identical architecture: an MLP with two hidden layers of $128$ tanh units (state + $I$ in $\to$ 128 $\to$ tanh $\to$ 128 $\to$ tanh $\to$ state out), $17{,}282$ parameters. The Neural ODE wraps it in RK4 integration; the discrete RNN wraps it in residual rollout.

Training. AdamW at $\eta = 3 \times 10^{-3}$ with cosine schedule, $600$ epochs, gradient clipping at $1.0$. Trajectory MSE loss across the full $80$ steps.

Tests. In-distribution MSE on the training-length trajectories, extrapolation MSE on $4\times$ longer trajectories, and phase-portrait recovery (cosine similarity of the learned vector field against ground truth).
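To make the recap concrete, here is a minimal NumPy sketch of how a dataset of this shape could be generated, plus a sanity check on the parameter count. This part of the series does not restate the FHN equations, so the form and constants below ($a = 0.7$, $b = 0.8$, $\tau = 12.5$), the initial-condition range, the internal sub-stepping, and the reading of "$80$ time-steps" as $80$ samples are all assumptions, and the function names are illustrative rather than taken from the repository.

```python
import numpy as np

# Assumed FHN constants; not stated in this part of the series.
A, B, TAU = 0.7, 0.8, 12.5

def fhn_field(y, I):
    """Assumed FHN derivative for states y[..., (v, w)] and input current I."""
    v, w = y[..., 0], y[..., 1]
    dv = v - v**3 / 3.0 - w + I
    dw = (v + A - B * w) / TAU
    return np.stack([dv, dw], axis=-1)

def rk4_step(f, y, I, h):
    """One classical Runge-Kutta step of size h."""
    k1 = f(y, I)
    k2 = f(y + 0.5 * h * k1, I)
    k3 = f(y + 0.5 * h * k2, I)
    k4 = f(y + h * k3, I)
    return y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def make_dataset(n_traj=40, n_samples=80, dt=0.5, substeps=10, seed=0):
    """Return trajectories of shape (n_traj, n_samples, 2) and their currents."""
    rng = np.random.default_rng(seed)
    # First half spiking (I = 0.5), second half quiescent (I = 0.2).
    currents = np.repeat([0.5, 0.2], n_traj // 2)
    y = rng.uniform(-2.0, 2.0, size=(n_traj, 2))  # assumed initial-condition range
    samples = [y]
    h = dt / substeps  # integrate finer than the sampling interval
    for _ in range(n_samples - 1):
        for _ in range(substeps):
            y = rk4_step(fhn_field, y, currents, h)
        samples.append(y)
    return np.stack(samples, axis=1), currents

def mlp_param_count(sizes):
    """Weights plus biases of a fully connected network with these layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

trajs, currents = make_dataset()
print(trajs.shape)                         # (40, 80, 2)
print(mlp_param_count([3, 128, 128, 2]))   # 17282
```

The count works out because the input is $(v, w, I)$ and the output is a 2D state: $512 + 16{,}512 + 258 = 17{,}282$.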

Trajectory-Fit Results

Model	Params	In-distribution MSE	Extrapolation MSE	Train time
Neural ODE	$17{,}282$	$0.02072$	$0.02014$	$40.5$ s
Discrete RNN	$17{,}282$	$0.01547$	$0.01670$	$8.9$ s

Two things to notice. First: in-distribution, the discrete RNN is marginally better and trains $4.5\times$ faster. Second: the extrapolation MSE (on trajectories $4\times$ longer than what either model trained on) stays close to the in-distribution MSE for both models. Neither degrades dramatically when extrapolated.

The non-degradation on extrapolation is itself a surprising and important result, and it depends critically on the autonomous-dynamics inductive bias from Part 2. If we had kept $t$ in the Neural ODE input, extrapolation MSE would have been $\sim 5.0$ instead of $0.02$ — a factor of $250$ from one design choice. The autonomous-dynamics choice is essential for any model that needs to operate at different time horizons than it trained on.

If you stopped reading at this table, you would conclude that the discrete RNN is the better choice — it trains $4.5\times$ faster, has slightly lower trajectory MSE, and extrapolates fine. The Neural ODE looks like a slow, unnecessary luxury.

That conclusion would be wrong. Trajectory MSE is the wrong metric. To see why, we need to look at the phase portraits.

The Phase-Portrait Diagnostic

Trajectory MSE measures whether a model produces the same time series as the data. It does not measure whether the model has learned the underlying physics. A model can fit individual trajectories without learning the underlying flow — for example, by overfitting to the specific initial conditions in the training set, or by learning a discrete-step approximation that happens to coincide with the data's sampling rate.

The proper diagnostic is to query the model's predicted derivative $f(y, I)$ at a grid of $(v, w)$ points and compare against the ground-truth FHN field, arrow by arrow. If the model has learned the true dynamics, its predicted arrows should point in the same direction as the ground-truth arrows at every point on the grid.
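The diagnostic described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the repository's evaluation code: the FHN form and constants, the grid extent, and the helper name `mean_cosine_similarity` are all hypothetical, and the "models" at the bottom are stand-ins built from the true field so the behaviour of the metric is easy to see.

```python
import numpy as np

def fhn_field(y, I, a=0.7, b=0.8, tau=12.5):
    """Assumed ground-truth FHN derivative for states y[..., (v, w)]."""
    v, w = y[..., 0], y[..., 1]
    return np.stack([v - v**3 / 3.0 - w + I, (v + a - b * w) / tau], axis=-1)

def mean_cosine_similarity(model_f, true_f, I,
                           v_range=(-2.5, 2.5), w_range=(-1.0, 2.0),
                           n=25, eps=1e-12):
    """Average cosine between model and true arrows on an n-by-n grid."""
    v, w = np.meshgrid(np.linspace(*v_range, n), np.linspace(*w_range, n))
    grid = np.stack([v.ravel(), w.ravel()], axis=-1)   # (n*n, 2) query points
    pred, true = model_f(grid, I), true_f(grid, I)
    dot = np.sum(pred * true, axis=-1)
    norms = np.linalg.norm(pred, axis=-1) * np.linalg.norm(true, axis=-1)
    return float(np.mean(dot / (norms + eps)))

# Stand-in "models" for illustration only.
def scaled_field(y, I):
    return 3.0 * fhn_field(y, I)      # right direction, wrong magnitude

def reversed_field(y, I):
    return -fhn_field(y, I)           # every arrow flipped

cos_same = mean_cosine_similarity(fhn_field, fhn_field, I=0.5)
cos_scaled = mean_cosine_similarity(scaled_field, fhn_field, I=0.5)
cos_reversed = mean_cosine_similarity(reversed_field, fhn_field, I=0.5)
print(cos_same, cos_scaled, cos_reversed)
```

Note that the scaled stand-in scores as well as the exact field. Cosine similarity only measures direction, so a separate magnitude check is needed if flow strength matters.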

Neural ODE phase portrait

Regime	Mean cosine similarity with $f_\text{FHN}$
Quiescent ($I = 0.2$)	$0.94$
Spiking ($I = 0.5$)	$0.96$

The Neural ODE recovers the direction of the FHN flow with mean cosine similarity of $0.94$ or higher in both regimes. Visually, the learned arrows on the $(v, w)$ plane match the ground-truth arrows: same direction at the same point, especially in the regions covered by the training trajectories. Outside the training-trajectory region the direction is still mostly correct but the magnitude is less reliable (the model has less data to estimate the flow strength there).

This is the kind of result that justifies the architectural choice. The Neural ODE has not just memorised the data — it has discovered the structure of the dynamical system. You could use the learned $f$ to predict behaviour from new initial conditions, integrate at different step sizes, or do anything else that depends on having the right flow.

Discrete RNN phase portrait

The discrete RNN does not have an explicit $f$. It has a residual update $\Delta y = g(y, I)$ that returns the time-integrated effect of the dynamics over one $\Delta t$. We can try to recover an implicit "instantaneous flow" by dividing by $\Delta t$:

$$\tilde f(y, I) \approx \frac{g(y, I)}{\Delta t}.$$

Cosine similarity is unaffected by this rescaling, so the RNN's recovered field has good direction (cosine $\sim 0.9$) with or without it. The magnitude is a different matter: it is only meaningful after dividing by $\Delta t$, and we only know $\Delta t$ because we set it when generating the training data. Even then, $g / \Delta t$ is a finite-difference estimate over a step of $0.5$, not the instantaneous derivative. In a real-world scenario where the data comes from an instrument with its own sampling rate (calcium imaging at $30$ Hz, patch-clamp at $10$ kHz), this division is a manual correction the user has to apply, not something the model knows.

So the RNN's recovered "vector field" is correct in direction but wrong in magnitude unless you have external knowledge of the training $\Delta t$. The Neural ODE's recovered vector field is correct in both direction and magnitude, intrinsically — because the model parameterises the instantaneous derivative directly, not its time-integrated effect.
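The direction-versus-magnitude point can be checked without training anything. The sketch below uses the exact one-step increment of an assumed FHN system as an idealised stand-in for what a perfectly trained residual RNN would output, then compares it with the instantaneous derivative. The FHN constants, sample region, and helper names are assumptions made for illustration.

```python
import numpy as np

def fhn_field(y, I, a=0.7, b=0.8, tau=12.5):
    """Assumed FHN derivative f(y, I) for states y[..., (v, w)]."""
    v, w = y[..., 0], y[..., 1]
    return np.stack([v - v**3 / 3.0 - w + I, (v + a - b * w) / tau], axis=-1)

def rk4_step(f, y, I, h):
    k1 = f(y, I)
    k2 = f(y + 0.5 * h * k1, I)
    k3 = f(y + 0.5 * h * k2, I)
    k4 = f(y + h * k3, I)
    return y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def flow_increment(y, I, dt, substeps=50):
    """Idealised g(y, I): the true state change over one interval dt."""
    z = y.copy()
    for _ in range(substeps):
        z = rk4_step(fhn_field, z, I, dt / substeps)
    return z - y

def mean_cos(a, b):
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return float(np.mean(num / den))

rng = np.random.default_rng(0)
pts = rng.uniform([-2.0, -1.0], [2.0, 2.0], size=(500, 2))  # assumed region
I = 0.5
f = fhn_field(pts, I)

results = {}
for dt in (0.5, 0.05):
    g = flow_increment(pts, I, dt)
    results[dt] = {
        # Typical length of g relative to f: close to dt when dt is small.
        "ratio": float(np.median(np.linalg.norm(g, axis=1) / np.linalg.norm(f, axis=1))),
        "cos_raw": mean_cos(g, f),          # direction without knowing dt
        "cos_scaled": mean_cos(g / dt, f),  # direction after dividing by dt
    }
    print(dt, results[dt])
```

The two cosine columns agree for each $\Delta t$, because rescaling a vector does not change its direction. The length ratio tracks $\Delta t$ closely for the small step, which is the factor the user has to supply by hand.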

The Asymmetry

The asymmetry is this: the Neural ODE takes the time step as an explicit argument of its integrator. The RNN has the training $\Delta t$ baked into its weights.

This is invisible in the fitted training trajectories at the original $\Delta t$. Both models predict the next state correctly. But as soon as you want to do anything other than predict at the training time grid, the asymmetry becomes visible:

Change the sampling rate. Want predictions every $0.25$ time units instead of every $0.5$? The Neural ODE does this by integrating with a smaller step size — same model, different integrator call. The RNN cannot do this; it takes one conceptual step per rollout() iteration, regardless of $\Delta t$. To get half-step predictions you would have to retrain.

Interpolate between samples. Want to know what the system was doing at $t = 5.25$, between two sampled points? The Neural ODE produces this naturally by integrating up to $t = 5.25$. The RNN has nothing to say about $t$ values that aren't a multiple of the training $\Delta t$.

Compare with another data source at a different rate. Your calcium imaging is at $30$ Hz; a collaborator's voltage imaging is at $200$ Hz. A Neural ODE trained on one can be evaluated at the other's rate by changing $t_\text{eval}$. A discrete RNN cannot bridge the two.

Run the model backward in time. Useful for understanding which initial conditions could have produced an observed state. The Neural ODE can integrate backward; the RNN can only roll forward.
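The operations listed above all reduce to calling the same integrator with a different time grid. The sketch below shows that with the assumed ground-truth FHN field standing in for a trained $f$. The `integrate` helper, its `max_h` argument, and the FHN constants are illustrative assumptions, not the repository's API.

```python
import numpy as np

def fhn_field(y, I, a=0.7, b=0.8, tau=12.5):
    """Stand-in for a learned f(y, I); here the assumed true FHN field."""
    v, w = y[..., 0], y[..., 1]
    return np.stack([v - v**3 / 3.0 - w + I, (v + a - b * w) / tau], axis=-1)

def rk4_step(f, y, I, h):
    k1 = f(y, I)
    k2 = f(y + 0.5 * h * k1, I)
    k3 = f(y + 0.5 * h * k2, I)
    k4 = f(y + h * k3, I)
    return y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def integrate(f, y0, I, t_eval, max_h=0.01):
    """States at each time in t_eval; t_eval[0] is the time of y0.

    t_eval may increase or decrease. A decreasing grid gives a negative
    step h, which integrates the same field backward in time.
    """
    y = np.asarray(y0, dtype=float)
    out = [y]
    for t0, t1 in zip(t_eval[:-1], t_eval[1:]):
        n = max(1, int(np.ceil(abs(t1 - t0) / max_h)))
        h = (t1 - t0) / n
        for _ in range(n):
            y = rk4_step(f, y, I, h)
        out.append(y)
    return np.stack(out)

y0, I = np.array([-1.0, 0.0]), 0.5

# Same field, two output rates: every 0.5 and every 0.25 time units.
coarse = integrate(fhn_field, y0, I, np.linspace(0.0, 10.0, 21))
fine = integrate(fhn_field, y0, I, np.linspace(0.0, 10.0, 41))

# Query a time that is not on the coarse grid.
at_5_25 = integrate(fhn_field, y0, I, np.array([0.0, 5.25]))[-1]

# Start from the state at t = 2 and integrate back to t = 0.
recovered_y0 = integrate(fhn_field, coarse[4], I, np.array([2.0, 0.0]))[-1]

print(np.max(np.abs(fine[::2] - coarse)))
print(at_5_25, fine[21])
print(recovered_y0, y0)
```

One caveat on the backward case: forward FHN dynamics contract toward the limit cycle, so backward integration amplifies errors, and long backward horizons should be treated with care. The example keeps the backward span short for that reason.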

Why This Matters for Biology

The discussion of discrete-vs-continuous looks abstract, but it has concrete implications for biological applications of these models.

Biological time-series data arrives at whatever resolution your instrument samples. The rates below are properties of the instruments, not the timescales of the underlying dynamics:

Calcium imaging: $30$ Hz (sometimes faster on modern setups).
Voltage imaging: $\sim 500$ Hz.
Patch-clamp electrophysiology: $10$–$50$ kHz.
EEG / MEG: $128$–$1024$ Hz.
Single-cell RNA-seq: one timepoint per several hours.
Live-cell fluorescence imaging: variable, often $1$ Hz or slower.

A discrete RNN trained on patch-clamp at $10$ kHz produces a model that runs at $10$ kHz steps. The same RNN cannot be evaluated at $50$ kHz or interpolated between samples. If you collect a new dataset at a different sampling rate, you re-train.

A Neural ODE trained on the same data lives in continuous time. The same model can predict at any rate, can interpolate between samples (by integrating with smaller steps), and can in principle be combined with other Neural ODEs that live in the same continuous space. This is the "model of the system" property, as opposed to the "parameterised interpolator" property.

For PhD work on cellular signalling, neural excitability, or any other biological dynamics, this distinction is the actual reason to use Neural ODEs over discrete RNNs. It is not about fit quality — both methods can fit similar curves with similar accuracy. It is about what the model represents.

The Headline Reframe

Time-series accuracy and physical fidelity are not the same metric.

A model that fits the data exactly but has learned a discrete-step approximation of the dynamics gives you predicted trajectories at the training rate. A model that learns the continuous-time vector field gives you a model of the system. They are different scientific objects, and only the second one lets you reason about the underlying physics.

For data that lives in continuous time — which is almost everything in biology — the difference matters. The Neural ODE produces a model of the dynamical system; the RNN produces a parameterised interpolator. Both can fit the data; only the first answers the scientific questions.

What Transfers to Other Biological Systems

The same Neural ODE architecture works for any system of ODEs you can write down. For biological systems specifically:

Hodgkin-Huxley. $4$D system ($V$, $m$, $h$, $n$). Same approach as our 2D FHN, with a larger MLP for the additional state dimensions.

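Repressilator. Gene regulatory oscillator (Elowitz & Leibler, 2000): three genes whose protein products cyclically repress one another. The original formulation is $6$D (three mRNA and three protein concentrations); the common protein-only reduction is $3$D. Natural test case for the Neural ODE framework.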

Lotka-Volterra. $2$D predator-prey dynamics. The textbook example for nonlinear oscillators in ecology. Trivial extension of the FHN code.

Glycolytic oscillator. $7$D metabolic model with multi-timescale dynamics. Now the Neural ODE's parameter efficiency really helps: one shared network for all $7$ state variables, learning all the coupling between them.

SIR / SEIR epidemic models. $3$D or $4$D compartmental models. Useful for understanding disease dynamics; the Neural ODE can be fit to case-count time series.

For complex biological systems where the true equations are unknown, the Neural ODE can be trained directly from data and produces a continuous-time model with the same structure as the underlying physics: a state-dependent flow that can be integrated at any resolution. This is the application that motivates the entire framework.
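As a rough illustration of why the systems listed above are small extensions, the sketch below runs one generic RK4 integrator over two of them, Lotka-Volterra and SIR. Only the vector-field function and the state dimension change. These are ground-truth fields of the kind you would use to generate training data; a learned MLP would be passed in as `f` in the same way. Parameter values and function names are illustrative assumptions.

```python
import numpy as np

def rk4_integrate(f, y0, t_end, h):
    """Integrate dy/dt = f(y) from 0 to t_end with fixed RK4 steps of size h."""
    y = np.asarray(y0, dtype=float)
    for _ in range(int(round(t_end / h))):
        k1 = f(y)
        k2 = f(y + 0.5 * h * k1)
        k3 = f(y + 0.5 * h * k2)
        k4 = f(y + h * k3)
        y = y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return y

# Illustrative Lotka-Volterra parameters (assumed).
ALPHA, BETA, DELTA, GAMMA = 1.0, 0.1, 0.075, 1.5

def lotka_volterra(y):
    """2D predator-prey field for y = (prey, predator)."""
    prey, pred = y
    return np.array([ALPHA * prey - BETA * prey * pred,
                     DELTA * prey * pred - GAMMA * pred])

def lv_invariant(y):
    """Quantity conserved by the exact Lotka-Volterra flow."""
    prey, pred = y
    return DELTA * prey - GAMMA * np.log(prey) + BETA * pred - ALPHA * np.log(pred)

def sir(y, beta=0.3, gamma=0.1):
    """3D SIR field for population fractions y = (S, I, R); rates assumed."""
    s, i, r = y
    return np.array([-beta * s * i, beta * s * i - gamma * i, gamma * i])

lv0 = np.array([10.0, 5.0])
lv_end = rk4_integrate(lotka_volterra, lv0, t_end=20.0, h=0.01)

sir0 = np.array([0.99, 0.01, 0.0])
sir_end = rk4_integrate(sir, sir0, t_end=100.0, h=0.1)

print(lv_end, lv_invariant(lv0), lv_invariant(lv_end))
print(sir_end, sir_end.sum())
```

Conserved quantities like these are also a useful extra check on a learned field: if the trained model drifts badly on an invariant that the true system conserves, the vector field is off even when trajectory MSE looks fine.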

What Does Not Transfer

Neural ODEs are not universally the right tool. Several failure modes:

Stiff dynamics. Systems where some variables evolve much faster than others (microsecond Na+ channels vs millisecond synaptic dynamics) are computationally expensive for explicit integrators like RK4: the step size has to be small enough to resolve the fastest variable, which makes the whole integration slow. Stiff systems call for implicit integrators, which are more complex than the RK4 scheme we implemented.

Stochastic dynamics. Many biological systems are inherently noisy (channel noise, molecular fluctuations, demographic stochasticity). The Neural ODE framework assumes deterministic dynamics; stochastic extensions (Neural SDEs) exist but are harder to train.

Delay-differential equations. Some biological systems depend on past values of the state, not just the current state. Standard Neural ODEs cannot represent this without state augmentation, which can be unstable.

For most "Hodgkin-Huxley-like" excitable systems and most metabolic / gene-regulatory networks, the standard Neural ODE works. For corner cases, more specialised tools (Neural SDE, Neural CDE, etc.) are available.
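The stiffness failure mode is easy to reproduce on a toy problem. The sketch below is not from the series code: it uses an assumed linear system with one fast variable (rate $1000$) and one slow variable (rate $1$), and compares RK4 with backward Euler at a step size that is comfortable for the slow variable. For a decaying linear mode with rate $\lambda$, RK4 is only stable for roughly $h\lambda < 2.8$, so $h = 0.01$ is far outside that range for the fast variable.

```python
import numpy as np

# Toy stiff system dy/dt = A y: a fast mode (rate 1000) and a slow mode (rate 1).
A = np.array([[-1000.0, 0.0],
              [0.0, -1.0]])

def rk4_step(y, h):
    """Explicit classical Runge-Kutta step."""
    k1 = A @ y
    k2 = A @ (y + 0.5 * h * k1)
    k3 = A @ (y + 0.5 * h * k2)
    k4 = A @ (y + h * k3)
    return y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def backward_euler_step(y, h):
    """Implicit step: solve (I - h A) y_next = y."""
    return np.linalg.solve(np.eye(2) - h * A, y)

h, n_steps = 0.01, 50          # step chosen for the slow mode only
y_rk4 = np.array([1.0, 1.0])
y_be = np.array([1.0, 1.0])
for _ in range(n_steps):
    y_rk4 = rk4_step(y_rk4, h)
    y_be = backward_euler_step(y_be, h)

exact = np.exp(np.diag(A) * h * n_steps)   # true solution at t = 0.5
print("exact:         ", exact)
print("RK4:           ", y_rk4)   # fast component grows without bound
print("backward Euler:", y_be)    # fast component decays, slow is roughly right
```

Backward Euler stays stable here but is only first-order accurate, so the slow variable is less precise than with RK4. That trade-off, plus the linear or nonlinear solve required at every step, is the extra complexity implicit methods bring.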

Summary

Both Neural ODE and a parameter-matched discrete RNN fit FHN trajectories to MSE $\sim 0.02$.
The Neural ODE recovers the FHN vector field with mean cosine similarity $\geq 0.94$ in both spiking and quiescent regimes.
The RNN's recovered "vector field" has the right direction, but its magnitude is only correct after dividing by the training $\Delta t$, which the user has to know externally.
For continuous-time biological data, the Neural ODE is the more honest object: it lives in the same space the data does.
Time-series accuracy and physical fidelity are different metrics. A model can fit the data exactly without learning the underlying dynamical system.

Closing

The series ends here. The whole back catalog of architectures — from backprop in the first series to DiT in series 9 — exists to support exactly this kind of applied modelling on real scientific data. Spiking neurons today; signalling networks, cardiac dynamics, gene regulation, immunology, and pharmacokinetics tomorrow.

The Neural ODE framework is not a magic bullet. It is a deliberate inductive bias for continuous-time dynamics. When your data lives in continuous time (almost everything in biology), it is the right choice. When you actually need a discrete-time sequence model (text, code, action sequences), use a Transformer or an RNN. The point is to match the inductive bias to the data.

Full code on GitHub: github.com/soveshmohapatra/Neural-ODE-Bio