Probabilistic Prediction of Neural Dynamics via Autoregressive Flow Matching
Pith reviewed 2026-05-10 16:20 UTC · model grok-4.3
The pith
A flow-matching model that conditions on recent neural history plus sensory input outperforms standard baselines at predicting short-term brain activity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Training a generative model with autoregressive flow matching to predict the conditional distribution of future neural activity, given recent history and multimodal sensory input, yields more accurate forecasts of parcel-wise blood-oxygenation-level-dependent (BOLD) signals than non-autoregressive variants and linear baselines; ablation studies indicate that past dynamics are the primary contributor to accuracy.
What carries the argument
Autoregressive flow matching, a transport-based generative technique that builds the prediction sequentially across time steps to capture the evolving conditional distribution of neural states.
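As a deliberately minimal illustration of the flow-matching ingredient, the sketch below constructs one training example for conditional flow matching under the common straight-line probability path. This is a generic sketch of the technique, not the paper's implementation; all names and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def cfm_training_pair(x1, cond, rng):
    """One conditional flow-matching training example (linear path).

    x1   : future neural state to predict (data sample)
    cond : conditioning vector (recent history + sensory features)
    Returns the network input (x_t, t, cond) and the regression
    target velocity u = x1 - x0 that the vector field should match.
    """
    x0 = rng.standard_normal(x1.shape)   # base (noise) sample
    t = rng.uniform()                    # random time on the path
    x_t = (1.0 - t) * x0 + t * x1        # point on the straight path
    u = x1 - x0                          # constant target velocity
    return (x_t, t, cond), u

# toy example: 4 parcels, 8-dim conditioning vector
x1 = rng.standard_normal(4)
cond = rng.standard_normal(8)
(x_t, t, c), u = cfm_training_pair(x1, cond, rng)
# sanity check: integrating the target velocity from t to 1 recovers x1
assert np.allclose(x_t + (1.0 - t) * u, x1)
```

In the autoregressive variant, this regression is repeated per time step, with `cond` refreshed to include the states already generated.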
If this is right
- Probabilistic predictions of neural responses become feasible at scale from sensory inputs.
- Improved accuracy and generalization appear in short-term forecasting of cortical activity.
- Access to past neural states emerges as the dominant factor for prediction quality.
- Autoregressive factorization supplies consistent gains in context-rich, short-horizon settings.
- Flow-based generative modeling offers a viable path for short-term forecasting of brain dynamics.
Where Pith is reading between the lines
- Such models could support real-time applications in adaptive neurotechnologies by generating likely future brain states on the fly.
- Extensions to longer prediction horizons or different recording modalities would test whether the temporal conditioning remains effective.
- The emphasis on history-dependent conditioning invites direct comparisons with predictive-coding accounts of cortical function.
- Integration with other generative techniques might clarify when flow matching holds advantages over diffusion or autoregressive transformer alternatives.
Load-bearing premise
Neural activity can be modeled as a temporally evolving conditional process whose future states depend primarily on recent history and concurrent sensory input in a way that is learnable from the available fMRI recordings.
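In symbols, the premise amounts to a short-memory autoregressive factorization of the forecast distribution (notation illustrative, not the paper's: k is the context length, H the horizon, s the sensory input):

```latex
p\bigl(x_{t+1:t+H} \mid x_{t-k+1:t},\, s\bigr)
  \;=\; \prod_{h=1}^{H} p\bigl(x_{t+h} \mid x_{t+h-k:t+h-1},\, s\bigr)
```

Each factor is the conditional distribution the flow-matching model is trained to sample from.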
What would settle it
A direct test would be to evaluate the model on a new set of subjects or stimuli not used in training and check whether its prediction errors remain lower than those of the general linear model baseline; if errors are comparable or higher, the claimed advantage would not hold.
Figures
Original abstract
Forecasting neural activity in response to naturalistic stimuli remains a key challenge for understanding brain dynamics and enabling downstream neurotechnological applications. Here, we introduce a generative forecasting framework for modeling neural dynamics based on autoregressive flow matching (AFM). Building on recent advances in transport-based generative modeling, our approach probabilistically predicts neural responses at scale from multimodal sensory input. Specifically, we learn the conditional distribution of future neural activity given past neural dynamics and concurrent sensory input, explicitly modeling neural activity as a temporally evolving process in which future states depend on recent neural history. We evaluate our framework on the Algonauts project 2025 challenge functional magnetic resonance imaging dataset using subject-specific models. AFM significantly outperforms both a non-autoregressive flow-matching baseline and the official challenge general linear model baseline in predicting short-term parcel-wise blood oxygenation level-dependent (BOLD) activity, demonstrating improved generalization and widespread cortical prediction performance. Ablation analyses show that access to past BOLD dynamics is a dominant driver of performance, while autoregressive factorization yields consistent, modest gains under short-horizon, context-rich conditions. Together, these findings position autoregressive flow-based generative modeling as an effective approach for short-term probabilistic forecasting of neural dynamics with promising applications in closed-loop neurotechnology.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces autoregressive flow matching (AFM) as a generative framework for short-term probabilistic forecasting of parcel-wise BOLD activity. It models the conditional distribution of future neural states given recent history and concurrent multimodal sensory input, trained subject-specifically on the Algonauts 2025 fMRI dataset. AFM is reported to outperform both a non-autoregressive flow-matching baseline and the official GLM challenge baseline, with ablations attributing dominant gains to past-BOLD conditioning and only modest additional benefit from autoregressive factorization under short horizons.
Significance. If the performance gains are robustly attributable to the proposed autoregressive flow-matching construction rather than input differences, the work would provide a scalable transport-based approach to probabilistic neural dynamics modeling with potential utility for closed-loop neurotechnology. The emphasis on explicit temporal evolution and generative sampling distinguishes it from standard encoding models.
Major comments (3)
- [Results / Ablations] Results section (and abstract): the claim that AFM 'significantly outperforms' the non-autoregressive flow-matching baseline and GLM is difficult to interpret without explicit confirmation that both baselines receive identical past-BOLD conditioning. The ablation statement that 'access to past BOLD dynamics is a dominant driver' raises the possibility that reported gains largely reflect this additional temporal input rather than the autoregressive flow-matching mechanism itself; a direct comparison table showing input features for each model is needed.
- [Methods] Methods (model specification): the autoregressive context length is listed among free parameters, yet no sensitivity analysis or justification for the chosen length is provided relative to the short-horizon prediction task. This choice directly affects the 'temporally evolving process' assumption and should be quantified.
- [Evaluation / Experiments] Evaluation: no statistical tests, error bars, or cross-validation details (e.g., subject-wise splits, number of runs) are referenced for the reported outperformance on the Algonauts 2025 dataset, making it impossible to assess whether the modest autoregressive gains are reliable or dataset-specific.
Minor comments (2)
- [Methods] Notation for the conditional flow-matching objective and the autoregressive factorization should be introduced with explicit equations rather than prose description only.
- [Figures] Figure captions for cortical prediction maps should include the exact metric (e.g., Pearson r or MSE) and the number of parcels shown.
Simulated Author's Rebuttal
Thank you for the detailed and constructive review of our manuscript. We appreciate the referee's focus on clarifying experimental controls, model hyperparameters, and evaluation rigor. We address each major comment below and indicate the revisions planned for the next version.
Point-by-point responses
Referee: [Results / Ablations] Results section (and abstract): the claim that AFM 'significantly outperforms' the non-autoregressive flow-matching baseline and GLM is difficult to interpret without explicit confirmation that both baselines receive identical past-BOLD conditioning. The ablation statement that 'access to past BOLD dynamics is a dominant driver' raises the possibility that reported gains largely reflect this additional temporal input rather than the autoregressive flow-matching mechanism itself; a direct comparison table showing input features for each model is needed.
Authors: We thank the referee for identifying this ambiguity. In our setup, the non-autoregressive flow-matching baseline receives identical multimodal sensory inputs and past-BOLD conditioning as the AFM model; the ablation isolating past BOLD was performed by ablating it from the AFM architecture while keeping other factors fixed. The modest gains attributed to autoregressive factorization are therefore measured under matched conditioning. We will add an explicit input-features comparison table in the revised Results and Methods sections to document this for all models (AFM, non-AR FM, and GLM). revision: yes
Referee: [Methods] Methods (model specification): the autoregressive context length is listed among free parameters, yet no sensitivity analysis or justification for the chosen length is provided relative to the short-horizon prediction task. This choice directly affects the 'temporally evolving process' assumption and should be quantified.
Authors: We agree that a sensitivity analysis is needed to support the chosen context length. The value was selected via preliminary validation to balance predictive accuracy and compute for the short-horizon regime. In the revised manuscript we will include a sensitivity plot of performance versus context length together with a brief justification tied to the task horizons. revision: yes
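One way such a context-length sweep could be organized is sketched below on synthetic data (an AR(2) series, not the paper's fMRI pipeline): fit a least-squares one-step predictor per candidate context length and compare errors. Since the generator only looks two steps back, contexts of at least two should help while longer contexts add little.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for a BOLD time series: an order-2 autoregression.
T = 2000
x = np.zeros(T)
for t in range(2, T):
    x[t] = 0.6 * x[t - 1] + 0.3 * x[t - 2] + 0.1 * rng.standard_normal()

def forecast_mse(series, context_len):
    """Fit a least-squares linear predictor of series[t] from the previous
    `context_len` samples and return its one-step mean squared error."""
    X = np.stack([series[t - context_len:t]
                  for t in range(context_len, len(series))])
    y = series[context_len:]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((X @ w - y) ** 2))

errors = {k: forecast_mse(x, k) for k in (1, 2, 4, 8)}
```

Here the error drops sharply from context 1 to context 2 and then plateaus; the revised sensitivity plot would show the analogous curve for AFM on the real data.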
Referee: [Evaluation / Experiments] Evaluation: no statistical tests, error bars, or cross-validation details (e.g., subject-wise splits, number of runs) are referenced for the reported outperformance on the Algonauts 2025 dataset, making it impossible to assess whether the modest autoregressive gains are reliable or dataset-specific.
Authors: We acknowledge the need for these details. Results were obtained via subject-specific models with cross-validation over the dataset runs. The revised manuscript will report error bars (standard error across subjects), paired statistical tests on performance differences, and full cross-validation specifications including subject-wise splits and run counts. revision: yes
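A paired statistical test of the kind promised here can be run without distributional assumptions; the sketch below implements a two-sided paired sign-flip permutation test on per-subject score differences. The scores are hypothetical placeholder values, not results from the paper.

```python
import random
import statistics

def paired_permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided paired permutation test on per-subject score differences.

    a, b : per-subject performance scores for two models (same subjects).
    Returns the observed mean difference and an approximate p-value
    obtained by randomly flipping the sign of each paired difference.
    """
    rng = random.Random(seed)
    diffs = [ai - bi for ai, bi in zip(a, b)]
    observed = statistics.fmean(diffs)
    count = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.fmean(flipped)) >= abs(observed):
            count += 1
    return observed, (count + 1) / (n_perm + 1)

# hypothetical per-subject Pearson r for AFM vs. a baseline (8 subjects)
afm      = [0.31, 0.28, 0.35, 0.30, 0.33, 0.29, 0.32, 0.34]
baseline = [0.27, 0.25, 0.30, 0.28, 0.29, 0.26, 0.28, 0.30]
delta, p = paired_permutation_test(afm, baseline)
```

With every subject improving, the test yields a small p-value; reported alongside per-subject standard errors, it would directly address the referee's reliability concern.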
Circularity Check
No circularity detected in modeling or claims
Full rationale
The paper presents an empirical generative modeling approach (autoregressive flow matching) trained on the Algonauts 2025 fMRI dataset to learn the conditional distribution p(future BOLD | past BOLD + sensory input). Reported performance consists of out-of-sample predictions on held-out data, benchmarked against independent baselines (non-autoregressive flow matching and the official GLM), with explicit ablations quantifying the separate contributions of temporal conditioning versus autoregressive factorization. No equation, result, or central claim reduces to its own inputs by construction, self-definition, or a load-bearing self-citation chain; the derivation is a standard supervised learning pipeline whose outputs are falsifiable against external data and baselines.
Axiom & Free-Parameter Ledger
free parameters (2)
- neural network parameters
- autoregressive context length
axioms (2)
- Domain assumption: Neural activity evolves as a Markovian process conditioned on recent history and concurrent sensory input.
- Domain assumption: Flow matching can approximate the target conditional distribution from finite fMRI samples.