The illusory simplicity of the feedforward pass: evidence for the dynamical nature of stimulus encoding along the primate ventral stream
Pith reviewed 2026-05-10 13:53 UTC · model grok-4.3
The pith
Neural activity in the primate ventral stream encodes visual information through its changes over time rather than static spatial patterns alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Time-resolved multivariate analyses reveal that information exchanged between V4 and IT during the first 100 ms of processing contains varied semantic content, while RNN-based decoding shows that the temporal evolution of neural patterns encodes stimulus categories beyond what is available in spatial patterns at individual moments. This indicates that stimulus encoding along the ventral stream is better described as a spatiotemporally evolving process.
What carries the argument
Time-resolved information transfer analysis between simultaneously recorded ventral stream areas combined with recurrent neural network decoding that integrates across the temporal domain.
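The decoding approach described above can be sketched in miniature. The following is an illustrative, untrained Elman-style RNN readout over time-binned population activity — not the paper's architecture, and all shapes and parameter names are hypothetical — showing how a hidden state lets the decoder exploit the trajectory of a neural pattern rather than any single snapshot:

```python
import numpy as np

def rnn_decode(spike_counts, W_in, W_rec, W_out, b_h, b_out):
    """Run a minimal Elman-style RNN readout over time-binned population
    activity and return category logits from the final hidden state.

    spike_counts : (T, N) array of binned firing rates for one trial.
    """
    T, _ = spike_counts.shape
    h = np.zeros(W_rec.shape[0])
    for t in range(T):
        # The hidden state carries information forward in time, so the
        # readout can use the pattern's dynamics, not just its endpoint.
        h = np.tanh(spike_counts[t] @ W_in + h @ W_rec + b_h)
    return h @ W_out + b_out

rng = np.random.default_rng(0)
N, H, C, T = 64, 32, 5, 20          # neurons, hidden units, categories, time bins
W_in  = rng.normal(0, 0.1, (N, H))
W_rec = rng.normal(0, 0.1, (H, H))
W_out = rng.normal(0, 0.1, (H, C))
logits = rnn_decode(rng.poisson(2.0, (T, N)).astype(float),
                    W_in, W_rec, W_out, np.zeros(H), np.zeros(C))
```

In an actual analysis the weights would be fit by gradient descent on labeled trials; here random weights and Poisson counts simply exercise the forward pass.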
If this is right
- Even the earliest visual processing stages involve dynamic exchanges of information rather than fixed stage-like passes.
- Categorical information is embedded in how neural responses change over time, not solely in their instantaneous spatial configurations.
- Standard analyses that average over time or use time-local decoders may underestimate the full extent of encoded information.
- Models of primate vision should account for recurrent or dynamic components starting from the initial response window.
Where Pith is reading between the lines
- These dynamics might allow the brain to integrate information across slightly different time scales for more robust recognition.
- Similar temporal encoding could be tested in other sensory systems or with non-invasive recordings in humans.
- Computational models of vision that are purely feedforward may need revision to include early recurrent interactions.
Load-bearing premise
The assumption that the RNN decoding and time-resolved transfer analyses capture genuine neural information encoding without being distorted by the choice of model, stimulus set, or analysis decisions.
What would settle it
A demonstration that a time-local decoder, or a decoder trained on the time-averaged spatial pattern, matches or exceeds the full temporal RNN decoder's accuracy in classifying stimulus categories on the same data.
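The machinery of that falsification test can be sketched on synthetic data. The example below — a nearest-centroid stand-in for the real decoders, with made-up trial counts and channel dimensions — compares a time-local decoder (one bin) against one given the whole window. If, on the real recordings, the time-local variant kept pace with the temporally extended one, the dynamics claim would be undermined:

```python
import numpy as np

rng = np.random.default_rng(1)
T, D = 10, 8                                  # time bins, channels (illustrative)
means = rng.normal(0, 0.5, (2, T, D))         # one mean trajectory per category

def make_trials(n):
    """Noisy trials drawn around each category's mean trajectory."""
    labels = rng.integers(0, 2, n)
    return means[labels] + rng.normal(0, 1.0, (n, T, D)), labels

def nearest_centroid_acc(Xtr, ytr, Xte, yte):
    """Classify test trials by the nearer class centroid; return accuracy."""
    c = np.stack([Xtr[ytr == k].mean(0) for k in (0, 1)])
    d2 = ((Xte[:, None] - c[None]) ** 2).reshape(len(Xte), 2, -1).sum(-1)
    return (d2.argmin(1) == yte).mean()

Xtr, ytr = make_trials(400)
Xte, yte = make_trials(400)

acc_single = nearest_centroid_acc(Xtr[:, 5], ytr, Xte[:, 5], yte)    # one time bin
acc_full   = nearest_centroid_acc(Xtr.reshape(400, -1), ytr,
                                  Xte.reshape(400, -1), yte)         # whole window
```

Here the full-window decoder wins by construction, since signal accumulates across bins; the decisive question for the paper is whether a recurrent decoder still beats a static decoder that is also given the whole window.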
Original abstract
In studying primate vision, a large body of work focuses on the first feedforward sweep. During this initial time window, information is thought to pass through ventral stream regions in a stage-like fashion in an effort to extract high-level information from the retinal input. Consequently, electrophysiological analyses commonly focus on spatial response patterns, either by averaging data in time, or by applying decoders in a temporally local fashion. By analysing data recorded simultaneously across multiple arrays placed along the macaque ventral stream, we here show that this prior approach may be missing key aspects of information encoding. First, time-resolved, multivariate analyses of information transfer between V4 and IT reveal temporally and semantically varied information content as being exchanged within the first 100ms of processing. Second, by employing recurrent neural network (RNN) decoding techniques that extend across the temporal domain, we demonstrate that the neural pattern dynamics themselves carry categorical information far beyond the spatially encoded information available at any given time point. These findings challenge the prevailing view of a single, stage-like feedforward process and suggest that even the earliest parts of visual processing are better characterised as a spatiotemporally evolving process that encodes information in its dynamics rather than purely spatial response patterns.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript analyzes simultaneous multi-array electrophysiological recordings from macaque V4 and IT during visual stimulation. It reports time-resolved multivariate information transfer between areas showing varied semantic content within the first 100 ms, and uses RNN decoders spanning the temporal domain to argue that neural pattern dynamics encode categorical information beyond what is available in any instantaneous spatial pattern, challenging the standard view of a simple stage-like feedforward sweep in the ventral stream.
Significance. If the central results hold after appropriate controls, the work would provide empirical evidence that even early ventral-stream processing is better described as a spatiotemporally evolving dynamical process rather than purely spatial encoding during the initial feedforward pass. The simultaneous multi-region recordings constitute a clear strength, enabling direct examination of inter-area dynamics with standard multivariate and RNN techniques on real neural data.
major comments (2)
- RNN decoding analyses (Results): the claim that recurrent decoding extracts categorical information 'far beyond' the spatially encoded information available at any given time point is load-bearing for the abstract's central conclusion. No control is described that equates total temporal information access while removing recurrence, such as a feedforward network trained on time-concatenated features or a static decoder applied to the full 0-200 ms window. Without this, superior RNN accuracy remains consistent with simple temporal pooling of a feedforward spatial code.
- Methods and Results: the abstract and described analyses provide no details on statistical controls, error bars, data exclusion criteria, RNN training/validation splits, or multiple-comparison corrections. These elements are required to assess whether the time-resolved transfer and decoding results support the claim of temporally and semantically varied information exchange within the first 100 ms.
minor comments (2)
- Abstract: the time windows and stimulus categories used for the information-transfer and decoding analyses could be stated more explicitly to allow readers to evaluate the scope of the 'first 100 ms' and 'categorical' claims.
- Figure legends: ensure all panels report the number of trials/animals, the exact statistical tests, and any correction procedures applied.
Simulated Author's Rebuttal
We thank the referee for their constructive and insightful comments on our manuscript. We address each major comment in detail below and have revised the manuscript to incorporate the suggested controls and expanded methodological information.
Point-by-point responses
Referee: RNN decoding analyses (Results): the claim that recurrent decoding extracts categorical information 'far beyond' the spatially encoded information available at any given time point is load-bearing for the abstract's central conclusion. No control is described that equates total temporal information access while removing recurrence, such as a feedforward network trained on time-concatenated features or a static decoder applied to the full 0-200 ms window. Without this, superior RNN accuracy remains consistent with simple temporal pooling of a feedforward spatial code.
Authors: We agree that demonstrating the specific contribution of recurrence requires a control that grants equivalent access to the full temporal window without recurrent connections. In the revised manuscript we will add a feedforward MLP decoder trained on time-concatenated activity vectors spanning the entire 0-200 ms epoch (with identical cross-validation and regularization as the RNN). We will report its accuracy alongside the RNN results and update the abstract and discussion to reflect this comparison. This addition directly addresses the concern that superior RNN performance could arise from temporal pooling alone.
Revision: yes
Referee: Methods and Results: the abstract and described analyses provide no details on statistical controls, error bars, data exclusion criteria, RNN training/validation splits, or multiple-comparison corrections. These elements are required to assess whether the time-resolved transfer and decoding results support the claim of temporally and semantically varied information exchange within the first 100 ms.
Authors: We thank the referee for highlighting these omissions. The revised Methods section will now explicitly describe: (i) permutation-based statistical controls for the multivariate information-transfer analyses; (ii) error bars as SEM across sessions; (iii) trial- and session-level exclusion criteria (e.g., artifact rejection thresholds and minimum trial counts per condition); (iv) RNN training details including 5-fold cross-validation splits, early stopping, and hyperparameter selection; and (v) FDR correction for multiple comparisons across time points and semantic categories. These details will be referenced in the Results when presenting the time-resolved transfer and decoding findings.
Revision: yes
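The two statistical controls named in this response are standard, and a minimal sketch of each may be useful — a label-shuffling permutation test for a group difference, and Benjamini-Hochberg FDR control across time points and categories. This is a generic illustration on toy data, not the authors' analysis code:

```python
import numpy as np

def permutation_pvalue(x, labels, n_perm=2000, rng=None):
    """P-value for a difference in group means via label shuffling:
    how often does a shuffled labeling produce as large a difference?"""
    if rng is None:
        rng = np.random.default_rng(0)
    obs = abs(x[labels == 1].mean() - x[labels == 0].mean())
    null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(labels)
        null[i] = abs(x[perm == 1].mean() - x[perm == 0].mean())
    return (1 + (null >= obs).sum()) / (n_perm + 1)

def bh_fdr(pvals, q=0.05):
    """Benjamini-Hochberg procedure: boolean mask of hypotheses rejected
    at false-discovery rate q (e.g., across time points and categories)."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    thresh = q * np.arange(1, len(p) + 1) / len(p)
    below = p[order] <= thresh
    k = below.nonzero()[0].max() + 1 if below.any() else 0
    reject = np.zeros(len(p), bool)
    reject[order[:k]] = True
    return reject

# Toy usage: one strong effect among several tested conditions.
x = np.concatenate([np.zeros(20), np.ones(20)])
labels = np.concatenate([np.zeros(20, int), np.ones(20, int)])
p_strong = permutation_pvalue(x, labels)
rejected = bh_fdr([p_strong, 0.2, 0.5, 0.9])
```

In the promised revision, one such p-value would be computed per time point per category, and the full vector passed through the FDR step.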
Circularity Check
No circularity: empirical analyses on neural recordings
Full rationale
The paper reports time-resolved multivariate transfer analyses and RNN decoding applied to simultaneously recorded macaque ventral-stream activity. No derivation chain, first-principles result, or fitted parameter is presented; the central claims rest on direct comparisons between instantaneous spatial patterns and temporally extended decoding performance. No self-definitional steps, no renaming of known results as new derivations, and no load-bearing self-citations that reduce the argument to unverified prior work by the same authors. The work is self-contained against external benchmarks (recorded spike data and standard decoding methods) and therefore receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: multivariate patterns in neural activity reflect encoded stimulus information.
- Domain assumption: recurrent neural networks can extract temporal dependencies relevant to neural information encoding.