The illusory simplicity of the feedforward pass: evidence for the dynamical nature of stimulus encoding along the primate ventral stream
Pith reviewed 2026-05-10 13:53 UTC · model grok-4.3
The pith
Neural activity in the primate ventral stream encodes visual information through its changes over time rather than static spatial patterns alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Time-resolved multivariate analyses reveal that information exchanged between V4 and IT during the first 100 ms of processing contains varied semantic content, while RNN-based decoding shows that the temporal evolution of neural patterns encodes stimulus categories beyond what is available in spatial patterns at individual moments. This indicates that stimulus encoding along the ventral stream is better described as a spatiotemporally evolving process.
What carries the argument
Time-resolved information transfer analysis between simultaneously recorded ventral stream areas combined with recurrent neural network decoding that integrates across the temporal domain.
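The decoding approach described above can be sketched in miniature. The following is an illustrative, untrained Elman-style RNN readout over time-binned population activity — not the paper's architecture, and all shapes and parameter names are hypothetical — showing how a hidden state lets the decoder exploit the trajectory of a neural pattern rather than any single snapshot:

```python
import numpy as np

def rnn_decode(spike_counts, W_in, W_rec, W_out, b_h, b_out):
    """Run a minimal Elman-style RNN readout over time-binned population
    activity and return category logits from the final hidden state.

    spike_counts : (T, N) array of binned firing rates for one trial.
    """
    T, _ = spike_counts.shape
    h = np.zeros(W_rec.shape[0])
    for t in range(T):
        # The hidden state carries information forward in time, so the
        # readout can use the pattern's dynamics, not just its endpoint.
        h = np.tanh(spike_counts[t] @ W_in + h @ W_rec + b_h)
    return h @ W_out + b_out

rng = np.random.default_rng(0)
N, H, C, T = 64, 32, 5, 20          # neurons, hidden units, categories, time bins
W_in  = rng.normal(0, 0.1, (N, H))
W_rec = rng.normal(0, 0.1, (H, H))
W_out = rng.normal(0, 0.1, (H, C))
logits = rnn_decode(rng.poisson(2.0, (T, N)).astype(float),
                    W_in, W_rec, W_out, np.zeros(H), np.zeros(C))
```

In an actual analysis the weights would be fit by gradient descent on labeled trials; here random weights and Poisson counts simply exercise the forward pass.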
If this is right
- Even the earliest visual processing stages involve dynamic exchanges of information rather than fixed stage-like passes.
- Categorical information is embedded in how neural responses change over time, not solely in their instantaneous spatial configurations.
- Standard analyses that average over time or use time-local decoders may underestimate the full extent of encoded information.
- Models of primate vision should account for recurrent or dynamic components starting from the initial response window.
Where Pith is reading between the lines
- These dynamics might allow the brain to integrate information across slightly different time scales for more robust recognition.
- Similar temporal encoding could be tested in other sensory systems or with non-invasive recordings in humans.
- Computational models of vision that are purely feedforward may need revision to include early recurrent interactions.
Load-bearing premise
The assumption that the RNN decoding and time-resolved transfer analyses capture genuine neural information encoding without being distorted by the choice of model, stimulus set, or analysis decisions.
What would settle it
A demonstration that a time-local decoder, or a decoder trained on the time-averaged spatial pattern, matches or exceeds the full temporal RNN decoder's accuracy in classifying stimulus categories on the same data.
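The machinery of that falsification test can be sketched on synthetic data. The example below — a nearest-centroid stand-in for the real decoders, with made-up trial counts and channel dimensions — compares a time-local decoder (one bin) against one given the whole window. If, on the real recordings, the time-local variant kept pace with the temporally extended one, the dynamics claim would be undermined:

```python
import numpy as np

rng = np.random.default_rng(1)
T, D = 10, 8                                  # time bins, channels (illustrative)
means = rng.normal(0, 0.5, (2, T, D))         # one mean trajectory per category

def make_trials(n):
    """Noisy trials drawn around each category's mean trajectory."""
    labels = rng.integers(0, 2, n)
    return means[labels] + rng.normal(0, 1.0, (n, T, D)), labels

def nearest_centroid_acc(Xtr, ytr, Xte, yte):
    """Classify test trials by the nearer class centroid; return accuracy."""
    c = np.stack([Xtr[ytr == k].mean(0) for k in (0, 1)])
    d2 = ((Xte[:, None] - c[None]) ** 2).reshape(len(Xte), 2, -1).sum(-1)
    return (d2.argmin(1) == yte).mean()

Xtr, ytr = make_trials(400)
Xte, yte = make_trials(400)

acc_single = nearest_centroid_acc(Xtr[:, 5], ytr, Xte[:, 5], yte)    # one time bin
acc_full   = nearest_centroid_acc(Xtr.reshape(400, -1), ytr,
                                  Xte.reshape(400, -1), yte)         # whole window
```

Here the full-window decoder wins by construction, since signal accumulates across bins; the decisive question for the paper is whether a recurrent decoder still beats a static decoder that is also given the whole window.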
Original abstract
In studying primate vision, a large body of work focuses on the first feedforward sweep. During this initial time window, information is thought to pass through ventral stream regions in a stage-like fashion in an effort to extract high-level information from the retinal input. Consequently, electrophysiological analyses commonly focus on spatial response patterns, either by averaging data in time, or by applying decoders in a temporally local fashion. By analysing data recorded simultaneously across multiple arrays placed along the macaque ventral stream, we here show that this prior approach may be missing key aspects of information encoding. First, time-resolved, multivariate analyses of information transfer between V4 and IT reveal temporally and semantically varied information content as being exchanged within the first 100ms of processing. Second, by employing recurrent neural network (RNN) decoding techniques that extend across the temporal domain, we demonstrate that the neural pattern dynamics themselves carry categorical information far beyond the spatially encoded information available at any given time point. These findings challenge the prevailing view of a single, stage-like feedforward process and suggest that even the earliest parts of visual processing are better characterised as a spatiotemporally evolving process that encodes information in its dynamics rather than purely spatial response patterns.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript analyzes simultaneous multi-array electrophysiological recordings from macaque V4 and IT during visual stimulation. It reports time-resolved multivariate information transfer between areas showing varied semantic content within the first 100 ms, and uses RNN decoders spanning the temporal domain to argue that neural pattern dynamics encode categorical information beyond what is available in any instantaneous spatial pattern, challenging the standard view of a simple stage-like feedforward sweep in the ventral stream.
Significance. If the central results hold after appropriate controls, the work would provide empirical evidence that even early ventral-stream processing is better described as a spatiotemporally evolving dynamical process rather than purely spatial encoding during the initial feedforward pass. The simultaneous multi-region recordings constitute a clear strength, enabling direct examination of inter-area dynamics with standard multivariate and RNN techniques on real neural data.
major comments (2)
- RNN decoding analyses (Results): the claim that recurrent decoding extracts categorical information 'far beyond' the spatially encoded information available at any given time point is load-bearing for the abstract's central conclusion. No control is described that equates total temporal information access while removing recurrence, such as a feedforward network trained on time-concatenated features or a static decoder applied to the full 0-200 ms window. Without this, superior RNN accuracy remains consistent with simple temporal pooling of a feedforward spatial code.
- Methods and Results: the abstract and described analyses provide no details on statistical controls, error bars, data exclusion criteria, RNN training/validation splits, or multiple-comparison corrections. These elements are required to assess whether the time-resolved transfer and decoding results support the claim of temporally and semantically varied information exchange within the first 100 ms.
minor comments (2)
- Abstract: the time windows and stimulus categories used for the information-transfer and decoding analyses could be stated more explicitly to allow readers to evaluate the scope of the 'first 100 ms' and 'categorical' claims.
- Figure legends: ensure all panels report the number of trials/animals, the exact statistical tests, and any correction procedures applied.
Simulated Author's Rebuttal
We thank the referee for their constructive and insightful comments on our manuscript. We address each major comment in detail below and have revised the manuscript to incorporate the suggested controls and expanded methodological information.
Point-by-point responses
Referee: RNN decoding analyses (Results): the claim that recurrent decoding extracts categorical information 'far beyond' the spatially encoded information available at any given time point is load-bearing for the abstract's central conclusion. No control is described that equates total temporal information access while removing recurrence, such as a feedforward network trained on time-concatenated features or a static decoder applied to the full 0-200 ms window. Without this, superior RNN accuracy remains consistent with simple temporal pooling of a feedforward spatial code.
Authors: We agree that demonstrating the specific contribution of recurrence requires a control that grants equivalent access to the full temporal window without recurrent connections. In the revised manuscript we will add a feedforward MLP decoder trained on time-concatenated activity vectors spanning the entire 0-200 ms epoch (with identical cross-validation and regularization as the RNN). We will report its accuracy alongside the RNN results and update the abstract and discussion to reflect this comparison. This addition directly addresses the concern that superior RNN performance could arise from temporal pooling alone.
Revision: yes
Referee: Methods and Results: the abstract and described analyses provide no details on statistical controls, error bars, data exclusion criteria, RNN training/validation splits, or multiple-comparison corrections. These elements are required to assess whether the time-resolved transfer and decoding results support the claim of temporally and semantically varied information exchange within the first 100 ms.
Authors: We thank the referee for highlighting these omissions. The revised Methods section will now explicitly describe: (i) permutation-based statistical controls for the multivariate information-transfer analyses; (ii) error bars as SEM across sessions; (iii) trial- and session-level exclusion criteria (e.g., artifact rejection thresholds and minimum trial counts per condition); (iv) RNN training details including 5-fold cross-validation splits, early stopping, and hyperparameter selection; and (v) FDR correction for multiple comparisons across time points and semantic categories. These details will be referenced in the Results when presenting the time-resolved transfer and decoding findings.
Revision: yes
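The two statistical controls named in this response are standard, and a minimal sketch of each may be useful — a label-shuffling permutation test for a group difference, and Benjamini-Hochberg FDR control across time points and categories. This is a generic illustration on toy data, not the authors' analysis code:

```python
import numpy as np

def permutation_pvalue(x, labels, n_perm=2000, rng=None):
    """P-value for a difference in group means via label shuffling:
    how often does a shuffled labeling produce as large a difference?"""
    if rng is None:
        rng = np.random.default_rng(0)
    obs = abs(x[labels == 1].mean() - x[labels == 0].mean())
    null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(labels)
        null[i] = abs(x[perm == 1].mean() - x[perm == 0].mean())
    return (1 + (null >= obs).sum()) / (n_perm + 1)

def bh_fdr(pvals, q=0.05):
    """Benjamini-Hochberg procedure: boolean mask of hypotheses rejected
    at false-discovery rate q (e.g., across time points and categories)."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    thresh = q * np.arange(1, len(p) + 1) / len(p)
    below = p[order] <= thresh
    k = below.nonzero()[0].max() + 1 if below.any() else 0
    reject = np.zeros(len(p), bool)
    reject[order[:k]] = True
    return reject

# Toy usage: one strong effect among several tested conditions.
x = np.concatenate([np.zeros(20), np.ones(20)])
labels = np.concatenate([np.zeros(20, int), np.ones(20, int)])
p_strong = permutation_pvalue(x, labels)
rejected = bh_fdr([p_strong, 0.2, 0.5, 0.9])
```

In the promised revision, one such p-value would be computed per time point per category, and the full vector passed through the FDR step.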
Circularity Check
No circularity: empirical analyses on neural recordings
Full rationale
The paper reports time-resolved multivariate transfer analyses and RNN decoding applied to simultaneously recorded macaque ventral-stream activity. No derivation chain, first-principles result, or fitted parameter is presented; the central claims rest on direct comparisons between instantaneous spatial patterns and temporally extended decoding performance. No self-definitional steps, no renaming of known results as new derivations, and no load-bearing self-citations that reduce the argument to unverified prior work by the same authors. The work is self-contained against external benchmarks (recorded spike data and standard decoding methods) and therefore receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: multivariate patterns in neural activity reflect encoded stimulus information.
- Domain assumption: recurrent neural networks can extract temporal dependencies relevant to neural information encoding.