arxiv: 2604.18827 · v1 · submitted 2026-04-20 · 🧬 q-bio.NC · cs.AI

Recognition: unknown

OmniMouse: Scaling properties of multi-modal, multi-task Brain Models on 150B Neural Tokens

Alexander S. Ecker, Alex Gilbert, Andreas S. Tolias, Emin Orhan, Erick Cobos, Fabian H. Sinz, Goirik Chakrabarty, Hasan A. Bedel, Katrin Franke, Kayla Ponder, Konstantin F. Willeke, Lydia Ntanavara, Marissa A. Weis, Michaela Vystr\v{c}ilov\'a, Paul G. Fahey, Polina Turishcheva, Rachel E. Froebe, Sophia Sanborn, Taliah Muhammad, Yongrong Qiu, Zheng Huan Tan

Authors on Pith no claims yet

Pith reviewed 2026-05-10 02:40 UTC · model grok-4.3

classification 🧬 q-bio.NC cs.AI

keywords scaling lawsmouse visual cortexneural activity predictionmulti-task modelsbehavioral decodingneural forecastingdata-limited regime

0 comments

The pith

Mouse visual cortex models scale reliably with more neural data but show saturating gains from larger model sizes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper trains a single multi-modal, multi-task model on 150 billion neural tokens recorded from the visual cortex of 73 mice. It shows that adding more training data steadily improves performance on neural prediction, behavioral decoding, and neural forecasting, while simply increasing model parameters brings little additional benefit once a certain size is reached. This pattern reverses the scaling law seen in language and vision models, where parameter growth on massive data sets drives most progress. A sympathetic reader would conclude that brain activity modeling remains fundamentally limited by the volume and richness of available recordings, even when those recordings are already enormous. The work raises the possibility that further data expansion could trigger qualitatively new modeling capabilities.

Core claim

Performance on neural, behavioral, and forecasting tasks improves consistently as the amount of training data grows, yet plateaus when the number of model parameters is increased, showing that current brain models operate in a data-limited regime despite the scale of 150 billion neural tokens.

What carries the argument

A single multi-modal, multi-task model that at test time flexibly performs neural prediction, behavioral decoding, neural forecasting, or any combination of the three.

If this is right

Further increases in neural data volume should continue to raise performance across all three evaluation regimes.
A single model can replace multiple specialized networks without loss of accuracy.
Systematic data scaling opens the door to phase transitions in which qualitatively new capabilities appear once datasets become sufficiently large and diverse.
Specialized single-task baselines are outperformed once the multi-task model is trained on the full data set.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The data-limited finding may explain why brain models have not yet exhibited the abrupt capability jumps observed in large language models.
Similar scaling experiments on recordings from other brain areas or species could test whether the data-limited regime is general.
Efforts to expand neural data collection may now yield higher returns than further architectural tuning.

Load-bearing premise

The reported scaling relationships are not produced by the specific model architectures, training procedures, or data splits chosen for the study.

What would settle it

A controlled experiment in which model size is increased while holding the data volume fixed and the new larger models still fail to improve performance would support the saturation claim; the opposite result would falsify it.

Figures

Figures reproduced from arXiv: 2604.18827 by Alexander S. Ecker, Alex Gilbert, Andreas S. Tolias, Emin Orhan, Erick Cobos, Fabian H. Sinz, Goirik Chakrabarty, Hasan A. Bedel, Katrin Franke, Kayla Ponder, Konstantin F. Willeke, Lydia Ntanavara, Marissa A. Weis, Michaela Vystr\v{c}ilov\'a, Paul G. Fahey, Polina Turishcheva, Rachel E. Froebe, Sophia Sanborn, Taliah Muhammad, Yongrong Qiu, Zheng Huan Tan.

**Figure 1.** Figure 1: A. OmniMouse unifies neural prediction, behavior decoding, and forecasting tasks. B. Scaling model size on 150+ billion neural tokens shows performance saturation, unlike language models. C. In contrast, scaling data consistently improves performance across all model sizes. 1 arXiv:2604.18827v1 [q-bio.NC] 20 Apr 2026 [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗

**Figure 2.** Figure 2: Data. A. Data were collected via calcium imaging from head-fixed mice running on a wheel while viewing visual stimuli. Behavior variables include pupil center x and y positions, pupil dilation and its derivative and running speed. B. Dataset statistics. C. Different visual stimuli were presented across sessions, with stimulus types varying by session. Behavior variables. Our dataset contains five behavior … view at source ↗

**Figure 3.** Figure 3: Model architecture. OmniMouse introduces a unified framework that handles arbitrary combinations of neural forecasting, sub-population prediction, stimulus encoding, and behavioral decoding through flexible masking. We adopt single-neuron, single-time-chunk tokenization and a cross-attention encoder (following POYO+ (Azabou et al., 2025)), along with analogous queries to the multi-modal cross-attention dec… view at source ↗

**Figure 4.** Figure 4: Masking of neuronal responses We train OmniMouse with 119 structured masking configurations (Tab. S6), including our core evaluation tasks (forecasting, population prediction, stimulus encoding, and behavioral decoding) as well as numerous systematic variations that reduce or combine context across modalities. Neural responses. We define a consistent prediction target used by all masking configurations—t… view at source ↗

**Figure 5.** Figure 5: Task-specific performance gains with model scaling. Top row: masking schema. Middle row: Test loss. Bottom row: single-trial correlation. A. Forecasting; predicting one second neuronal activity, conditioned only on past neuronal activity B. Population prediction; conditioned on a subpopulation of N = 1024 neurons. C. Stimulus-encoding: Neuronal encoding conditioned on the visual stimulus. D. Stimulus-cond… view at source ↗

**Figure 6.** Figure 6: Behavior decoding scales with model size. A. Masking for behavior decoding. B. Decoding loss averaged across all behaviors variables. C-E. Prediction performance for behavioral decoding. the full 2-second context window, predict a target population. This task assesses how much of the trial-to-trial variability can be explained by simultaneously recorded neurons. As in the forecasting task, we chose MtM (Zh… view at source ↗

**Figure 7.** Figure 7: Scaling data improves model performance. A. Nested datasets structure. B. Test loss for different model and data sizes, averaged across all response prediction tasks. C-K. Performance improvements for all tasks when scaling dataset from 8 to 323 sessions. 9 [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗

read the original abstract

Scaling data and artificial neural networks has transformed AI, driving breakthroughs in language and vision. Whether similar principles apply to modeling brain activity remains unclear. Here we leveraged a dataset of 3.1 million neurons from the visual cortex of 73 mice across 323 sessions, totaling more than 150 billion neural tokens recorded during natural movies, images and parametric stimuli, and behavior. We train multi-modal, multi-task models that support three regimes flexibly at test time: neural prediction, behavioral decoding, neural forecasting, or any combination of the three. OmniMouse achieves state-of-the-art performance, outperforming specialized baselines across nearly all evaluation regimes. We find that performance scales reliably with more data, but gains from increasing model size saturate. This inverts the standard AI scaling story: in language and computer vision, massive datasets make parameter scaling the primary driver of progress, whereas in brain modeling -- even in the mouse visual cortex, a relatively simple system -- models remain data-limited despite vast recordings. The observation of systematic scaling raises the possibility of phase transitions in neural modeling, where larger and richer datasets might unlock qualitatively new capabilities, paralleling the emergent properties seen in large language models. Code available at https://github.com/enigma-brain/omnimouse.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces OmniMouse, a multi-modal multi-task model trained on 150B neural tokens from 3.1M neurons across 73 mice and 323 sessions in visual cortex during natural movies, images, parametric stimuli, and behavior. It reports SOTA performance across neural prediction, behavioral decoding, and forecasting regimes, with performance scaling reliably with data volume but saturating with increases in model size, inverting standard AI scaling laws and implying that brain models remain data-limited.

Significance. If the scaling relationships are robust, the work would be significant for brain modeling by challenging the parameter-centric scaling paradigm from language and vision, emphasizing data collection needs even for a simple system like mouse V1, and suggesting possible phase transitions or emergent capabilities with larger datasets. The public code release supports reproducibility and is a clear strength.

major comments (2)

[§4.2 and Abstract] §4.2 (Scaling with model size) and Abstract: The central inversion claim—that gains saturate with model size while continuing to rise with data—requires explicit confirmation that training compute (steps, epochs, or total FLOPs) was scaled with parameter count per standard practice. Without this, saturation could arise from fixed training budgets or unadjusted hyperparameters for larger models, undermining the data-limited regime interpretation.
[§3 and scaling figures] §3 (Methods) and scaling figures: The manuscript lacks reported error bars, multiple random seeds, or ablation details on the multi-task objective and data splits for the scaling curves. This makes it difficult to rule out that the reported saturation is an artifact of architecture-specific optimization difficulties or implicit leakage in the multi-regime evaluation.

minor comments (2)

[Abstract and §2.1] The abstract states '150 billion neural tokens' but the exact tokenization and session breakdown should be cross-referenced in §2.1 for precision.
[Figure captions] Figure captions for scaling plots should explicitly state the number of runs and whether hyperparameters were re-tuned for each model size.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and insightful comments. These have prompted us to strengthen the clarity and rigor of our scaling analyses. We address each major comment point-by-point below and have revised the manuscript accordingly.

read point-by-point responses

Referee: [§4.2 and Abstract] §4.2 (Scaling with model size) and Abstract: The central inversion claim—that gains saturate with model size while continuing to rise with data—requires explicit confirmation that training compute (steps, epochs, or total FLOPs) was scaled with parameter count per standard practice. Without this, saturation could arise from fixed training budgets or unadjusted hyperparameters for larger models, undermining the data-limited regime interpretation.

Authors: We agree that explicit confirmation of compute scaling is essential to support the data-limited interpretation. In our original experiments, we scaled training steps proportionally with model size (larger models received 1.5–2× more steps to reach comparable loss plateaus), following standard scaling-law protocols; total FLOPs were tracked via the Chinchilla-style estimator. We have now added a dedicated paragraph in revised §4.2, a new table (Table S3) listing steps/epochs/FLOPs per model size, and updated the Abstract to reference this protocol. These changes remove ambiguity and reinforce that saturation is not an artifact of under-training. revision: yes
Referee: [§3 and scaling figures] §3 (Methods) and scaling figures: The manuscript lacks reported error bars, multiple random seeds, or ablation details on the multi-task objective and data splits for the scaling curves. This makes it difficult to rule out that the reported saturation is an artifact of architecture-specific optimization difficulties or implicit leakage in the multi-regime evaluation.

Authors: We acknowledge the value of statistical robustness reporting. The revised manuscript now includes error bars (mean ± SEM across 3 independent random seeds) on all scaling curves in Figures 4 and 5. We have added a new subsection in §3.4 detailing the multi-task loss weighting ablations (varying the neural-prediction vs. decoding coefficients) and confirming that performance saturation persists across weightings. Data-split procedures are expanded to explicitly state that training and test sets use disjoint sessions and neurons with no cross-regime leakage. These additions are placed in the main text and Supplementary Note 2. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical scaling results from held-out evaluation

full rationale

The paper reports experimental outcomes from training multi-modal multi-task models on 150B neural tokens and measuring performance across regimes on held-out sessions. Scaling observations (data improves performance; model size saturates) are direct measurements, not derived predictions that reduce to fitted parameters or self-citations by construction. No equations, uniqueness theorems, or ansatzes are invoked that would create self-definitional loops. The work is self-contained against external benchmarks via standard train/test splits and baseline comparisons.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

Abstract-only review limits visibility into exact hyperparameters; no invented entities or ad-hoc axioms are stated.

free parameters (2)

model sizes tested
Various model capacities were scaled; specific values and selection criteria not detailed in abstract.
data volume thresholds
Scaling curves depend on how data subsets were chosen for training.

axioms (1)

domain assumption Standard deep learning optimization and evaluation protocols apply to neural time-series data.
Implicit in training multi-modal models on neural tokens.

pith-pipeline@v0.9.0 · 5631 in / 1286 out tokens · 40640 ms · 2026-05-10T02:40:05.933993+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A multi-scale information geometry reveals the structure of mutual information in neural populations
q-bio.NC 2026-05 unverdicted novelty 7.0

A multi-scale extension of the Fisher information metric, derived from coarse-graining contraction rules, exactly captures the structure of mutual information in neural population codes and can be estimated via diffus...

Reference graph

Works this paper leans on

115 extracted references · 34 canonical work pages · cited by 1 Pith paper · 11 internal anchors

[1]

2025 , eprint=

Neural Encoding and Decoding at Scale , author=. 2025 , eprint=

2025
[2]

2024 , eprint=

Neuroformer: Multimodal and Multitask Generative Pretraining for Brain Data , author=. 2024 , eprint=

2024
[3]

Energy Guided Diffusion for Generating Neurally Exciting Images

Pierzchlewicz, Paweł A and Willeke, Konstantin F and Nix, Arne F and Elumalai, Pavithra and Restivo, Kelli and Shinn, Tori and Nealley, Cate and Rodriguez, Gabrielle and Patel, Saumil and Franke, Katrin and Tolias, Andreas S and Sinz, Fabian H. Energy Guided Diffusion for Generating Neurally Exciting Images. Advances in Neural Processing Systems (NeurIPS 2023)

2023
[4]

bioRxiv , pages=

Data Heterogeneity Limits the Scaling Effect of Pretraining Neural Data Transformers , author=. bioRxiv , pages=. 2025 , publisher=

2025
[5]

bioRxiv , pages=

A Generalist Intracortical Motor Decoder , author=. bioRxiv , pages=. 2025 , publisher=

2025
[6]

The Thirteenth International Conference on Learning Representations , year =

Multi-session, multi-task neural decoding from distinct cell-types and brain regions , author=. The Thirteenth International Conference on Learning Representations , year =
[7]

arXiv preprint arXiv:2108.01210 , year=

Representation learning for neural population activity with neural data transformers , author=. arXiv preprint arXiv:2108.01210 , year=

work page arXiv
[8]

arXiv preprint arXiv:2206.08666 , year=

The Sensorium competition on predicting large-scale mouse primary visual cortex activity , author=. arXiv preprint arXiv:2206.08666 , year=

work page arXiv
[9]

Advances in Neural Information Processing Systems , volume=

Retrospective for the Dynamic Sensorium Competition for predicting large-scale mouse primary visual cortex activity from videos , author=. Advances in Neural Information Processing Systems , volume=
[10]

Nature , volume=

Functional connectomics reveals general wiring rule in mouse visual cortex , author=. Nature , volume=. 2025 , publisher=

2025
[11]

Advances in Neural Information Processing Systems , volume=

A flow-based latent state generative model of neural population responses to natural images , author=. Advances in Neural Information Processing Systems , volume=
[12]

2025 , eprint=

Modeling Dynamic Neural Activity by combining Naturalistic Video Stimuli and Stimulus-independent Latent Factors , author=. 2025 , eprint=

2025
[13]

Advances in Neural Information Processing Systems , volume=

Learning time-invariant representations for individual neurons from population dynamics , author=. Advances in Neural Information Processing Systems , volume=
[14]

Advances in Neural Information Processing Systems , volume=

A unified, scalable framework for neural population decoding , author=. Advances in Neural Information Processing Systems , volume=
[15]

Nature , volume=

Learnable latent embeddings for joint behavioural and neural analysis , author=. Nature , volume=. 2023 , publisher=

2023
[16]

ACM Transactions on Intelligent Systems and Technology , volume=

A comprehensive overview of large language models , author=. ACM Transactions on Intelligent Systems and Technology , volume=. 2025 , publisher=

2025
[17]

Large Language Models: A Survey

Large language models: A survey , author=. arXiv preprint arXiv:2402.06196 , year=

work page internal anchor Pith review arXiv
[18]

Nature , volume=

Foundation model of neural activity predicts response to new stimulus types , author=. Nature , volume=. 2025 , publisher=

2025
[19]

universal translator

Towards a" universal translator" for neural dynamics at single-cell, single-spike resolution , author=. Advances in Neural Information Processing Systems , volume=
[20]

Transformer language models without positional encodings still learn positional information

Transformer language models without positional encodings still learn positional information , author=. arXiv preprint arXiv:2203.16634 , year=

work page arXiv
[21]

Neurocomputing , volume=

Roformer: Enhanced transformer with rotary position embedding , author=. Neurocomputing , volume=. 2024 , publisher=

2024
[22]

Training Compute-Optimal Large Language Models

Training compute-optimal large language models , author=. arXiv preprint arXiv:2203.15556 , year=

work page internal anchor Pith review arXiv
[23]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

Mirasol3b: A multimodal autoregressive model for time-aligned and contextual modalities , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=
[24]

Science , volume=

Spontaneous behaviors drive multidimensional, brainwide activity , author=. Science , volume=. 2019 , publisher=

2019
[25]

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Chameleon: Mixed-modal early-fusion foundation models, 2024 , author=. URL https://arxiv. org/abs/2405.09818 , volume=

work page internal anchor Pith review arXiv 2024
[26]

Spatial vision , volume=

The psychophysics toolbox , author=. Spatial vision , volume=. 1997 , publisher=

1997
[27]

2007 , publisher=

What's new in Psychtoolbox-3? , author=. 2007 , publisher=

2007
[28]

, author=

The VideoToolbox software for visual psychophysics: transforming numbers into movies. , author=. Spatial vision , volume=
[29]

BioRxiv , pages=

Digital twin reveals combinatorial code of non-linear computations in the mouse primary visual cortex , author=. BioRxiv , pages=. 2022 , publisher=

2022
[30]

Nature , volume=

Accurate structure prediction of biomolecular interactions with AlphaFold 3 , author=. Nature , volume=. 2024 , publisher=

2024
[31]

Scaling Laws for Neural Language Models

Scaling laws for neural language models , author=. arXiv preprint arXiv:2001.08361 , year=

work page internal anchor Pith review Pith/arXiv arXiv 2001
[32]

Advances in Neural Information Processing Systems , volume=

Neural data transformer 2: multi-context pretraining for neural spiking activity , author=. Advances in Neural Information Processing Systems , volume=
[33]

arXiv preprint arXiv:2302.03023 , year=

V1t: large-scale mouse v1 response prediction using a vision transformer , author=. arXiv preprint arXiv:2302.03023 , year=

work page arXiv
[34]

Advances in Neural Information Processing Systems , volume=

Stndt: Modeling neural population activity with spatiotemporal transformers , author=. Advances in Neural Information Processing Systems , volume=
[35]

2024 , eprint=

QuantFormer: Learning to Quantize for Neural Activity Forecasting in Mouse Visual Cortex , author=. 2024 , eprint=

2024
[36]

what” and “where

Neural system identification for large populations separating “what” and “where” , author=. Advances in neural information processing systems , volume=
[37]

Advances in neural information processing systems , volume=

Stimulus domain transfer in recurrent models for large scale cortical population prediction on video , author=. Advances in neural information processing systems , volume=
[38]

PLoS computational biology , volume=

Model constrained by visual hierarchy improves prediction of neural responses to natural scenes , author=. PLoS computational biology , volume=. 2016 , publisher=

2016
[39]

elife , volume=

CaImAn an open source tool for scalable calcium imaging data analysis , author=. elife , volume=. 2019 , publisher=

2019
[40]

International conference on machine learning , pages=

Hiera: A hierarchical vision transformer without the bells-and-whistles , author=. International conference on machine learning , pages=. 2023 , organization=

2023
[41]

International Conference on Machine Learning , pages=

Scaling laws for generative mixed-modal language models , author=. International Conference on Machine Learning , pages=. 2023 , organization=

2023
[42]

Scaling laws for native multimodal models.arXiv preprint arXiv:2504.07951, 2025

Scaling laws for native multimodal models , author=. arXiv preprint arXiv:2504.07951 , year=

work page arXiv
[43]

and Ponder, Kayla and Ding, Zhuokun and Froebe, Rachel and Ntanavara, Lydia and Fahey, Paul G

Ding, Zhiwei and Tran, Dat T. and Ponder, Kayla and Ding, Zhuokun and Froebe, Rachel and Ntanavara, Lydia and Fahey, Paul G. and Cobos, Erick and Baroni, Luca and Diamantaki, Maria and Wang, Eric Y. and Chang, Andersen and Papadopoulos, Stelios and Fu, Jiakun and Muhammad, Taliah and Papadopoulos, Christos and Cadena, Santiago A. and Evangelou, Alexandros...

2025
[44]

Nature , volume=

A brain-wide map of neural activity during complex behaviour , author=. Nature , volume=. 2025 , publisher=

2025
[45]

arXiv preprint arXiv:2107.14795 , year=

Perceiver io: A general architecture for structured inputs & outputs , author=. arXiv preprint arXiv:2107.14795 , year=

work page arXiv
[46]

International Conference on Learning Representations , year=

Generalization in data-driven models of primary visual cortex , author=. International Conference on Learning Representations , year=
[47]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

An image is worth 16x16 words: Transformers for image recognition at scale , author=. arXiv preprint arXiv:2010.11929 , year=

work page internal anchor Pith review Pith/arXiv arXiv 2010
[48]

Large-scale training of foundation models for wearable biosignals.arXiv preprint arXiv:2312.05409, 2023

Large-scale training of foundation models for wearable biosignals , author=. arXiv preprint arXiv:2312.05409 , year=

work page arXiv
[49]

bioRxiv , pages=

BrainLM: A foundation model for brain activity recordings , author=. bioRxiv , pages=. 2023 , publisher=

2023
[50]

ArXiv , pages=

Population Transformer: Learning population-level representations of neural activity , author=. ArXiv , pages=
[51]

arXiv preprint arXiv:2401.10278 , year=

EEGFormer: Towards transferable and interpretable large-scale EEG foundation model , author=. arXiv preprint arXiv:2401.10278 , year=

work page arXiv
[52]

arXiv preprint arXiv:2404.09256 , year=

Foundational gpt model for meg , author=. arXiv preprint arXiv:2404.09256 , year=

work page arXiv
[53]

2024 IEEE International Symposium on Biomedical Imaging (ISBI) , pages=

Neuro-gpt: Towards a foundation model for eeg , author=. 2024 IEEE International Symposium on Biomedical Imaging (ISBI) , pages=. 2024 , organization=

2024
[54]

& Lu, B.-L

Large brain model for learning generic representations with tremendous EEG data in BCI , author=. arXiv preprint arXiv:2405.18765 , year=

work page arXiv
[55]

Frontiers in Human Neuroscience , volume=

BENDR: Using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data , author=. Frontiers in Human Neuroscience , volume=. 2021 , publisher=

2021
[56]

Advances in Neural Information Processing Systems , volume=

Neurobolt: Resting-state eeg-to-fmri synthesis with multi-dimensional feature mapping , author=. Advances in Neural Information Processing Systems , volume=
[57]

Advances in Neural Information Processing Systems , volume=

Brain-JEPA: Brain dynamics foundation model with gradient positioning and spatiotemporal masking , author=. Advances in Neural Information Processing Systems , volume=
[58]

Advances in Neural Information Processing Systems , volume=

Brain network transformer , author=. Advances in Neural Information Processing Systems , volume=
[59]

Advances in Neural Information Processing Systems , volume=

Brant: Foundation model for intracranial neural signal , author=. Advances in Neural Information Processing Systems , volume=
[60]

NeuroImage , volume=

BrainNetCNN: Convolutional neural networks for brain networks; towards predicting neurodevelopment , author=. NeuroImage , volume=. 2017 , publisher=

2017
[61]

arXiv preprint arXiv:2405.14425 , year=

When predict can also explain: few-shot prediction to select better neural latents , author=. arXiv preprint arXiv:2405.14425 , year=

work page arXiv
[62]

One Model to Train Them All: A Unified Diffusion Framework for Multi-Context Neural Population Forecasting , author=
[63]

Advances in Neural Information Processing Systems , volume=

Biot: Biosignal transformer for cross-data learning in the wild , author=. Advances in Neural Information Processing Systems , volume=
[64]

Brainbert: Self- supervised representation learning for intracranial recordings.arXiv preprint arXiv:2302.14367,

BrainBERT: Self-supervised representation learning for intracranial recordings , author=. arXiv preprint arXiv:2302.14367 , year=

work page arXiv
[65]

Advances in neural information processing systems , volume=

Self-supervised learning of brain dynamics from broad neuroimaging data , author=. Advances in neural information processing systems , volume=
[66]

Thapa, B

Sleepfm: Multi-modal representation learning for sleep across brain activity, ecg and respiratory signals , author=. arXiv preprint arXiv:2405.17766 , year=

work page arXiv
[67]

arXiv preprint arXiv:2410.14031 , year=

Modeling the Human Visual System: Comparative Insights from Response-Optimized and Task-Optimized Vision Models, Language Models, and different Readout Mechanisms , author=. arXiv preprint arXiv:2410.14031 , year=

work page arXiv
[68]

arXiv e-prints , pages=

Incremental Learning and Self-Attention Mechanisms Improve Neural System Identification , author=. arXiv e-prints , pages=
[69]

bioRxiv , pages=

Movie reconstruction from mouse visual cortex activity , author=. bioRxiv , pages=. 2024 , publisher=

2024
[70]

Advances in neural information processing systems , volume=

From deep learning to mechanistic understanding in neuroscience: the structure of retinal prediction , author=. Advances in neural information processing systems , volume=
[71]

Nvlm: Open frontier-class multimodal llms.arXiv preprint arXiv:2409.11402, 2024

Nvlm: Open frontier-class multimodal llms , author=. arXiv preprint arXiv:2409.11402 , year=

work page arXiv
[72]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Multi-Modal Latent Variables for Cross-Individual Primary Visual Cortex Modeling and Analysis , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=
[73]

Neuron , volume=

Modulation of visual responses by behavioral state in mouse visual cortex , author=. Neuron , volume=. 2010 , publisher=

2010
[74]

Neuron , volume=

Pupil fluctuations track fast switching of cortical states during quiet wakefulness , author=. Neuron , volume=. 2014 , publisher=

2014
[75]

elife , volume=

A large field of view two-photon mesoscope with subcellular resolution for in vivo imaging , author=. elife , volume=. 2016 , publisher=

2016
[76]

Large-Scale Video Classification with Convolutional Neural Networks

Karpathy, Andrej and Toderici, George and Shetty, Sanketh and Leung, Thomas and Sukthankar, Rahul and Fei-Fei, Li. Large-Scale Video Classification with Convolutional Neural Networks. 2014 IEEE Conference on Computer Vision and Pattern Recognition

2014
[77]

T., Ahrens, M

POCO: Scalable Neural Forecasting through Population Conditioning , author=. arXiv preprint arXiv:2506.14957 , year=

work page arXiv
[78]

BioRXiv , pages=

A global map of orientation tuning in mouse visual cortex , author=. BioRXiv , pages=. 2019 , publisher=

2019
[79]

Nature Communications , volume=

A simplified minimodel of visual cortical neurons , author=. Nature Communications , volume=. 2025 , publisher=

2025
[80]

2025 , doi =

Li, Bryan M and De Wulf, Wolf and Katsanevaki, Danai and Onken, Arno and Rochefort, Nathalie LI , title =. 2025 , doi =. https://www.biorxiv.org/content/early/2025/09/17/2025.09.16.676524.full.pdf , journal =

2025

Showing first 80 references.