Can neurons speak? Semantic narration of vision at single-cell resolution
Pith reviewed 2026-06-26 18:50 UTC · model grok-4.3
The pith
NEURRATOR converts spiking activity from single neurons into natural-language descriptions of viewed scenes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A learned encoder maps spike trains from any chosen subset of recorded neurons into the patch-embedding space of a frozen CLIP model; a multimodal language model then produces a description of the visual stimulus and a sparse autoencoder validates it, all without training on the language side. Applied to Neuropixel data from mouse visual cortex during natural movie viewing, the same pipeline yields coherent narrations from thousands of neurons, from one region, from small local populations, or from molecularly defined inhibitory cell types.
What carries the argument
The learned encoder that projects arbitrary spike-train subsets into the frozen CLIP patch-embedding space.
If this is right
- Decoding accuracy increases measurably with the number of neurons included and varies across cortical regions.
- Individual neurons and genetically tagged cell types can each be described in plain language for their specific contribution to the scene.
- Cell identity can be treated as a functional probe that reveals what part of the visual world a given neuron represents.
- The same pipeline works on populations of any size or composition without retraining the language model.
Where Pith is reading between the lines
- The method could be used to compare semantic content across different visual areas or across different behavioral states in the same animal.
- If the narrations remain stable when subsets of neurons are removed, that would indicate which cells carry redundant versus unique semantic information.
- The approach might be tested on data from other sensory modalities to see whether the same encoder-to-language route applies outside vision.
Load-bearing premise
The encoder's embeddings in CLIP space keep the semantic content of the original visual stimulus without systematic distortion that would break the downstream language generation.
What would settle it
Present the same neurons with a set of short, controlled clips that differ only in one semantic feature (such as the presence or absence of a moving object) and test whether the generated narrations reliably mention that feature when the neurons are active and omit it when they are not.
Figures
read the original abstract
Identifying what individual neurons encode in higher-order visual cortex is an open problem. Responses resist intuitive parameterization, and the deep-network embeddings used in their place are black boxes. Here, we introduce NEURRATOR, a framework that decodes spiking activity into free-form natural-language narration of the viewed scene at single-neuron resolution. A learned encoder maps spike trains from arbitrary subsets of simultaneously-recorded neurons into the patch-embedding space of a frozen CLIP, from which a multimodal language model and sparse autoencoder generates and validates a description with no language-side training. Applied to Neuropixel recordings of mouse visual cortex during natural-movie viewing, NEURRATOR narrates from thousands of neurons, singular cortical regions, local populations, or from a molecularly-defined cell-types. We use this property to (i) quantify how decoding fidelity scales with population size and cortical region, and (ii) "neurrate", in plain language, what individual neurons and genetically-tagged inhibitory cell-types contribute to visual representation. This recasts cell identity from a classification target into a functional probe of the visual system, providing a new unit of biological insights in neural systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces NEURRATOR, a framework that decodes spiking activity from arbitrary subsets of neurons recorded in mouse visual cortex during natural movie viewing into free-form natural-language narrations of the viewed scene at single-neuron resolution. A learned encoder maps spike trains into the patch-embedding space of a frozen CLIP model; a multimodal language model together with a sparse autoencoder then generates and validates descriptions with no language-side training. The method is applied to Neuropixels data to quantify how decoding fidelity scales with population size and cortical region and to narrate the functional contributions of individual neurons and molecularly-defined inhibitory cell types.
Significance. If the central mapping from spikes to semantically faithful CLIP embeddings can be shown to hold with appropriate controls, the work would recast single-cell identity as a functional probe and supply a new unit of insight into visual representation. The explicit use of a frozen external model and absence of language-side training on the neural data constitute a clear strength against circularity.
major comments (2)
- [Abstract] Abstract: the central claim that NEURRATOR produces accurate single-cell narrations is unsupported by any reported quantitative validation metrics, error bars, ablation studies, held-out test performance, or description of encoder training procedure and narration-fidelity scoring; without these the data-to-claim link cannot be evaluated.
- [Abstract] Abstract: the statement that the encoder 'preserves the semantic content of the visual stimulus without systematic distortion' is presented as a premise rather than a result; no alignment metric, reconstruction fidelity, or control experiment is described that would test this load-bearing assumption.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and for identifying areas where the abstract could more explicitly link claims to supporting evidence. We address each major comment below, clarifying where the manuscript already provides the requested details and indicating revisions we are prepared to make.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claim that NEURRATOR produces accurate single-cell narrations is unsupported by any reported quantitative validation metrics, error bars, ablation studies, held-out test performance, or description of encoder training procedure and narration-fidelity scoring; without these the data-to-claim link cannot be evaluated.
Authors: The abstract is a concise summary; the full manuscript reports quantitative validation in the Results and Methods sections. Decoding fidelity is quantified as a function of population size and cortical region, with error bars derived from multiple held-out test splits and bootstrap resampling. Ablation studies compare the learned encoder against linear baselines and untrained mappings. The encoder training procedure (including loss, optimizer, and regularization) is detailed in Methods, and narration fidelity is scored via sparse autoencoder reconstruction error plus consistency with the multimodal LM on held-out stimuli. We will revise the abstract to include one or two key quantitative anchors (e.g., fidelity scaling) while remaining within length limits. revision: partial
-
Referee: [Abstract] Abstract: the statement that the encoder 'preserves the semantic content of the visual stimulus without systematic distortion' is presented as a premise rather than a result; no alignment metric, reconstruction fidelity, or control experiment is described that would test this load-bearing assumption.
Authors: The manuscript presents preservation of semantic content as an empirical outcome of the encoder's training objective (mapping spikes to frozen CLIP patch embeddings). Alignment is measured by cosine similarity and retrieval accuracy between encoded spike embeddings and ground-truth CLIP embeddings on held-out movies. Reconstruction fidelity is quantified by the sparse autoencoder's ability to recover the original CLIP patches, and control experiments include shuffled spike trains and random encoders to rule out systematic distortion. We will rephrase the abstract to frame this explicitly as a validated result rather than an assumption and will add a brief parenthetical reference to the alignment metric. revision: yes
Circularity Check
No significant circularity identified
full rationale
The paper's core pipeline maps spike trains via a learned encoder into the embedding space of a frozen external CLIP model, then uses an off-the-shelf multimodal language model and sparse autoencoder for narration generation, with explicit statement of no language-side training. No equations, training procedures, or self-citations are described that would reduce the reported narrations or fidelity metrics to quantities fitted from the same neural data used for evaluation. The derivation chain therefore remains self-contained against external benchmarks (CLIP, language models) and does not exhibit self-definitional, fitted-input, or self-citation load-bearing reductions.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Neurons in the retina: organization, inhibition and excitation problems
Stephen W Kuffler. Neurons in the retina: organization, inhibition and excitation problems. InCold Spring Harbor Symposia on Quantitative Biology, volume 17, pages 281–292. Cold Spring Harbor Laboratory Press, 1952
1952
-
[2]
Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus.Journal of Neuroscience, 19(18):8036–8042, 1999
Garrett B Stanley, Fei F Li, and Yang Dan. Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus.Journal of Neuroscience, 19(18):8036–8042, 1999
1999
-
[3]
The connections of the middle temporal visual area (mt) and their relationship to a cortical hierarchy in the macaque monkey.Journal of Neuroscience, 3(12):2563–2586, 1983
John H Maunsell and David C van Essen. The connections of the middle temporal visual area (mt) and their relationship to a cortical hierarchy in the macaque monkey.Journal of Neuroscience, 3(12):2563–2586, 1983
1983
-
[4]
Distributed hierarchical processing in the primate cerebral cortex.Cerebral cortex (New York, NY: 1991), 1(1):1–47, 1991
Daniel J Felleman and David C Van Essen. Distributed hierarchical processing in the primate cerebral cortex.Cerebral cortex (New York, NY: 1991), 1(1):1–47, 1991
1991
-
[5]
Inferotemporal cortex and vision.Progress in physiological psychology, 5:77–123, 1973
Charles G Gross. Inferotemporal cortex and vision.Progress in physiological psychology, 5:77–123, 1973
1973
-
[6]
Neuronal selectivities to complex object features in the ventral visual pathway of the macaque cerebral cortex.Journal of neurophysiology, 71(3):856–867, 1994
Eucaly Kobatake and Keiji Tanaka. Neuronal selectivities to complex object features in the ventral visual pathway of the macaque cerebral cortex.Journal of neurophysiology, 71(3):856–867, 1994
1994
-
[7]
Comparing face patch systems in macaques and humans.Proceedings of the National Academy of Sciences, 105(49):19514–19519, 2008
Doris Y Tsao, Sebastian Moeller, and Winrich A Freiwald. Comparing face patch systems in macaques and humans.Proceedings of the National Academy of Sciences, 105(49):19514–19519, 2008
2008
-
[8]
face cells
Kasper Vinken, Jacob S Prince, Talia Konkle, and Margaret S Livingstone. The neural code for “face cells” is not face-specific.Science advances, 9(35):eadg1736, 2023
2023
-
[9]
Performance-optimized hierarchical models predict neural responses in higher visual cortex.Proceedings of the national academy of sciences, 111(23):8619–8624, 2014
Daniel LK Yamins, Ha Hong, Charles F Cadieu, Ethan A Solomon, Darren Seibert, and James J DiCarlo. Performance-optimized hierarchical models predict neural responses in higher visual cortex.Proceedings of the national academy of sciences, 111(23):8619–8624, 2014
2014
-
[10]
Using goal-driven deep learning models to understand sensory cortex.Nature neuroscience, 19(3):356–365, 2016
Daniel LK Yamins and James J DiCarlo. Using goal-driven deep learning models to understand sensory cortex.Nature neuroscience, 19(3):356–365, 2016
2016
-
[11]
Deep supervised, but not unsupervised, models may explain it cortical representation.PLoS computational biology, 10(11):e1003915, 2014
Seyed-Mahdi Khaligh-Razavi and Nikolaus Kriegeskorte. Deep supervised, but not unsupervised, models may explain it cortical representation.PLoS computational biology, 10(11):e1003915, 2014
2014
-
[12]
Brain-score: Which artificial neural network for object recognition is most brain-like?BioRxiv, page 407007, 2018
Martin Schrimpf, Jonas Kubilius, Ha Hong, Najib J Majaj, Rishi Rajalingham, Elias B Issa, Kohitij Kar, Pouya Bashivan, Jonathan Prescott-Roy, Franziska Geiger, et al. Brain-score: Which artificial neural network for object recognition is most brain-like?BioRxiv, page 407007, 2018
2018
-
[13]
The neural architecture of language: Integra- tive modeling converges on predictive processing.Proceedings of the National Academy of Sciences, 118(45):e2105646118, 2021
Martin Schrimpf, Idan Asher Blank, Greta Tuckute, Carina Kauf, Eghbal A Hosseini, Nancy Kan- wisher, Joshua B Tenenbaum, and Evelina Fedorenko. The neural architecture of language: Integra- tive modeling converges on predictive processing.Proceedings of the National Academy of Sciences, 118(45):e2105646118, 2021. 10
2021
-
[14]
Neural population control via deep image synthesis
Pouya Bashivan, Kohitij Kar, and James J DiCarlo. Neural population control via deep image synthesis. Science, 364(6439):eaav9436, 2019
2019
-
[15]
Evolving images for visual neurons using a deep generative network reveals coding principles and neuronal preferences.Cell, 177(4):999–1009, 2019
Carlos R Ponce, Will Xiao, Peter F Schade, Till S Hartmann, Gabriel Kreiman, and Margaret S Livingstone. Evolving images for visual neurons using a deep generative network reveals coding principles and neuronal preferences.Cell, 177(4):999–1009, 2019
2019
-
[16]
Deep convolutional models improve predictions of macaque v1 responses to natural images.PLoS computational biology, 15(4):e1006897, 2019
Santiago A Cadena, George H Denfield, Edgar Y Walker, Leon A Gatys, Andreas S Tolias, Matthias Bethge, and Alexander S Ecker. Deep convolutional models improve predictions of macaque v1 responses to natural images.PLoS computational biology, 15(4):e1006897, 2019
2019
-
[17]
A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy.Neuron, 98(3):630–644, 2018
Alexander JE Kell, Daniel LK Yamins, Erica N Shook, Sam V Norman-Haignere, and Josh H McDermott. A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy.Neuron, 98(3):630–644, 2018
2018
-
[18]
Dimensionality reduction for large-scale neural recordings.Nature neuroscience, 17(11):1500–1509, 2014
John P Cunningham and Byron M Yu. Dimensionality reduction for large-scale neural recordings.Nature neuroscience, 17(11):1500–1509, 2014
2014
-
[19]
Interpreting encoding and decoding models.Current opinion in neurobiology, 55:167–179, 2019
Nikolaus Kriegeskorte and Pamela K Douglas. Interpreting encoding and decoding models.Current opinion in neurobiology, 55:167–179, 2019
2019
-
[20]
Learnable latent embeddings for joint behavioural and neural analysis.Nature, 617(7960):360–368, 2023
Steffen Schneider, Jin Hwa Lee, and Mackenzie Weygandt Mathis. Learnable latent embeddings for joint behavioural and neural analysis.Nature, 617(7960):360–368, 2023
2023
-
[21]
Learning transferable visual models from natural language supervision
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. InInternational conference on machine learning, pages 8748–8763. PmLR, 2021
2021
-
[22]
Scaling up visual and vision-language representation learning with noisy text supervision
Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. Scaling up visual and vision-language representation learning with noisy text supervision. InInternational conference on machine learning, pages 4904–4916. PMLR, 2021
2021
-
[23]
Sigmoid loss for language image pre-training
Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, and Lucas Beyer. Sigmoid loss for language image pre-training. InProceedings of the IEEE/CVF international conference on computer vision, pages 11975–11986, 2023
2023
-
[24]
Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models
Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. InInternational conference on machine learning, pages 19730–19742. PMLR, 2023
2023
-
[25]
Flamingo: a visual language model for few-shot learning.Advances in neural information processing systems, 35:23716–23736, 2022
Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katherine Millican, Malcolm Reynolds, et al. Flamingo: a visual language model for few-shot learning.Advances in neural information processing systems, 35:23716–23736, 2022
2022
-
[26]
Improved baselines with visual instruction tuning
Haotian Liu, Chunyuan Li, Yuheng Li, and Yong Jae Lee. Improved baselines with visual instruction tuning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 26296–26306, 2024
2024
-
[27]
Steering clip’s vision transformer with sparse autoencoders.arXiv preprint arXiv:2504.08729, 2025
Sonia Joseph, Praneet Suresh, Ethan Goldfarb, Lorenz Hufe, Yossi Gandelsman, Robert Graham, Danilo Bzdok, Wojciech Samek, and Blake Aaron Richards. Steering clip’s vision transformer with sparse autoencoders.arXiv preprint arXiv:2504.08729, 2025
arXiv 2025
-
[28]
Towards monosemanticity: Decomposing language models with dictionary learning.Transformer Circuits Thread,
Trenton Bricken, Adly Templeton, Joshua Batson, Brian Chen, Adam Jermyn, Tom Conerly, Nick Turner, Cem Anil, Carson Denison, Amanda Askell, Robert Lasenby, Yifan Wu, Shauna Kravec, Nicholas Schiefer, Tim Maxwell, Nicholas Joseph, Zac Hatfield-Dodds, Alex Tamkin, Karina Nguyen, Brayden McLean, Josiah E Burke, Tristan Hume, Shan Carter, Tom Henighan, and Ch...
-
[29]
https://transformer-circuits.pub/2023/monosemantic-features/index.html
2023
-
[30]
Thomas Fel, Ekdeep Singh Lubana, Jacob S Prince, Matthew Kowal, Victor Boutin, Isabel Papadimitriou, Binxu Wang, Martin Wattenberg, Demba Ba, and Talia Konkle. Archetypal sae: Adaptive and stable dictionary learning for concept extraction in large vision models.arXiv preprint arXiv:2502.12892, 2025
arXiv 2025
-
[31]
Survey of spiking in the mouse visual system reveals functional hierarchy.Nature, 592(7852):86–92, 2021
Joshua H Siegle, Xiaoxuan Jia, Séverine Durand, Sam Gale, Corbett Bennett, Nile Graddis, Greggory Heller, Tamina K Ramirez, Hannah Choi, Jennifer A Luviano, et al. Survey of spiking in the mouse visual system reveals functional hierarchy.Nature, 592(7852):86–92, 2021. 11
2021
-
[32]
Neuropixels 2.0: A miniaturized high- density probe for stable, long-term brain recordings.Science, 372(6539):eabf4588, 2021
Nicholas A Steinmetz, Cagatay Aydin, Anna Lebedeva, Michael Okun, Marius Pachitariu, Marius Bauza, Maxime Beau, Jai Bhagat, Claudia Böhm, Martijn Broux, et al. Neuropixels 2.0: A miniaturized high- density probe for stable, long-term brain recordings.Science, 372(6539):eabf4588, 2021
2021
-
[33]
Yulong Liu, Yongqiang Ma, Wei Zhou, Guibo Zhu, and Nanning Zheng. Brainclip: Bridging brain and visual-linguistic representation via clip for generic natural visual stimulus decoding.arXiv preprint arXiv:2302.12971, 2023
arXiv 2023
-
[34]
A large-scale examination of inductive biases shaping high-level visual representation in brains and machines.Nature communications, 15(1):9383, 2024
Colin Conwell, Jacob S Prince, Kendrick N Kay, George A Alvarez, and Talia Konkle. A large-scale examination of inductive biases shaping high-level visual representation in brains and machines.Nature communications, 15(1):9383, 2024
2024
-
[35]
High-level visual representations in the human brain are aligned with large language models
Adrien Doerig, Tim C Kietzmann, Emily Allen, Yihan Wu, Thomas Naselaris, Kendrick Kay, and Ian Charest. High-level visual representations in the human brain are aligned with large language models. Nature Machine Intelligence, 7(8):1220–1234, 2025
2025
-
[36]
Natural speech reveals the semantic maps that tile human cerebral cortex.Nature, 532(7600):453–458, 2016
Alexander G Huth, Wendy A De Heer, Thomas L Griffiths, Frédéric E Theunissen, and Jack L Gallant. Natural speech reveals the semantic maps that tile human cerebral cortex.Nature, 532(7600):453–458, 2016
2016
-
[37]
Evidence of a predictive coding hierarchy in the human brain listening to speech.Nature human behaviour, 7(3):430–441, 2023
Charlotte Caucheteux, Alexandre Gramfort, and Jean-Rémi King. Evidence of a predictive coding hierarchy in the human brain listening to speech.Nature human behaviour, 7(3):430–441, 2023
2023
-
[38]
Brain encoding models based on multi- modal transformers can transfer across language and vision.Advances in neural information processing systems, 36:29654–29666, 2023
Jerry Tang, Meng Du, Vy V o, Vasudev Lal, and Alexander Huth. Brain encoding models based on multi- modal transformers can transfer across language and vision.Advances in neural information processing systems, 36:29654–29666, 2023
2023
-
[39]
High-resolution image reconstruction with latent diffusion models from human brain activity
Yu Takagi and Shinji Nishimoto. High-resolution image reconstruction with latent diffusion models from human brain activity. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14453–14463, 2023
2023
-
[40]
Paul S Scotti, Mihir Tripathy, Cesar Kadir Torrico Villanueva, Reese Kneeland, Tong Chen, Ashutosh Narang, Charan Santhirasegaran, Jonathan Xu, Thomas Naselaris, Kenneth A Norman, et al. Mindeye2: Shared-subject models enable fmri-to-image with 1 hour of data.arXiv preprint arXiv:2403.11207, 2024
arXiv 2024
-
[41]
Multiscale voxel based decoding for enhanced natural image reconstruction from brain activity
Mali Halac, Murat Isik, Hasan Ayaz, and Anup Das. Multiscale voxel based decoding for enhanced natural image reconstruction from brain activity. In2022 International Joint Conference on Neural Networks (IJCNN), pages 1–7. IEEE, 2022
2022
-
[42]
Semantic reconstruction of continuous language from non-invasive brain recordings.Nature Neuroscience, pages 1–9, 2023
Jerry Tang, Amanda LeBel, Shailee Jain, and Alexander G Huth. Semantic reconstruction of continuous language from non-invasive brain recordings.Nature Neuroscience, pages 1–9, 2023
2023
-
[43]
Decoding speech from non-invasive brain recordings.arXiv preprint arXiv:2208.12266, 2022
Alexandre Défossez, Charlotte Caucheteux, Jérémy Rapin, Ori Kabeli, and Jean-Rémi King. Decoding speech from non-invasive brain recordings.arXiv preprint arXiv:2208.12266, 2022
arXiv 2022
-
[44]
Tarr, and Leila Wehbe
Andrew Luo, Margaret Marie Henderson, Michael J. Tarr, and Leila Wehbe. BrainSCUBA: Fine-grained natural language captions of visual cortex selectivity. InThe Twelfth International Conference on Learning Representations, 2024
2024
-
[45]
Interplm: discovering interpretable features in protein language models via sparse autoencoders.Nature methods, 22(10):2107–2117, 2025
Elana Simon and James Zou. Interplm: discovering interpretable features in protein language models via sparse autoencoders.Nature methods, 22(10):2107–2117, 2025
2025
-
[46]
Sparse autoencoders uncover biologically interpretable features in protein language model representations.Proceedings of the National Academy of Sciences, 122(34):e2506316122, 2025
Onkar Gujral, Mihir Bafna, Eric Alm, and Bonnie Berger. Sparse autoencoders uncover biologically interpretable features in protein language model representations.Proceedings of the National Academy of Sciences, 122(34):e2506316122, 2025
2025
-
[47]
Laurence Freeman, Philip Shamash, Vinam Arora, Caswell Barry, Tiago Branco, and Eva Dyer. Be- yond black boxes: Enhancing interpretability of transformers trained on neural data.arXiv preprint arXiv:2506.14014, 2025
arXiv 2025
-
[48]
Neural models for detection and classification of brain states and transitions.Communica- tions Biology, 8(1):599, 2025
Arnau Marin-Llobet, Arnau Manasanch, Leonardo Dalla Porta, Melody Torao-Angosto, and Maria V Sanchez-Vives. Neural models for detection and classification of brain states and transitions.Communica- tions Biology, 8(1):599, 2025
2025
-
[49]
Physmap-interpretable in vivo neuronal cell type identification using multi-modal analysis of electrophysiological data.BioRxiv, pages 2024–02, 2024
Eric Kenji Lee, Asım Emre Gül, Greggory Heller, Anna Lakunina, Santiago Jaramillo, Pawel F Przytycki, and Chandramouli Chandrasekaran. Physmap-interpretable in vivo neuronal cell type identification using multi-modal analysis of electrophysiological data.BioRxiv, pages 2024–02, 2024. 12
2024
-
[50]
A deep learning strategy to identify cell types across species from high-density extracellular recordings.Cell, 188(8):2218–2234, 2025
Maxime Beau, David J Herzfeld, Francisco Naveros, Marie E Hemelt, Federico D’Agostino, Marlies Oostland, Alvaro Sánchez-López, Young Yoon Chung, Michael Maibach, Stephen Kyranakis, et al. A deep learning strategy to identify cell types across species from high-density extracellular recordings.Cell, 188(8):2218–2234, 2025
2025
-
[51]
In vivo cell-type and brain region classification via multimodal contrastive learning.bioRxiv, pages 2024–11, 2025
Han Yu, Hanrui Lyu, Ethan Yixun Xu, Charlie Windolf, Eric Kenji Lee, Fan Yang, Andrew M Shelton, Shawn Olsen, Sahar Minavi, Olivier Winter, et al. In vivo cell-type and brain region classification via multimodal contrastive learning.bioRxiv, pages 2024–11, 2025
2024
-
[52]
An ai agent for cell-type specific brain computer interfaces
Arnau Marin-Llobet, Zuwan Lin, Jongmin Baek, Almir Aljovic, Xinhe Zhang, Ariel J Lee, Wenbo Wang, Jaeyong Lee, Hao Shen, Yichun He, et al. An ai agent for cell-type specific brain computer interfaces. bioRxiv, 2025
2025
-
[53]
Llama 2: Open foundation and fine-tuned chat models.arXiv preprint arXiv:2307.09288, 2023
Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models.arXiv preprint arXiv:2307.09288, 2023
Pith/arXiv arXiv 2023
-
[54]
Sentence-bert: Sentence embeddings using siamese bert-networks
Nils Reimers and Iryna Gurevych. Sentence-bert: Sentence embeddings using siamese bert-networks. ArXiv, abs/1908.10084, 2019
Pith/arXiv arXiv 1908
-
[55]
Imagenet large scale visual recognition challenge
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge. International journal of computer vision, 115(3):211–252, 2015
2015
-
[56]
A cortical circuit for gain control by behavioral state.Cell, 156(6):1139–1152, 2014
Yu Fu, Jason M Tucciarone, J Sebastian Espinosa, Nengyin Sheng, Daniel P Darcy, Roger A Nicoll, Z Josh Huang, and Michael P Stryker. A cortical circuit for gain control by behavioral state.Cell, 156(6):1139–1152, 2014
2014
-
[57]
Cortical interneurons that specialize in disinhibitory control.Nature, 503(7477):521–524, 2013
Hyun-Jae Pi, Balázs Hangya, Duda Kvitsiani, Joshua I Sanders, Z Josh Huang, and Adam Kepecs. Cortical interneurons that specialize in disinhibitory control.Nature, 503(7477):521–524, 2013
2013
-
[58]
Hoagy Cunningham, Aidan Ewart, Logan Riggs, Robert Huben, and Lee Sharkey. Sparse autoencoders find highly interpretable features in language models.arXiv preprint arXiv:2309.08600, 2023
Pith/arXiv arXiv 2023
-
[59]
Scaling and evaluating sparse autoencoders.arXiv preprint arXiv:2406.04093, 2024
Leo Gao, Tom Dupré la Tour, Henk Tillman, Gabriel Goh, Rajan Troll, Alec Radford, Ilya Sutskever, Jan Leike, and Jeffrey Wu. Scaling and evaluating sparse autoencoders.arXiv preprint arXiv:2406.04093, 2024
Pith/arXiv arXiv 2024
-
[60]
An odor is not worth a thousand words: from multidimensional odors to unidimensional odor objects.Annual review of psychology, 61(1):219–241, 2010
Yaara Yeshurun and Noam Sobel. An odor is not worth a thousand words: from multidimensional odors to unidimensional odor objects.Annual review of psychology, 61(1):219–241, 2010
2010
-
[61]
USER: <image>\n Describe this scene in one sentence.\n ASSISTANT:
Chuan Qin, Constantin Venhoff, Sonia Joseph, Fanyi Xiao, and Stefan Scherer. Sparse clip: Co-optimizing interpretability and performance in contrastive learning.ArXiv, abs/2601.20075, 2026. 13 A Appendix This appendix collects supporting material referenced from the main text. Section A.2 provides full architectural and training-configuration details for ...
arXiv 2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.