Fine-tuning language encoding models on slow fMRI improves prediction for fast ECoG

Aditya R. Vaidya; Alexander G. Huth; Richard J. Antonello

arXiv: 2605.19224 · v1 · pith:4IPBDI4U · submitted 2026-05-19 · 💻 cs.CL


Pith reviewed 2026-05-20 06:44 UTC · model grok-4.3

Classification: 💻 cs.CL

Keywords: fMRI · ECoG · language encoding models · fine-tuning · spoken language · brain signal prediction · neural representations


The pith

Fine-tuning language models on fMRI data improves predictions for ECoG recordings

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that language representations tuned on fMRI brain scans can be transferred to build better encoding models for ECoG signals. A sympathetic reader would care because ECoG offers high temporal resolution but is restricted to small numbers of patients with implants, while fMRI data is far more abundant and noninvasive. The authors show that these fMRI-tuned models improve ECoG predictions even in fast frequency bands not directly captured by fMRI, that the gains persist when fMRI is artificially slowed by downsampling, and that ECoG performance increases steadily as more fMRI tuning data is added.

Core claim

Using spoken language representations fine-tuned on fMRI, we build encoding models of ECoG. These representations showed improved prediction performance in ECoG, even though the temporal resolution of fMRI is two orders of magnitude worse. Prediction improved in frequency bands well beyond what is directly measured in fMRI. Next, to test the procedure's generalization ability, we fine-tuned models on fMRI responses that were temporally downsampled by a factor of 2. Despite the loss in resolution, these models were able to predict fMRI and ECoG responses at levels comparable to the original fMRI-tuned models. Finally, we showed that ECoG performance steadily scales with the amount of fMRI-tuning data.

What carries the argument

Spoken language representations (layer 9 of the WavLM Base+ speech model, per Figure 1) fine-tuned to predict fMRI responses, then frozen and used as features for linearized encoding models of ECoG.
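In most encoding-model work, a "linearized encoding model" is a regularized linear regression from time-delayed stimulus features to each recording channel, scored by correlation on held-out data. The sketch below assumes that setup. `make_delayed`, `fit_ridge`, and `pearson_per_channel` are hypothetical helper names; the delays (0 to 3 samples, i.e. 0 to 150 ms at the 20 Hz ECoG rate) and the ridge penalty are illustrative choices, not the paper's settings; and random arrays stand in for WavLM features and ECoG responses.

```python
import numpy as np


def make_delayed(features, delays):
    """Concatenate time-shifted copies of `features` (time x dims).

    Each nonnegative delay d shifts the features d samples later, so the
    linear model can learn a finite impulse response per feature.
    """
    n_t, n_d = features.shape
    out = np.zeros((n_t, n_d * len(delays)))
    for i, d in enumerate(delays):
        cols = slice(i * n_d, (i + 1) * n_d)
        if d == 0:
            out[:, cols] = features
        else:
            out[d:, cols] = features[:-d]
    return out


def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression weights (features x channels)."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)


def pearson_per_channel(y_true, y_pred):
    """Pearson r between matching columns of two (time x channels) arrays."""
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    return (yt * yp).sum(axis=0) / np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))


# Synthetic stand-ins: 2000 samples (100 s at 20 Hz), 16 feature dims, 4 electrodes.
rng = np.random.default_rng(0)
delays = [0, 1, 2, 3]
features = rng.standard_normal((2000, 16))
X = make_delayed(features, delays)
true_w = rng.standard_normal((X.shape[1], 4))
Y = X @ true_w + 4.0 * rng.standard_normal((2000, 4))

# Fit on the first 1500 samples, evaluate on the held-out last 500.
W = fit_ridge(X[:1500], Y[:1500], alpha=10.0)
r = pearson_per_channel(Y[1500:], X[1500:] @ W)
print(np.round(r, 2))
```

In the paper's pipeline, `features` would presumably be the frozen WavLM layer outputs (pretrained or fMRI-tuned) resampled to 20 Hz, and the comparison of interest is `r` for one feature set against the other.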

If this is right

ECoG encoding models gain accuracy from fMRI fine-tuning even for signal components faster than fMRI measures.
The transfer remains effective after fMRI data is downsampled by a factor of two in time.
ECoG prediction performance increases as the amount of fMRI tuning data grows.
Combining slow and fast recording methods may support improved brain decoding applications.
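The factor-of-two downsampling control can be illustrated with a small helper. The summary does not say how the fMRI responses were downsampled, so this sketch assumes simple averaging of consecutive TRs (a trailing odd TR is dropped); `downsample_by_two` is a hypothetical name. At the 0.5 Hz fMRI rate given in the Figure 1 caption, this halves the sampling rate to 0.25 Hz and lowers the Nyquist frequency from 0.25 Hz to 0.125 Hz.

```python
import numpy as np


def downsample_by_two(responses):
    """Average consecutive pairs of TRs in a (time x voxels) array.

    A trailing odd TR, if any, is dropped so that pairs line up.
    """
    n_t = (responses.shape[0] // 2) * 2
    pairs = responses[:n_t].reshape(n_t // 2, 2, responses.shape[1])
    return pairs.mean(axis=1)


FS_FMRI = 0.5                # Hz, fMRI sampling rate from the Figure 1 caption
fs_down = FS_FMRI / 2        # sampling rate after downsampling
nyquist_orig = FS_FMRI / 2   # 0.25 Hz
nyquist_down = fs_down / 2   # 0.125 Hz

toy = np.arange(10, dtype=float).reshape(5, 2)  # 5 TRs x 2 voxels
ds = downsample_by_two(toy)  # averages TRs (0, 1) and (2, 3); TR 4 is dropped
print(ds)
```

Averaging before decimation acts as a crude low-pass filter. The stimulus features would need matching treatment (resampling to 0.25 Hz) so that features and responses stay aligned.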

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Large public fMRI datasets could be repurposed to improve models for scarce high-resolution modalities like ECoG.
The same fine-tuning strategy might transfer to other pairs of slow and fast brain recording techniques.
If the scaling relationship holds, targeted collection of fMRI data for tuning could become routine for ECoG language studies.
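One way to quantify "steady scaling" per electrode is to regress encoding performance on the logarithm of the number of fMRI fine-tuning stories and read off the slope. The log-linear form is an assumption here: the Figure 5 caption mentions a per-electrode scaling coefficient but is truncated before it defines the fit. `fit_log_scaling` is a hypothetical helper and the numbers are synthetic.

```python
import numpy as np


def fit_log_scaling(n_stories, perf):
    """Fit perf ~= b + m * ln(n_stories) separately for each electrode.

    n_stories: (k,) fine-tuning set sizes.
    perf: (k, n_electrodes) encoding performance at each size.
    Returns slopes m and intercepts b, each of shape (n_electrodes,).
    """
    m, b = np.polyfit(np.log(n_stories), perf, deg=1)
    return m, b


n = np.array([5, 10, 20, 40, 80])
# Two synthetic electrodes with known slopes: 0.02 and 0.0 (the second does not scale).
perf = np.column_stack([0.10 + 0.02 * np.log(n), np.full(n.size, 0.15)])
m, b = fit_log_scaling(n, perf)
print(np.round(m, 3), np.round(b, 3))
```

Projecting such a fit beyond the largest fine-tuning set assumes the log-linear trend continues, which a handful of points cannot establish on their own.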

Load-bearing premise

Fine-tuning on fMRI data produces representations that generalize to ECoG without overfitting to the slower temporal structure or introducing modality-specific biases.

What would settle it

A controlled test showing that non-fine-tuned models match or exceed the prediction accuracy of fMRI-tuned models on held-out ECoG data would falsify the central claim.
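A controlled test of this kind is usually run per electrode on held-out data: compute prediction correlations for the pretrained and fMRI-tuned feature sets, then ask whether the mean difference could arise by chance. The sketch assumes a one-sided paired sign-flip permutation test over electrodes, which is one of several reasonable choices and not necessarily the paper's statistic. `paired_sign_flip_test` is a hypothetical helper, and the correlations are synthetic. Electrodes within a subject are not independent, so a subject-level or hierarchical test would be more conservative.

```python
import numpy as np


def paired_sign_flip_test(r_tuned, r_base, n_perm=10_000, seed=0):
    """One-sided paired permutation test of mean(r_tuned - r_base) > 0.

    Under the null hypothesis the sign of each electrode's difference is
    exchangeable, so random sign flips generate the null distribution.
    """
    diff = np.asarray(r_tuned, dtype=float) - np.asarray(r_base, dtype=float)
    observed = diff.mean()
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diff.size))
    null = (signs * diff).mean(axis=1)
    # Add-one correction keeps the p-value strictly positive.
    p_value = (np.count_nonzero(null >= observed) + 1) / (n_perm + 1)
    return observed, p_value


rng = np.random.default_rng(1)
r_base = rng.uniform(0.0, 0.3, size=50)                  # hypothetical pretrained r per electrode
r_tuned = r_base + 0.02 + 0.01 * rng.standard_normal(50)  # hypothetical fMRI-tuned r
gain, p = paired_sign_flip_test(r_tuned, r_base)
print(f"mean gain = {gain:.3f}, p = {p:.4f}")
```

A negative mean gain that is significant in the reverse direction would match the falsification criterion; a large p-value by itself would only mean that an improvement was not detected.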

Figures

Figures reproduced from arXiv: 2605.19224 by Aditya R. Vaidya, Alexander G. Huth, Richard J. Antonello.

**Figure 1.** fMRI-to-ECoG transfer via fMRI-tuning. We fine-tune the 9th layer of a deep speech representation model, WavLM Base+ (Chen et al., 2021), to predict fMRI responses (measured at 0.5 Hz) to spoken language. We then freeze the weights of the WavLM model and use its representations to build linearized encoding models of ECoG responses (measured at 20 Hz) to speech from a separate dataset. Successfully performi…

**Figure 2.** fMRI-tuned models for predicting ECoG. (a) ECoG encoding performance averaged across all electrodes within regions of interest (ROIs). Error bars show the standard error of the mean (SEM) across electrodes. The Destrieux atlas (Destrieux et al., 2010) was used to find electrodes in each ROI: prefrontal cortex (PFC), primary auditory cortex (AC), and the language network as described in Lipkin et al. (2022…

**Figure 3.** Frequency-binned improvement of fMRI-tuned models. Change in the power spectral density (PSD) of the ECoG encoding model residual after fMRI-tuning (lower is better). The shaded area shows the standard error across electrodes. The dotted green line at 0.25 Hz shows the Nyquist frequency of the fMRI responses. fMRI-tuning improves model fit (reduces the residual power) overall, with improvement both below a…

**Figure 4.** Fine-tuning on downsampled fMRI responses. (a) Flattened cortical surface of fMRI subject S3 that compares the encoding performance of models fine-tuned on original fMRI responses (blue) and downsampled fMRI responses (red). Voxel brightness is proportional to overall model performance. The 2-dimensional histogram shows that voxels have similar performance across fine-tuning conditions. We show subjects S1…

**Figure 5.** Scaling of fMRI-tuning for ECoG. (a) ECoG encoding performance as a function of the number of fMRI fine-tuning stories. Error bars indicate the standard error over bootstraps of the fMRI stories. We show the scaling performance with the full fMRI-tuning dataset in Appendix D. (b) Pretrained encoding performance vs. scaling coefficient m_e for all electrodes. The scaling law is measured per electrode as the …

**Figure 6.** Improvement in encoding performance for all electrodes (Figure 2c), visualized

**Figure 7.** Flattened cortical surfaces of subjects S1 and S2 from LeBel et al. (2023) (LeBel

**Figure 8.** Within-subject fMRI encoding performance scales with the size of the fine-tuning

**Figure 9.** ECoG encoding performance as a function of the number of fMRI fine-tuning
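The analysis behind Figure 3, the change in residual power spectral density after fMRI-tuning split at the 0.25 Hz fMRI Nyquist frequency, can be approximated with Welch's method. This is a sketch under assumptions: the 20 Hz ECoG rate comes from the Figure 1 caption, but the segment length, the synthetic signal, and the two "model predictions" are hypothetical (constructed so that both bands improve), and the paper may estimate the PSD differently.

```python
import numpy as np
from scipy.signal import welch

FS_ECOG = 20.0       # Hz, ECoG rate from the Figure 1 caption
FMRI_NYQUIST = 0.25  # Hz, fMRI Nyquist frequency from the Figure 3 caption


def residual_psd_change(y, pred_pretrained, pred_tuned, fs=FS_ECOG, nperseg=400):
    """Welch PSD of the tuned-model residual minus that of the pretrained one.

    Negative values mean fMRI-tuning removed residual power at that frequency.
    nperseg=400 gives 0.05 Hz resolution at 20 Hz, fine enough to resolve 0.25 Hz.
    """
    f, psd_pre = welch(y - pred_pretrained, fs=fs, nperseg=nperseg)
    _, psd_tuned = welch(y - pred_tuned, fs=fs, nperseg=nperseg)
    return f, psd_tuned - psd_pre


# Synthetic electrode: a slow (0.1 Hz) and a fast (3 Hz) component plus noise.
rng = np.random.default_rng(0)
t = np.arange(6000) / FS_ECOG  # 300 s
slow = np.sin(2 * np.pi * 0.1 * t)
fast = np.sin(2 * np.pi * 3.0 * t)
y = slow + fast + 0.5 * rng.standard_normal(t.size)
pred_pretrained = 0.5 * (slow + fast)  # hypothetical weaker model
pred_tuned = 0.9 * (slow + fast)       # hypothetical fMRI-tuned model

f, delta = residual_psd_change(y, pred_pretrained, pred_tuned)
below = delta[(f > 0) & (f <= FMRI_NYQUIST)].mean()
above = delta[f > FMRI_NYQUIST].mean()
print(f"mean PSD change below {FMRI_NYQUIST} Hz: {below:.4f}, above: {above:.4f}")
```

Averaging the change across electrodes, with a standard error as in the figure's shaded band, would follow the same pattern per channel.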

Original abstract

Neuroscientists have recently turned to intracranial brain recording methods, like electrocorticography (ECoG), for human experiments because of the fine spatial and temporal resolution that they afford. Models trained on this data, however, are fundamentally restricted by the patient populations that can receive the implants necessary for recording. We propose using non-invasive fMRI to bridge the gap in training data. Using spoken language representations fine-tuned on fMRI, we build encoding models of ECoG. These representations showed improved prediction performance in ECoG, even though the temporal resolution of fMRI is two orders of magnitude worse. Prediction improved in frequency bands well beyond what is directly measured in fMRI. Next, to test the procedure's generalization ability, we fine-tuned models on fMRI responses that were temporally downsampled by a factor of 2. Despite the loss in resolution, these models were able to predict fMRI and ECoG responses at levels comparable to the original fMRI-tuned models. Finally, we showed that ECoG performance steadily scales with the amount of fMRI-tuning data. Our results show that "slow" data like fMRI can be a valuable resource for building better models of "fast" brain data like ECoG. In the future, integrating across multiple recording methods may further improve performance in other applications, like decoding.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

fMRI fine-tuning boosts ECoG encoding with downsampling and scaling controls that address the main temporal mismatch worry.


The one thing to know is that this work shows fMRI fine-tuning can boost ECoG encoding performance for language, with tests indicating that the benefit is not limited to slow signals. The paper takes language representations and fine-tunes them on fMRI data from spoken-language tasks. These tuned models then predict ECoG responses better than untuned ones, and the gains appear in frequency bands above what fMRI measures. To check whether fMRI's slow sampling is causing issues, the authors downsample the fMRI responses by a factor of two and find comparable results for both modalities. They also demonstrate that ECoG prediction accuracy increases steadily as more fMRI data is used for fine-tuning. This addresses data scarcity in ECoG by leveraging larger fMRI datasets.

What the paper does well is include these targeted controls for the temporal-resolution mismatch. The downsampling test and the scaling behavior provide direct evidence against simple overfitting to hemodynamics. The approach is practical and extends prior encoding-model work to a cross-modal setting.

The soft spots are not major. The provided abstract lacks specific quantitative metrics, such as correlation coefficients or statistical significance levels, which makes it harder to assess the practical importance of the improvements. If the full paper includes those details and proper controls for multiple comparisons, it would be stronger. There is also the open question of whether modality-specific biases remain, though the tests mitigate this concern to some degree. I saw no signs of circular reasoning or unsupported claims in the summary.

This paper would interest researchers building neural encoding models or working on brain decoding applications. Readers focused on integrating different brain recording techniques or scaling up training data would get the most out of it. It seems like a useful incremental step rather than a complete shift in the field. I would recommend putting it through peer review. The central argument is grounded enough and the controls are relevant, so referees could help refine the quantitative presentation and any remaining generalization questions.

Referee Report

1 major / 2 minor

Summary. The manuscript claims that fine-tuning language encoding models on fMRI data improves prediction performance for ECoG responses, despite fMRI's temporal resolution being two orders of magnitude slower. Evidence includes better ECoG predictions in high-frequency bands, comparable performance after temporally downsampling fMRI by a factor of 2, and monotonic scaling of ECoG performance with increasing volumes of fMRI tuning data.

Significance. If the central results hold, the work shows that abundant slow non-invasive data can enhance models for scarce fast invasive recordings by capturing shared semantic structure rather than modality-specific timing. The explicit downsampling and data-volume scaling controls directly test and mitigate concerns about overfitting to hemodynamics, strengthening the generalization claim. This approach could enable better multi-modal integration in language neuroscience and brain-computer interface applications.

major comments (1)

Abstract and Results sections: the reported ECoG performance improvements lack specific quantitative metrics (e.g., Pearson r or R² values with confidence intervals), statistical tests, and baseline comparisons, which are required to evaluate whether the gains are practically meaningful and exceed what would be expected from random variation.

minor comments (2)

Methods: provide the exact language model architecture, fine-tuning hyperparameters, and how ECoG frequency bands were aligned to the fMRI-tuned representations.
Figure captions and Results: clarify the number of subjects, cross-validation scheme, and whether the downsampling test preserved the same fMRI voxels or used a matched subset.
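The alignment question in the minor comments arises because the same stimulus features must be brought to two very different rates. As an illustration only: WavLM is assumed here to emit one feature frame per 20 ms (50 Hz), and the Figure 1 caption gives 20 Hz for ECoG and 0.5 Hz for fMRI. The sketch uses SciPy's polyphase resampler, which low-pass filters before decimating; encoding-model pipelines often use Lanczos interpolation instead. `resample_features` is a hypothetical helper, not the paper's code.

```python
from fractions import Fraction

import numpy as np
from scipy.signal import resample_poly

FS_WAVLM = 50.0  # Hz: assumed WavLM frame rate (20 ms hop)
FS_ECOG = 20.0   # Hz, from the Figure 1 caption
FS_FMRI = 0.5    # Hz, from the Figure 1 caption


def resample_features(features, fs_in, fs_out):
    """Resample a (time x dims) feature array from fs_in to fs_out Hz.

    The rate ratio is reduced to a fraction up/down; resample_poly
    applies an anti-aliasing FIR filter as part of the resampling.
    """
    ratio = Fraction(str(fs_out)) / Fraction(str(fs_in))
    return resample_poly(features, ratio.numerator, ratio.denominator, axis=0)


feats = np.random.default_rng(0).standard_normal((1000, 16))  # 20 s of hypothetical features
ecog_feats = resample_features(feats, FS_WAVLM, FS_ECOG)  # 50 -> 20 Hz: up=2, down=5
fmri_feats = resample_features(feats, FS_WAVLM, FS_FMRI)  # 50 -> 0.5 Hz: up=1, down=100
print(ecog_feats.shape, fmri_feats.shape)  # (400, 16) (10, 16)
```

For fMRI, models in this literature typically also add delays of several TRs to the resampled features to absorb the slow hemodynamic response.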

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and the recommendation for minor revision. We address the major comment below and will strengthen the quantitative reporting in the revised manuscript.

Point-by-point responses

Referee: Abstract and Results sections: the reported ECoG performance improvements lack specific quantitative metrics (e.g., Pearson r or R² values with confidence intervals), statistical tests, and baseline comparisons, which are required to evaluate whether the gains are practically meaningful and exceed what would be expected from random variation.

Authors: We agree that explicit quantitative metrics, statistical tests, and baseline comparisons are necessary for rigorous evaluation. In the revised manuscript we will add Pearson r values with 95% confidence intervals for the ECoG prediction improvements in both the Abstract and Results sections. We will also report the outcomes of appropriate statistical tests (e.g., paired permutation tests) comparing the fMRI-fine-tuned models against non-fine-tuned baselines and will include these baseline results directly in the text and figures to demonstrate that the observed gains exceed what would be expected from random variation. revision: yes
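For the promised 95% confidence intervals on Pearson r, a standard starting point is the Fisher z-transformation. This is a minimal sketch with one loud caveat: the formula assumes n independent samples, whereas ECoG and fMRI time series are autocorrelated, so n should be replaced by an effective sample size (or the interval obtained by a block bootstrap over time). `pearson_ci` is a hypothetical helper.

```python
import math


def pearson_ci(r, n, z_crit=1.959964):
    """Approximate two-sided 95% CI for a Pearson correlation r from n samples.

    atanh(r) is roughly normal with standard error 1 / sqrt(n - 3);
    the bounds are computed in z units and mapped back with tanh.
    """
    if n <= 3:
        raise ValueError("need n > 3")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


lo, hi = pearson_ci(0.30, 103)  # se = 0.1 in z units
print(f"r = 0.30, 95% CI [{lo:.3f}, {hi:.3f}]")
```

When comparing two models on the same electrodes, a paired test on the per-electrode differences is more appropriate than checking whether two separate intervals overlap.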

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper's derivation chain is self-contained: language representations are fine-tuned on independent fMRI datasets and then used to build encoding models evaluated on separate ECoG recordings from different subjects and modalities. Explicit controls (temporal downsampling of fMRI by a factor of 2 yielding comparable performance, and monotonic scaling of ECoG prediction with fMRI tuning data volume) directly probe and rule out overfitting to slow hemodynamics or modality-specific artifacts. No step reduces a prediction to a fitted input by construction, invokes a self-citation as a uniqueness theorem, or renames a known result; the central claim of cross-modal improvement rests on empirical generalization tests rather than definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract; no explicit free parameters, axioms, or invented entities are detailed. The approach assumes standard transfer-learning validity between recording modalities.

pith-pipeline@v0.9.0 · 5780 in / 939 out tokens · 35389 ms · 2026-05-20T06:44:13.839998+00:00 · methodology


Reference graph

Works this paper leans on (25 extracted references)

[1] Brains and algorithms partially converge in natural language processing. ISSN 2399-3642. doi: 10.1038/s42003-022-03036-1. Edward F. Chang. Towards Large-Scale, Human-Based, Mesoscopic Neurotechnologies. Neuron, 86(1):68–78, April

[2] ISSN 0896-6273. doi: 10.1016/j.neuron.2015.03.037. Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, and Furu Wei. WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing. arXiv:2...

[3] ISSN 1053-8119. doi: 10.1016/j.neuroimage.2010.06.010. Abdulkadir Gokce and Martin Schrimpf. Scaling Laws for Task-Optimized Models of the Primate Visual Ventral Stream. In Forty-Second International Conference on Machine Learning, June

[4] ISSN 2397-3374. doi: 10.1038/s41562-025-02105-9. Liberty S. Hamilton, Erik Edwards, and Edward F. Chang. A Spatial Map of Onset and Sustained Responses to Speech in the Human Superior Temporal Gyrus. Current Biology, 28(12):1860–1871.e4, June

[5] ISSN 0960-9822. doi: 10.1016/j.cub.2018.04.033. Anne Hsu, Alexander Borst, and Frédéric E Theunissen. Quantifying variability in neural responses and its application for the validation of model predictions. Network: Computation in Neural Systems, 15(2):91–109, January

[6] ISSN 0954-898X, 1361-6536. doi: 10.1088/0954-898X_15_2_002. Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, and Abdelrahman Mohamed. HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units. arXiv:2106.07447 [cs, eess], June

[7] Patrick W. Hullett, Liberty S. Hamilton, Nima Mesgarani, Christoph E. Schreiner, and Edward F. Chang. Human Superior Temporal Gyrus Organization of Spectrotemporal Modulation Tuning Derived from Speech Stimuli. Journal of Neuroscience, 36(6):2014–2026, February

[8] ISSN 0270-6474, 1529-2401. doi: 10.1523/JNEUROSCI.1779-15.2016. Shailee Jain and Alexander Huth. Incorporating Context into Language Encoding Models for fMRI. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (eds.), Advances in Neural Information Processing Systems 31, pp. 6628–6637. Curran Associates, Inc.,

[9] ISSN 0896-6273. doi: 10.1016/j.neuron.2018.03.044. Menoua Keshishian, Gavin Mischler, Samuel Thomas, Brian Kingsbury, Stephan Bickel, Ashesh D. Mehta, and Nima Mesgarani. Parallel hierarchical encoding of linguistic representations in the human auditory cortex and recurrent automatic speech recognition systems. Nature Machine Intelligence, 8(2):257–269, February

[10] ISSN 2522-5839. doi: 10.1038/s42256-026-01185-0. Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. arXiv:1412.6980 [cs], January

[11] ISSN 2052-4463. doi: 10.1038/s41597-023-02437-z. Yuanning Li, Gopala K. Anumanchipalli, Abdelrahman Mohamed, Junfeng Lu, Jinsong Wu, and Edward F. Chang. Dissecting neural computations of the human auditory pathway using deep neural networks for speech, March

[12] ISSN 2052-4463. doi: 10.1038/s41597-022-01645-3. Kaylo T. Littlejohn, Cheol Jun Cho, Jessie R. Liu, Alexander B. Silva, Bohan Yu, Vanessa R. Anderson, Cady M. Kurtz-Miott, Samantha Brosler, Anshul P. Kashyap, Irina P. Hallinan, Adit Shah, Adelyn Tu-Chan, Karunesh Ganguly, David A. Moses, Edward F. Chang, and Gopala K. Anumanchipalli. A streaming brain-to...

[13] ISSN 1546-1726. doi: 10.1038/s41593-025-01905-6. Nikos K. Logothetis, Jon Pauls, Mark Augath, Torsten Trinath, and Axel Oeltermann. Neurophysiological investigation of the basis of the fMRI signal. Nature, 412(6843):150–157, July

[14] ISSN 1476-4687. doi: 10.1038/35084005. Jeremy R. Manning, Joshua Jacobs, Itzhak Fried, and Michael J. Kahana. Broadband Shifts in Local Field Potential Power Spectra Are Correlated with Single-Neuron Spiking in Humans. Journal of Neuroscience, 29(43):13613–13620, October

[15] ISSN 0270-6474, 1529-2401. doi: 10.1523/JNEUROSCI.2041-09.2009. Takuya Matsuyama, Kota S Sasaki, and Shinji Nishimoto. Applicability of scaling laws to vision encoding models, August

[16] ISSN 1053-8119. doi: 10.1016/j.neuroimage.2022.119438. N. Mesgarani, C. Cheung, K. Johnson, and E. F. Chang. Phonetic Feature Encoding in Human Superior Temporal Gyrus. Science, 343(6174):1006–1010, February

[17] ISSN 0036-8075, 1095-9203. doi: 10.1126/science.1245994. Juliette Millet, Charlotte Caucheteux, Pierre Orhan, Yves Boubenec, Alexandre Gramfort, Ewan Dunbar, Christophe Pallier, and Jean-Remi King. Toward a realistic model of speech processing in the brain with self-supervised learning, June

[18] doi: 10.1126/science.1110913. Thomas Naselaris, Kendrick N. Kay, Shinji Nishimoto, and Jack L. Gallant. Encoding and decoding in fMRI. NeuroImage, 56(2):400–410, May

[19] ISSN 1053-8119. doi: 10.1016/j.neuroimage.2010.07.073. Anuja Negi, Subba Reddy Oota, Anwar O. Nunez-Elizalde, Manish Gupta, and Fatma Deniz. Brain-Informed Fine-Tuning for Improved Multilingual Understanding in Language Models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, October

[20] ISSN 2397-3374. doi: 10.1038/s41562-021-01261-y. Subba Reddy Oota, Emin Çelik, Fatma Deniz, and Mariya Toneva. Speech language models lack important brain-relevant semantics. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar (eds.), Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 8503–8528...

[21] Association for Computational Linguistics. doi: 10.18653/v1/2024.acl-long

[22] ISSN 1546-1726. doi: 10.1038/nn.4021. Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever. Robust Speech Recognition via Large-Scale Weak Supervision,

[23] ISSN 1546-1726. doi: 10.1038/s41593-023-01304-9. Greta Tuckute, Jenelle Feather, Dana Boebinger, and Josh H. McDermott. Many but not all deep neural network audio models capture brain responses and exhibit correspondence between model stages and brain regions. PLOS Biology, 21(12):e3002366, December

[24] ISSN 1545-7885. doi: 10.1371/journal.pbio.3002366. Aditya R. Vaidya, Shailee Jain, and Alexander Huth. Self-Supervised Models of Audio Effectively Explain Human Cortical Responses to Speech. In Proceedings of the 39th International Conference on Machine Learning, pp. 21927–21944. PMLR, June
[25] ISSN 2052-4463. doi: 10.1038/s41597-025-05462-2.

Appendix A. Per-subject change in ECoG performance after fMRI-tuning. In Figure 6, we visualize the per-electrode effect of fMRI-tuning separately for each subject in the "Podcast" dataset. (Figure 6 color scale: −0.1 to 0.1, with ends labeled "Pretrained better" and "fMRI-tuned better".)
