arxiv: 1906.11861 · v1 · pith:RYTXKNCInew · submitted 2019-06-27 · 💻 cs.CL

Relating Simple Sentence Representations in Deep Neural Networks and the Brain

Pith reviewed 2026-05-25 14:44 UTC · model grok-4.3

classification 💻 cs.CL
keywords BERT · MEG · sentence representations · brain decoding · deep neural networks · ELMo · synthetic brain data · data augmentation

The pith

BERT activations correlate most strongly with MEG brain recordings while people read simple sentences, and those activations can generate synthetic brain data that improves word decoding accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether sentence representations inside recurrent and transformer networks align with human brain activity recorded via MEG during reading of short, syntactically simple sentences. Among the models examined, BERT layer activations show the highest correlation with the MEG signals. The same network representations are then used to synthesize plausible brain responses for new sentences; adding this synthetic data to real recordings measurably raises accuracy on a downstream task that decodes which word a subject is reading from the MEG trace. The work also reports that a single MEG response to one word carries detectable information about earlier words in the same sentence.

Core claim

BERT activations provide the strongest correlation with MEG brain data collected during sentence reading. Representations from deep networks can be used to synthesize brain activity for new sentences, augmenting existing datasets and improving performance on stimuli decoding tasks. MEG recordings of a word can distinguish earlier words in the sentence.

What carries the argument

Layer-wise correlation between model hidden states and MEG time series, followed by a regression model that maps network activations to synthetic MEG vectors for data augmentation.
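
The two steps named above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes a closed-form ridge regression as the mapping (the simulated rebuttal further down mentions a linear ridge mapping), uses random arrays in place of real model activations and MEG vectors, and scores each layer by mean Pearson correlation on held-out words. The names `fit_ridge`, `mean_pearson`, and `layer_score`, and all sizes and hyperparameters, are hypothetical.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge weights: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def mean_pearson(Y_true, Y_pred):
    """Pearson correlation per output dimension (column), averaged."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return float(np.mean((yt * yp).sum(axis=0) / denom))

def layer_score(acts, meg, n_train, alpha=1.0):
    """Fit on the first n_train words, score predictions on the rest."""
    W = fit_ridge(acts[:n_train], meg[:n_train], alpha)
    return mean_pearson(meg[n_train:], acts[n_train:] @ W)

# Toy stand-ins (assumptions): 200 words, 16 hidden units, 8 MEG features.
rng = np.random.default_rng(0)
n_words, n_hidden, n_meg = 200, 16, 8
informative = rng.normal(size=(n_words, n_hidden))
true_map = rng.normal(size=(n_hidden, n_meg))
meg = informative @ true_map + 0.5 * rng.normal(size=(n_words, n_meg))

layers = {
    "informative_layer": informative,                       # linearly related to the toy MEG
    "random_layer": rng.normal(size=(n_words, n_hidden)),   # unrelated to the toy MEG
}
scores = {name: layer_score(a, meg, n_train=150) for name, a in layers.items()}
best_layer = max(scores, key=scores.get)

# Synthetic "MEG" for five unseen items: refit on all data, then project.
W_full = fit_ridge(informative, meg)
new_acts = rng.normal(size=(5, n_hidden))
synthetic_meg = new_acts @ W_full
```

The single index-based train/test split keeps the sketch short; a real analysis would more plausibly use cross-validation and select `alpha` on held-out data.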

If this is right

  • Model representations can be treated as predictors of brain responses to previously unseen sentences.
  • Synthetic MEG traces generated from text-only models can expand small brain-recording corpora for better statistical power.
  • The same mapping supplies a quantitative test for whether particular layers or architectures capture aspects of incremental sentence processing observed in the brain.
  • MEG signals at one time point can be decoded for information about preceding words, showing that brain activity retains sentence context.
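
The figure captions below report "pairwise accuracy" for brain activity predicted from model layers. One common form of such a score is a 2-vs-2 test: for every pair of items, check whether the predicted vectors are closer to their own observed vectors than to each other's. The sketch below assumes Euclidean distance and tiny made-up vectors; the paper's exact distance measure and setup are not stated on this page, and `pairwise_accuracy` is a hypothetical name.

```python
from itertools import combinations
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pairwise_accuracy(predicted, observed, dist=euclidean):
    """Fraction of item pairs where the correct matching beats the swapped one."""
    correct = 0
    total = 0
    for i, j in combinations(range(len(observed)), 2):
        matched = dist(predicted[i], observed[i]) + dist(predicted[j], observed[j])
        swapped = dist(predicted[i], observed[j]) + dist(predicted[j], observed[i])
        correct += matched < swapped
        total += 1
    return correct / total

# Made-up 2-dimensional "brain vectors" for three items.
observed = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
good_pred = [[0.1, 0.9], [0.9, 0.1], [0.9, 1.1]]   # close to the right targets
mixed_pred = [[0.9, 0.1], [0.1, 0.9], [0.9, 1.1]]  # items 0 and 1 swapped

accuracy_good = pairwise_accuracy(good_pred, observed)
accuracy_mixed = pairwise_accuracy(mixed_pred, observed)
```

Chance level for this score is 0.5, which makes it easy to compare across layers and brain regions.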

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If the alignment holds, language models could be used to simulate expected brain responses for stimulus design in future experiments.
  • The approach suggests a route to test whether model-brain correspondence improves when models are trained on more brain-like objectives.
  • Extending the method to sentences with greater syntactic complexity could reveal where current models diverge from human incremental parsing.
  • Successful data augmentation implies that brain-decoding pipelines for clinical use might reduce the amount of required per-patient recording time.

Load-bearing premise

The reported correlations and decoding gains arise from a genuine match between model representations and neural activity rather than from dataset-specific artifacts or preprocessing choices.

What would settle it

Running the identical correlation and augmentation pipeline on a fresh MEG dataset collected from new subjects or sentences and finding that BERT no longer yields the highest correlation or that the synthetic data fails to raise decoding accuracy.

Figures

Figures reproduced from arXiv: 1906.11861 by Hao Tang, Partha Talukdar, Sharmistha Jat, Tom Mitchell.

Figure 1: Encoding model for MEG data. 306 channel 500ms MEG signal for a single word was compressed to ...
Figure 2: Architecture diagram for the simple multi ...
Figure 3: Pairwise classification accuracy of brain activity data predicted from various model layer representations.
Figure 4: Pairwise accuracy of various brain regions from some selected deep neural network model layers. The ...
Figure 5: Experimental setup for micro-context tests. Given two sentences with similar words except one in the ...
Figure 6: Average sign agreement activity for noun sensitivity stimuli ‘...
Figure 7: Accuracy with and without synthetically generated MEG brain data on two stimuli prediction tasks: (a) ...
Figure 8: Sign agreement image for verb, determiner and adjective sensitivity test stimuli. The red and blue ...
Figure 9: Pairwise Accuracy of predicting brain encodings for noun, verb, passive & active sentences. For each ...
Figure 10: Micro-context sensitivity test results for all the layers. The color of a cell represents the value within ...
Original abstract

What is the relationship between sentence representations learned by deep recurrent models against those encoded by the brain? Is there any correspondence between hidden layers of these recurrent models and brain regions when processing sentences? Can these deep models be used to synthesize brain data which can then be utilized in other extrinsic tasks? We investigate these questions using sentences with simple syntax and semantics (e.g., The bone was eaten by the dog.). We consider multiple neural network architectures, including recently proposed ELMo and BERT. We use magnetoencephalography (MEG) brain recording data collected from human subjects when they were reading these simple sentences. Overall, we find that BERT's activations correlate the best with MEG brain data. We also find that the deep network representation can be used to generate brain data from new sentences to augment existing brain data. To the best of our knowledge, this is the first work showing that the MEG brain recording when reading a word in a sentence can be used to distinguish earlier words in the sentence. Our exploration is also the first to use deep neural network representations to generate synthetic brain data and to show that it helps in improving subsequent stimuli decoding task accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript investigates correspondences between sentence representations in deep neural networks (including ELMo and BERT) and human MEG brain recordings collected while subjects read simple sentences. It reports that BERT activations exhibit the strongest correlations with the MEG data and shows that DNN hidden states can be used to synthesize brain signals for new sentences, which augment the dataset and improve accuracy on a stimuli decoding task. The work also claims to be the first to demonstrate that MEG signals at a given word can distinguish earlier words in the sentence and the first to use model-generated synthetic brain data for improved decoding performance.

Significance. If the central claims survive controls for lexical confounds and proper statistical validation, the results would indicate a substantive alignment between modern transformer representations and neural activity during sentence comprehension, while also offering a practical method for augmenting scarce neuroimaging datasets. The inclusion of recent models such as BERT and the explicit demonstration of downstream utility from synthetic data constitute clear strengths.

major comments (3)
  1. [Abstract] The claim that BERT activations 'correlate the best' with MEG data is stated without any reported correlation coefficients, p-values, multiple-comparison corrections, layer-specific breakdowns, or baseline comparisons (e.g., against word-frequency or length-matched controls), rendering it impossible to assess whether the result reflects representational alignment or surface covariates.
  2. [Abstract] The assertion that DNN representations 'can be used to generate brain data from new sentences to augment existing brain data' and improve decoding accuracy supplies no description of the mapping procedure, training/test partitioning, regularization, or statistical test of the accuracy gain; without these, the improvement cannot be distinguished from overfitting or leakage on the small set of simple sentences.
  3. [Abstract / Methods (implied)] No controls are described for known confounds such as word frequency, sentence length, or temporal position, which are load-bearing for the claim that observed statistics arise from shared syntactic/semantic structure rather than dataset artifacts; the skeptic's concern therefore remains unaddressed.
minor comments (1)
  1. [Abstract] The abstract contains several run-on sentences that reduce readability; splitting the final two sentences would improve clarity.
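
To make the multiple-comparison request in major comment 1 concrete: when many layers or models are each tested against the MEG data, the per-test p-values need a correction before any one is called significant. The sketch below shows the Benjamini-Hochberg false discovery rate procedure. Choosing this particular procedure is an assumption (the page only mentions FDR-corrected p-values in general terms), and the p-values are invented for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans: True where the hypothesis is rejected at FDR level q."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * q.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected

# Invented per-layer p-values; four of them are below an uncorrected 0.05.
layer_p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
rejected = benjamini_hochberg(layer_p, q=0.05)
n_uncorrected = sum(p < 0.05 for p in layer_p)
```

In this toy case the correction keeps only the two smallest p-values, which is the kind of difference the referee's comment is asking the authors to report.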

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We respond point-by-point to the major concerns raised about the abstract and indicate where revisions will be made.

  1. Referee: [Abstract] The claim that BERT activations 'correlate the best' with MEG data is stated without any reported correlation coefficients, p-values, multiple-comparison corrections, layer-specific breakdowns, or baseline comparisons (e.g., against word-frequency or length-matched controls), rendering it impossible to assess whether the result reflects representational alignment or surface covariates.

    Authors: The main text reports the full set of Pearson correlations, FDR-corrected p-values, layer-wise breakdowns for BERT/ELMo and other models, and comparisons against word-frequency and length baselines. The abstract summarizes the primary finding at a high level. We will revise the abstract to include representative correlation values and note the statistical controls. revision: yes

  2. Referee: [Abstract] The assertion that DNN representations 'can be used to generate brain data from new sentences to augment existing brain data' and improve decoding accuracy supplies no description of the mapping procedure, training/test partitioning, regularization, or statistical test of the accuracy gain; without these, the improvement cannot be distinguished from overfitting or leakage on the small set of simple sentences.

    Authors: The Methods section details the linear (ridge) mapping, nested cross-validation for train/test partitioning on the sentence set, regularization parameter selection, and permutation tests for the decoding accuracy gain. We will add a one-sentence summary of the procedure and validation approach to the abstract. revision: yes

  3. Referee: [Abstract / Methods (implied)] No controls are described for known confounds such as word frequency, sentence length, or temporal position, which are load-bearing for the claim that observed statistics arise from shared syntactic/semantic structure rather than dataset artifacts; the skeptic's concern therefore remains unaddressed.

    Authors: All stimuli are simple declarative sentences of fixed length; temporal position is explicitly modeled in the MEG analysis. Explicit word-frequency matching was not performed. We will add a Discussion paragraph on these confounds and report any post-hoc frequency-controlled analyses. revision: partial
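
Response 2 refers to permutation tests for the decoding accuracy gain. As an illustration of what such a test can look like, the sketch below runs a one-sided paired sign-flip permutation test on per-fold accuracies with and without synthetic data. The test design, the function name `paired_permutation_test`, and all accuracy values are assumptions made for this example, not figures from the paper.

```python
import random

def paired_permutation_test(baseline, augmented, n_permutations=10000, seed=0):
    """One-sided sign-flip test on the mean paired difference (augmented - baseline)."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(augmented, baseline)]
    observed = sum(diffs) / len(diffs)
    count = 0
    for _ in range(n_permutations):
        # Under the null, each paired difference is equally likely to have either sign.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if permuted >= observed:
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (count + 1) / (n_permutations + 1)

# Invented per-fold decoding accuracies (8 folds).
real_only = [0.60, 0.62, 0.58, 0.61, 0.59, 0.63, 0.60, 0.62]
with_synthetic = [0.64, 0.65, 0.61, 0.66, 0.62, 0.66, 0.63, 0.65]

gain, p_value = paired_permutation_test(real_only, with_synthetic)
_, p_null = paired_permutation_test(real_only, real_only)
```

Because every fold improves in this toy data, only the all-positive sign pattern matches the observed gain, so the p-value is small; with identical inputs the test returns a p-value of 1.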

Circularity Check

0 steps flagged

No significant circularity; claims rest on external MEG recordings

The paper computes correlations between DNN hidden states (ELMo, BERT, etc.) and independently collected MEG brain recordings from subjects reading simple sentences. No equations, fitted parameters, or self-referential definitions appear in the provided abstract or described methodology. The synthetic brain data generation step applies a mapping to new sentences but is evaluated on extrinsic decoding tasks against held-out brain data, not by construction. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical correlation study; abstract describes no mathematical derivations, free parameters, or invented entities.

pith-pipeline@v0.9.0 · 5736 in / 929 out tokens · 24225 ms · 2026-05-25T14:44:54.792242+00:00 · methodology
