pith · machine review for the scientific record

arxiv: 2605.09187 · v1 · submitted 2026-05-09 · 💻 cs.AI · cs.CL · cs.LG

Recognition: no theorem link

Emergent Semantic Role Understanding in Language Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 01:50 UTC · model grok-4.3

classification 💻 cs.AI cs.CL cs.LG
keywords semantic roles · language models · emergence · pre-training · linear probes · model scaling · decoder-only transformers · linguistic structure

The pith

Semantic role understanding emerges in language models from pre-training, shifting to distributed representations at larger scales.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors examine whether semantic role labeling, which captures who did what to whom, arises naturally from the language modeling task or requires additional training. They freeze pre-trained decoder-only transformers and train linear probes to predict semantic roles from the representations. Results show that substantial role information is already encoded during pre-training, with probe performance increasing with model size yet remaining below that of fully fine-tuned models. This points to partial emergence of linguistic structure from unsupervised objectives alone, with the encoding becoming more distributed as scale grows.

Core claim

Semantic role structure emerges from language modeling objectives, but its internal implementation shifts toward more distributed representations as model scale increases. By using linear probes on frozen models, the study reveals that pre-training encodes much of the necessary information for identifying semantic roles, although complete mastery still benefits from task-specific adaptation.

What carries the argument

Linear probes applied to frozen representations from decoder-only transformer language models, measuring the extractability of semantic role labels.
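To make the probing setup concrete, here is a minimal sketch, not the paper's code: the features are synthetic stand-ins for frozen transformer representations with a planted role signal, and the probe is just a softmax classifier trained on top of them while the "model" stays fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen transformer representations: random features with
# planted role directions (0 = Agent, 1 = Patient, 2 = Other).
n, d, n_roles = 800, 64, 3
y = rng.integers(0, n_roles, size=n)
reps = rng.normal(size=(n, d)) + 2.0 * rng.normal(size=(n_roles, d))[y]

# The representations stay frozen; only this linear softmax layer is trained.
W = np.zeros((d, n_roles))
train, test = slice(0, 600), slice(600, None)
Y = np.eye(n_roles)[y[train]]
for _ in range(300):
    z = reps[train] @ W
    p = np.exp(z - z.max(axis=1, keepdims=True))  # stable softmax
    p /= p.sum(axis=1, keepdims=True)
    W -= 0.1 * reps[train].T @ (p - Y) / 600      # gradient step on cross-entropy

# Held-out accuracy measures how linearly decodable the roles are.
probe_acc = (np.argmax(reps[test] @ W, axis=1) == y[test]).mean()
```

In the paper's actual setup the features would come from a chosen layer of a frozen pre-trained model on annotated sentences, and probe accuracy would be compared against chance and a fully fine-tuned baseline.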

If this is right

  • Pre-training encodes substantial semantic role information without task-specific supervision.
  • Semantic role extraction accuracy improves with increasing model scale.
  • The internal representation of semantic roles becomes more distributed rather than localized in larger models.
  • Fine-tuning can still enhance performance beyond what pre-training provides alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Other syntactic or semantic features might exhibit similar scale-dependent shifts in encoding style.
  • Interpretability techniques may need adaptation for very large models where information is highly distributed.
  • This suggests that scaling laws could apply not just to performance but to the geometry of learned linguistic structures.

Load-bearing premise

That the accuracy of linear probes on frozen model layers directly measures the semantic role information present from pre-training without the probes creating new structure.

What would settle it

Finding that semantic role information in large models is localized in specific layers or neurons, rather than distributed, or that small models show equally distributed encoding would challenge the claim of a scale-dependent shift.
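That localized-versus-distributed question can be operationalized in a toy sketch (the planted signal, neuron counts, and localization score below are illustrative assumptions, not the paper's method): compare how much class-separating variance the most selective neurons carry when the same role information is stored in a few dedicated neurons versus rotated across all dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 800, 64, 3
y = rng.integers(0, k, size=n)

# Two encodings of the same role information:
# localized   -- three dedicated neurons carry the role label;
# distributed -- an orthogonal rotation spreads that signal over all 64 dims.
localized = rng.normal(size=(n, d))
localized[np.arange(n), y] += 4.0           # neurons 0..2 are role-selective
Q = np.linalg.qr(rng.normal(size=(d, d)))[0]
distributed = localized @ Q                 # same information, no privileged neurons

def top_neuron_share(X, y, k_top=3):
    """Fraction of between-class variance carried by the k_top most
    class-selective neurons -- a crude localization score in [0, 1]."""
    class_means = np.stack([X[y == c].mean(axis=0) for c in range(k)])
    between = class_means.var(axis=0)        # per-neuron class separation
    top = np.sort(between)[::-1][:k_top]
    return top.sum() / between.sum()
```

Under this score the localized encoding concentrates nearly all separating variance in its three dedicated neurons, while the rotated encoding spreads it thinly; applied per layer and per scale, such a measure is one way the claimed shift could be tested directly.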

Figures

Figures reproduced from arXiv: 2605.09187 by Carla Griffiths, Mirco Musolesi.

Figure 1. Frozen probes extract semantic role information without any model adaptation.
Figure 2. Representational changes during fine-tuning, and isolation of pre-training from adaptation.
Figure 3. Ablation ΔAgent F1 (role-specific, matching …).
Figure 4. Layer × role distribution of the top-20 PCA-identified selective neurons per layer in the Small (3.2M) model. Cell values are neuron counts. Named PropBank-style roles (Agent, Location, Manner) concentrate in early layers (L0–L1); Time neurons appear in every layer; Quantity neurons dominate the top-20 cap throughout the network.
Figure 5. Role-specific neuron separation is weak but increases with depth.
Figure 6. Small model (3.2M transformer parameters): PropBank role representations. Final layer …
Figure 7. Tiny model (0.4M transformer parameters): PropBank role representations. PCA (left) and …
Figure 8. Base model (18.9M transformer parameters): PropBank role representations. PCA (left) …
Figure 9. Medium model (56.7M transformer parameters): PropBank role representations. PCA (left) …
Figure 10. Layer-wise probing F1 on GPT-2 Small and Medium. SRL information peaks in deeper …
Figure 11. Per-layer probe F1 across pre-training, one panel per scale. Each line is one layer …
Original abstract

Understanding how linguistic structure emerges in language models is central to interpreting what these systems learn from data and how much supervision they truly require. In particular, semantic role understanding ("who did what to whom") is a core component of meaning representation, yet it remains unclear whether it arises from pre-training alone or depends on task-specific fine-tuning. We study whether semantic role understanding emerges during language model pre-training or requires task-specific fine-tuning. We freeze decoder-only transformers and train linear probes to extract semantic roles, using performance to infer whether role information is already encoded in pre-training or learned during adaptation. Across model scales, we find that frozen representations contain substantial semantic role information, with performance improving but not fully matching fine-tuned models. This indicates partial but incomplete emergence from pre-training alone. We show that semantic role structure emerges from language modeling objectives, but its internal implementation shifts toward more distributed representations as model scale increases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper examines whether semantic role understanding emerges in decoder-only language models solely from pre-training. It freezes model representations across scales and trains linear probes to decode semantic roles (who-did-what-to-whom), comparing probe accuracy against fine-tuned baselines. Results indicate substantial role information is linearly decodable from frozen pre-trained representations, with accuracy improving as scale increases yet remaining below fully fine-tuned performance; the authors conclude that semantic role structure emerges from language modeling objectives but shifts toward more distributed internal representations with larger models.

Significance. If substantiated with appropriate controls and metrics, the work would provide useful empirical evidence on the limits of unsupervised emergence of core linguistic structure in LMs, particularly the partial nature of semantic role encoding and its scale dependence. The probe-based methodology on frozen decoder-only models is a standard tool for such questions and could inform debates on what pre-training actually captures versus what requires task-specific adaptation.

major comments (2)
  1. [Abstract / §3] Abstract and §3 (results on scale dependence): The claim that 'its internal implementation shifts toward more distributed representations as model scale increases' is not directly supported by the reported linear probe accuracies alone. Improved linear decodability with scale could reflect stronger, more redundant, or higher-dimensional encodings without necessarily indicating a change in distributedness; no explicit metric (e.g., effective dimensionality, sparsity measures, unit ablation, or linear-vs-nonlinear probe comparisons) is described to isolate this shift.
  2. [Abstract / Methods] Abstract and methods: The central inference that linear probe performance on frozen representations indicates semantic role information 'already encoded' during pre-training lacks reported controls for probe capacity, random baselines, or statistical significance testing. Without these, it remains possible that probes introduce or amplify structure rather than purely extract pre-existing encodings, weakening the emergence claim.
minor comments (1)
  1. [Abstract] The abstract provides no numerical performance values, exact model scales tested, or dataset details for the semantic role probing task, making it difficult to assess the magnitude of the reported improvements.
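The referee's first major point names effective dimensionality as the missing metric. One common version is the participation ratio of the covariance spectrum, sketched here on synthetic data (nothing below comes from the paper):

```python
import numpy as np

def participation_ratio(X):
    """Effective dimensionality of X (n_samples, n_dims):
    PR = (sum lam_i)^2 / sum(lam_i^2) over covariance eigenvalues.
    ~1 when one direction dominates; ~n_dims when variance is isotropic."""
    lam = np.linalg.eigvalsh(np.cov(X - X.mean(axis=0), rowvar=False))
    lam = np.clip(lam, 0.0, None)  # clip tiny negative numerical eigenvalues
    return lam.sum() ** 2 / (lam ** 2).sum()

rng = np.random.default_rng(0)
concentrated = rng.normal(size=(1000, 1)) @ rng.normal(size=(1, 64))  # rank-1 data
isotropic = rng.normal(size=(1000, 64))                               # full-rank data
```

A scale-dependent shift toward distributed coding would then show up as the participation ratio of the role-predictive subspace growing with model width faster than a fixed low-dimensional code would predict.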

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our work examining the emergence of semantic role understanding in pre-trained decoder-only language models. We address each major comment below and indicate revisions to be made in the next version of the manuscript.

Point-by-point responses
  1. Referee: [Abstract / §3] Abstract and §3 (results on scale dependence): The claim that 'its internal implementation shifts toward more distributed representations as model scale increases' is not directly supported by the reported linear probe accuracies alone. Improved linear decodability with scale could reflect stronger, more redundant, or higher-dimensional encodings without necessarily indicating a change in distributedness; no explicit metric (e.g., effective dimensionality, sparsity measures, unit ablation, or linear-vs-nonlinear probe comparisons) is described to isolate this shift.

    Authors: We agree that linear probe accuracy improvements with scale do not by themselves isolate a shift toward more distributed representations, as they could alternatively reflect stronger or more redundant encodings. The manuscript's claim draws from the observed pattern of increasing linear decodability alongside the partial gap to fine-tuned performance, but we acknowledge the need for a more direct metric. In the revised manuscript we will qualify the statement in the abstract and §3 and add analyses of effective dimensionality of the probed subspaces (via participation ratio) together with linear-versus-nonlinear probe comparisons to better characterize any representational shift. revision: yes

  2. Referee: [Abstract / Methods] Abstract and methods: The central inference that linear probe performance on frozen representations indicates semantic role information 'already encoded' during pre-training lacks reported controls for probe capacity, random baselines, or statistical significance testing. Without these, it remains possible that probes introduce or amplify structure rather than purely extract pre-existing encodings, weakening the emergence claim.

    Authors: We accept that explicit controls are necessary to support the inference of pre-existing encodings. The current manuscript relies on the standard linear-probe methodology and the gap between frozen and fine-tuned performance, but does not report the requested baselines. In the revised version we will add (i) random-label and random-feature baselines, (ii) statistical significance testing across multiple seeds, and (iii) a brief discussion of probe capacity relative to hidden dimension. These additions will clarify that the reported accuracies reflect information present in the frozen pre-trained representations. revision: yes
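The random-label control the authors promise can be sketched in a few lines (illustrative only: synthetic features stand in for frozen representations). A selective probe scores high on the true labels but near chance on shuffled ones; a small gap would suggest the probe, not the representation, is doing the work.

```python
import numpy as np

def probe_heldout_acc(X, y, k, n_train=600, steps=300, lr=0.1):
    """Train a softmax linear probe on X[:n_train]; return held-out accuracy."""
    W = np.zeros((X.shape[1], k))
    Y = np.eye(k)[y[:n_train]]
    for _ in range(steps):
        z = X[:n_train] @ W
        p = np.exp(z - z.max(axis=1, keepdims=True))  # stable softmax
        p /= p.sum(axis=1, keepdims=True)
        W -= lr * X[:n_train].T @ (p - Y) / n_train
    return (np.argmax(X[n_train:] @ W, axis=1) == y[n_train:]).mean()

rng = np.random.default_rng(0)
n, d, k = 800, 64, 3
y = rng.integers(0, k, size=n)
X = rng.normal(size=(n, d)) + 2.0 * rng.normal(size=(k, d))[y]  # planted roles

true_acc = probe_heldout_acc(X, y, k)
ctrl_acc = probe_heldout_acc(X, rng.permutation(y), k)  # random-label control
selectivity = true_acc - ctrl_acc  # large gap: probe extracts, does not invent
```

Run across multiple seeds, the distribution of this selectivity gap is exactly what the requested significance testing would operate on.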

Circularity Check

0 steps flagged

No circularity in empirical probe-based claims of semantic role emergence

full rationale

The paper reports an empirical study: decoder-only transformers are frozen, linear probes are trained to extract semantic roles from their representations, and accuracy is measured across scales and compared to fine-tuned baselines. The provided text contains no equations, no derivations, and no self-citations invoked as load-bearing premises. Claims that semantic role structure 'emerges from language modeling objectives' and 'shifts toward more distributed representations' are interpretive conclusions drawn from the observed probe performance patterns, not reductions by construction to fitted parameters or self-referential definitions. No step matches any of the enumerated circularity patterns; the analysis is self-contained against external benchmarks (probe accuracy on held-out data).

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that linear probe performance is a valid proxy for encoded semantic role knowledge and that the chosen models and tasks are representative of broader language modeling behavior.

axioms (1)
  • domain assumption Linear probes can extract semantic role information if it is present in the model's representations
    The inference from probe performance to pre-training emergence depends on this assumption being true.

pith-pipeline@v0.9.0 · 5448 in / 1188 out tokens · 32671 ms · 2026-05-12T01:50:16.320211+00:00 · methodology

discussion (0)

