Shared Lexical Task Representations Explain Behavioral Variability In LLMs
Pith reviewed 2026-05-09 21:12 UTC · model grok-4.3
The pith
Large language models share the same task-specific attention heads across instruction and example prompts, with head activation explaining performance differences.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Despite large variation in performance as a function of the prompt, the model engages some common underlying mechanisms across different prompts of a task. Specifically, task-specific attention heads whose outputs literally describe the task are shared across prompting styles and trigger subsequent answer production. Behavioral variation between prompts can be explained by the degree to which these heads are activated, and failures are at least sometimes due to competing task representations that dilute the signal of the target task.
What carries the argument
Lexical task heads: attention heads whose outputs describe the task and are activated similarly across prompt styles to drive answer generation.
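To make the notion concrete, here is a minimal sketch of how such a head could be inspected: project its output through the model's unembedding matrix and read off the top vocabulary tokens. This is not the paper's procedure; the TransformerLens tooling, GPT-2 stand-in model, head indices, and country-capital prompt are all illustrative assumptions.

```python
# A sketch (not the paper's code) of reading out one attention head's output
# in vocabulary space, using TransformerLens as assumed tooling.
import torch
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")    # stand-in model

prompt = "France: Paris, Germany: Berlin, Japan:"    # example-style prompt
_, cache = model.run_with_cache(prompt)

layer, head = 9, 6   # hypothetical indices; the paper's heads may differ

# Per-head attention output z has shape [batch, pos, n_heads, d_head].
z = cache[utils.get_act_name("z", layer)][0, -1, head]   # last position
head_out = z @ model.W_O[layer, head]                     # -> d_model
logits = head_out @ model.W_U                             # -> d_vocab (final layer norm ignored for simplicity)

top_tokens = torch.topk(logits, k=10).indices
print([model.tokenizer.decode(int(t)) for t in top_tokens])
# For a "lexical task head", the top tokens should name the task,
# e.g. " capital", " country", " city" for the country-capital task.
```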
If this is right
- Prompt performance differences arise from varying levels of activation in the same set of task heads rather than entirely separate mechanisms.
- Competing task representations can dilute the target task signal and produce errors even when the model has the relevant heads.
- The same heads trigger answer production after being activated by either instruction or demonstration prompts.
- Task representations remain consistent enough across prompt styles that variability is largely a matter of activation strength.
Where Pith is reading between the lines
- Intervening on these heads could provide a direct way to stabilize model behavior without changing the prompt text.
- Models may maintain multiple task representations simultaneously, and the dominant one determines the output when activation levels differ.
- This mechanism suggests that failures on new tasks could be diagnosed by checking whether the expected lexical task heads are present and sufficiently activated.
Load-bearing premise
The identified heads causally drive task performance and are genuinely shared rather than merely correlated with the prompt style used.
What would settle it
Measuring whether directly suppressing or boosting activation in the identified heads alters task accuracy in the same direction for both instruction and example prompts, independent of other model components.
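One way to picture that experiment, as a hedged sketch rather than the authors' protocol: scale a candidate head's output during the forward pass and compare the answer under both prompt styles. The TransformerLens tooling, head indices, prompts, and scale values below are assumptions.

```python
# Sketch of a head-scaling intervention (assumed tooling: TransformerLens;
# head indices, prompts, and scales are illustrative, not the paper's).
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")
LAYER, HEAD = 9, 6   # hypothetical lexical task head

def scale_head(z, hook, scale):
    # z: [batch, pos, n_heads, d_head]; scale=0 ablates, scale>1 boosts.
    z[:, :, HEAD, :] = z[:, :, HEAD, :] * scale
    return z

prompts = {
    "instruction": "What is the capital city of Japan? A:",
    "example":     "Germany: Berlin, Greece: Athens, Japan:",
}

for style, prompt in prompts.items():
    for scale in (0.0, 1.0, 2.0):
        hook = (utils.get_act_name("z", LAYER),
                lambda z, hook, s=scale: scale_head(z, hook, s))
        logits = model.run_with_hooks(prompt, fwd_hooks=[hook])
        answer = model.tokenizer.decode(logits[0, -1].argmax().item())
        print(f"{style:11s} scale={scale}: {answer!r}")
# The shared-mechanism claim predicts the same direction of effect for both
# prompt styles: degraded answers at scale 0, intact or better at scale > 1.
```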
Original abstract
One of the most common complaints about large language models (LLMs) is their prompt sensitivity -- that is, the fact that their ability to perform a task or provide a correct answer to a question can depend unpredictably on the way the question is posed. We investigate this variation by comparing two very different but commonly-used styles of prompting: instruction-based prompts, which describe the task in natural language, and example-based prompts, which provide in-context few-shot demonstration pairs to illustrate the task. We find that, despite large variation in performance as a function of the prompt, the model engages some common underlying mechanisms across different prompts of a task. Specifically, we identify task-specific attention heads whose outputs literally describe the task -- which we dub lexical task heads -- and show that these heads are shared across prompting styles and trigger subsequent answer production. We further find that behavioral variation between prompts can be explained by the degree to which these heads are activated, and that failures are at least sometimes due to competing task representations that dilute the signal of the target task. Our results together present an increasingly clear picture of how LLMs' internal representations can explain behavior that otherwise seems idiosyncratic to users and developers.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper investigates prompt sensitivity in LLMs by comparing instruction-based and example-based prompting. It identifies task-specific attention heads (termed lexical task heads) whose outputs encode task descriptions; these heads are shared across prompt styles, trigger answer production, and their activation degree accounts for performance variation, with failures sometimes arising from competing task representations that dilute the target signal.
Significance. If the causal and sharing claims hold with appropriate controls, the work supplies a mechanistic account of why LLMs exhibit prompt-dependent behavior. It moves beyond descriptive observations of inconsistency by tying variability to identifiable, reusable internal components, which could inform both interpretability research and practical prompting strategies.
major comments (2)
- [Results section on head activation and performance correlation] The central claim that lexical task heads are causally responsible for triggering answer production and that their activation degree explains behavioral variation across prompts requires intervention evidence. Correlational activation patterns alone leave open the possibility that the heads are downstream correlates of successful runs rather than upstream drivers; ablation, patching, or activation-manipulation experiments are needed to establish necessity. This issue is load-bearing for the explanation of prompt failures via competing representations.
- [Methods section describing lexical task head identification] The method for discovering and validating that heads 'literally describe the task' and are shared across prompting styles must include quantitative controls for prompt-style confounds. Without explicit metrics showing that the identification isolates task representations independent of surface prompt features, the sharing claim risks circularity with the prompting manipulation itself.
minor comments (2)
- Clarify the precise definition and quantification of 'degree to which these heads are activated' (e.g., via a specific activation metric or threshold) to allow replication.
- [Abstract] The abstract would benefit from naming the specific tasks or benchmarks used, even briefly, to ground the generality of the findings.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which highlights key opportunities to strengthen the causal and methodological foundations of our work. We address each major comment below and commit to revisions that directly respond to the concerns raised.
Point-by-point responses
-
Referee: [Results section on head activation and performance correlation] The central claim that lexical task heads are causally responsible for triggering answer production and that their activation degree explains behavioral variation across prompts requires intervention evidence. Correlational activation patterns alone leave open the possibility that the heads are downstream correlates of successful runs rather than upstream drivers; ablation, patching, or activation-manipulation experiments are needed to establish necessity. This issue is load-bearing for the explanation of prompt failures via competing representations.
Authors: We agree that correlational evidence, while consistent with our interpretation, leaves room for alternative accounts in which the observed heads are downstream effects rather than causal drivers. Our manuscript currently relies on activation strength correlating with performance, heads being shared across prompt styles, and their decoded outputs encoding task information. To establish necessity, we will add activation-patching experiments in the revised manuscript: we will selectively boost or suppress the activations of the identified lexical task heads during inference and quantify changes in task accuracy and prompt sensitivity. These interventions will directly test whether manipulating head activation alters answer production and resolves or induces competing-representation failures. revision: yes
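To illustrate what such a patching experiment could look like, here is a minimal sketch under assumed TransformerLens tooling, not the authors' implementation: cache the candidate head's activation from a prompt where the model succeeds and splice it into a run on a prompt where it fails.

```python
# Sketch of activation patching across prompt styles (assumed tooling:
# TransformerLens; prompts, head indices, and the success/failure labels
# are illustrative assumptions).
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")
LAYER, HEAD = 9, 6   # hypothetical lexical task head
z_name = utils.get_act_name("z", LAYER)

source_prompt = "Germany: Berlin, Greece: Athens, Japan:"   # assumed success case
target_prompt = "Tell me the capital city of Japan. A:"     # assumed failure case

# 1. Cache the head's last-position activation on the source prompt.
_, source_cache = model.run_with_cache(source_prompt)
source_z = source_cache[z_name][0, -1, HEAD].clone()

# 2. Patch it into the target run at the final position.
def patch_head(z, hook):
    z[:, -1, HEAD, :] = source_z
    return z

baseline_logits = model(target_prompt)
patched_logits = model.run_with_hooks(target_prompt, fwd_hooks=[(z_name, patch_head)])

for name, logits in (("baseline", baseline_logits), ("patched", patched_logits)):
    tok = logits[0, -1].argmax().item()
    print(name, repr(model.tokenizer.decode(tok)))
# If the head carries a shared task representation, patching should move the
# failing prompt's prediction toward the correct answer (" Tokyo").
```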
-
Referee: [Methods section describing lexical task head identification] The method for discovering and validating that heads 'literally describe the task' and are shared across prompting styles must include quantitative controls for prompt-style confounds. Without explicit metrics showing that the identification isolates task representations independent of surface prompt features, the sharing claim risks circularity with the prompting manipulation itself.
Authors: Our head-identification procedure already employs a contrastive approach that isolates heads whose outputs differ systematically between task-specific and task-irrelevant conditions, and we verify cross-style sharing by matching heads discovered independently from instruction-based versus example-based prompts. Nevertheless, we acknowledge that additional quantitative safeguards against surface-feature confounds would increase rigor. In the revision we will report (i) cosine similarity of activation vectors for the same heads across the two prompt styles, (ii) controls using style-matched but semantically unrelated prompts, and (iii) metrics showing that the decoded task descriptions remain stable after lexical overlap between prompt styles is minimized. These additions will demonstrate that the identified representations are task-specific rather than prompt-style artifacts. revision: yes
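A rough sketch of the first proposed metric, cross-style cosine similarity of the same head's activation vectors; the averaging choices, prompts, head indices, and TransformerLens tooling are assumptions rather than the authors' procedure.

```python
# Sketch of the proposed cross-style similarity check (assumed tooling:
# TransformerLens; prompts and head indices are illustrative).
import torch
import torch.nn.functional as F
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")
LAYER, HEAD = 9, 6   # hypothetical lexical task head

instruction_prompts = ["What is the capital city of Japan? A:",
                       "Tell me the capital city of France. A:"]
example_prompts = ["Germany: Berlin, Japan:",
                   "Peru: Lima, France:"]

def mean_head_activation(prompts):
    """Average the head's last-position output vector over a prompt set."""
    vecs = []
    for p in prompts:
        _, cache = model.run_with_cache(p)
        vecs.append(cache[utils.get_act_name("z", LAYER)][0, -1, HEAD])
    return torch.stack(vecs).mean(dim=0)

v_instr = mean_head_activation(instruction_prompts)
v_examp = mean_head_activation(example_prompts)
print("cross-style cosine similarity:",
      F.cosine_similarity(v_instr, v_examp, dim=0).item())
# High similarity, relative to a style-matched but task-unrelated control,
# would support the claim that the head carries a shared task representation.
```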
Circularity Check
No circularity; the empirical head identification is independent of the claims it supports.
full rationale
The paper presents an empirical analysis identifying task-specific attention heads via inspection of model internals, then correlates their activation and sharing across prompt styles with observed behavioral variation. No derivation step reduces a claimed prediction to a fitted input by construction, invokes a self-citation as the sole justification for uniqueness, or renames a known result under new coordinates. The account of shared representations and competing signals is grounded in direct observation of activations rather than self-referential definitions, leaving the central claims self-contained against external model behavior benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Attention heads can encode and output representations that literally describe a task.
invented entities (1)
- lexical task heads (no independent evidence)