Artificial Aphasias in Lesioned Language Models
Pith reviewed 2026-05-20 18:26 UTC · model grok-4.3
The pith
Lesioning parameters in language models produces aphasia-like symptoms but in patterns that differ qualitatively from human cases.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Lesioning language models by zeroing parameters causes the full range of aphasia symptoms to surface when outputs are scored with the Text Aphasia Battery, but the symptom profiles differ in distribution from those of human patients. Symptom patterns vary between attention components and feed-forward components and also vary with layer depth: lesions in early layers produce more syntactic and semantic deficits, and lesions in late-middle layers produce more phonological and fluency deficits. Although some lesions yield profiles that are quantitatively closer to particular human aphasia types, the qualitative mismatches indicate that such syndromes are shaped by the concrete mechanisms of learning and processing rather than being a domain-invariant consequence of disrupted language processing.
What carries the argument
Selective zeroing of model parameters followed by symptom diagnosis with the Text Aphasia Battery, used to compare impairment profiles across attention versus feed-forward components and across layer depths.
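As a rough illustration of what "selective zeroing" means, the sketch below zeroes one component's weight matrix in one layer of a toy parameter dictionary. The parameter names (`layers.N.self_attn.q_proj.weight` and similar), the nested-list weights, and the `lesion` helper are assumptions made for this example; the paper's actual lesioning code and granularity are not described here.

```python
import copy
import re

# Hypothetical component names in a Llama-style layout (an assumption,
# not taken from the paper).
ATTENTION = {"q_proj", "k_proj", "v_proj", "o_proj"}
FEED_FORWARD = {"up_proj", "gate_proj", "down_proj"}

# Matches names such as "layers.3.mlp.up_proj.weight".
NAME_RE = re.compile(r"layers\.(\d+)\.\w+\.(\w+)\.weight$")


def lesion(params, layer, component):
    """Return a copy of `params` with one component of one layer set to zero."""
    if component not in ATTENTION | FEED_FORWARD:
        raise ValueError(f"unknown component: {component}")
    lesioned = copy.deepcopy(params)  # leave the intact model untouched
    hits = 0
    for name, matrix in lesioned.items():
        match = NAME_RE.search(name)
        if match and int(match.group(1)) == layer and match.group(2) == component:
            for row in matrix:
                row[:] = [0.0] * len(row)  # zero the row in place
            hits += 1
    if hits == 0:
        raise KeyError(f"no parameter matches layer={layer}, component={component}")
    return lesioned


# Toy weights, purely illustrative.
toy_params = {
    "layers.0.self_attn.q_proj.weight": [[0.5, -1.0], [2.0, 0.25]],
    "layers.0.mlp.up_proj.weight": [[1.0, 1.0], [1.0, 1.0]],
    "layers.1.self_attn.q_proj.weight": [[3.0, 3.0], [3.0, 3.0]],
}
lesioned = lesion(toy_params, layer=0, component="q_proj")
```

Copying before zeroing keeps an unlesioned reference model available, which matters for any later comparison against baseline outputs.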
If this is right
- Symptom profiles differ systematically between attention components and feed-forward components.
- Early-layer lesions disproportionately produce syntactic and semantic deficits.
- Late-middle-layer lesions produce higher rates of phonological and fluency deficits.
- Quantitative resemblance to some human aphasia types occurs, yet qualitative pattern differences remain.
- Aphasia syndromes reflect the particular learning and processing details of the system rather than domain-invariant consequences of disruption.
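The component and depth contrasts listed above amount to grouping scored outputs by mechanism and layer position, then comparing symptom rates across groups. The sketch below shows one way to do that tally. The four-way depth split, the record format, and the toy records are all assumptions for illustration; the paper's own binning and data are not reproduced here.

```python
from collections import Counter, defaultdict

ATTENTION = {"query", "key", "value", "output"}
FEED_FORWARD = {"up", "gate", "down"}


def mechanism(component):
    """Map a lesioned component to its mechanism."""
    if component in ATTENTION:
        return "attention"
    if component in FEED_FORWARD:
        return "feed-forward"
    raise ValueError(f"unknown component: {component}")


def depth_bin(layer, n_layers):
    """Assumed quartile split of 0-based layer indices (not the paper's bins)."""
    frac = layer / n_layers
    if frac < 0.25:
        return "early"
    if frac < 0.5:
        return "early-middle"
    if frac < 0.75:
        return "late-middle"
    return "late"


def symptom_rates(records, n_layers):
    """records: (component, layer, symptom_category) per scored output.

    Returns {(mechanism, depth_bin): {category: share of outputs in that group}}.
    """
    counts = defaultdict(Counter)
    for component, layer, category in records:
        counts[(mechanism(component), depth_bin(layer, n_layers))][category] += 1
    return {
        group: {cat: n / sum(c.values()) for cat, n in c.items()}
        for group, c in counts.items()
    }


# Placeholder records, not data from the paper.
records = [
    ("query", 1, "syntactic"),
    ("query", 2, "syntactic"),
    ("key", 1, "semantic"),
    ("key", 3, "none"),
    ("down", 10, "fluency"),
    ("down", 11, "phonological"),
]
rates = symptom_rates(records, n_layers=16)
```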
Where Pith is reading between the lines
- The same lesioning approach could be applied to other model families to test whether the observed component and depth effects generalize.
- Training choices that alter how syntax or phonology is represented might shift the lesion-induced deficit patterns toward or away from human profiles.
- Mapping which model parts produce which deficits could guide targeted improvements in robustness against specific language failures.
Load-bearing premise
That the Text Aphasia Battery applied to model-generated text produces symptom labels validly comparable to clinical diagnoses in humans.
What would settle it
Repeating the lesioning and scoring procedure on models trained with markedly different objectives or architectures and finding that symptom profiles now match human aphasia distributions in both quantitative similarity and qualitative clustering would undermine the conclusion.
Original abstract
Aphasias, selective language impairments which can arise from brain damage, reveal the functional organization of human language by providing causal links between affected brain regions and specific symptom profiles. Drawing on this literature, we introduce an aphasia-inspired technique to characterize the emergent functional organization of language models (LMs). We "lesion" (zero-out) model parameters and measure the effects of this intervention against clinical aphasia symptoms, as diagnosed by the Text Aphasia Battery (TAB). When applied to 112,426 outputs from five 1B-scale LMs, the full range of evaluated symptoms surface, but in distributions largely distinct from those of humans. Our method uncovers broad symptom-profile differences between attention components (query, key, value, output) and feed-forward components (up, gate, down), with weaker evidence for differences among components within the same mechanism. We also find an effect of depth, where lesions in early layers disproportionately cause syntactic and semantic symptoms while late-middle layers yield higher rates of phonological and fluency deficits. Although some LM lesions induce quantitatively more similar profiles to some human aphasia types than others, qualitative differences in symptom patterns between LMs and humans suggest that aphasia syndromes are heavily influenced by the details of learning and processing rather than being a domain-invariant consequence of disrupted language processing.
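The abstract speaks of some lesions being "quantitatively more similar" to certain human aphasia types without naming a metric in this excerpt. As one possible reading, the sketch below compares symptom distributions with the Jensen-Shannon divergence and picks the nearest reference profile. The choice of metric, the profile names `type_A` and `type_B`, and all counts are assumptions for illustration, not the paper's method or data.

```python
import math


def normalize(counts):
    """Turn raw symptom counts into a probability distribution."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {k: v / total for k, v in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    keys = set(p) | set(q)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Terms with zero probability contribute nothing to the sum.
        return sum(
            a[k] * math.log2(a[k] / mix[k]) for k in keys if a.get(k, 0.0) > 0
        )

    return 0.5 * kl(p) + 0.5 * kl(q)


# Placeholder profiles, not clinical or experimental data.
lm_profile = normalize({"syntactic": 6, "semantic": 3, "fluency": 1})
reference_profiles = {
    "type_A": normalize({"syntactic": 5, "semantic": 4, "fluency": 1}),
    "type_B": normalize({"phonological": 7, "fluency": 3}),
}
closest = min(
    reference_profiles,
    key=lambda name: js_divergence(lm_profile, reference_profiles[name]),
)
```

A single distance like this captures only the quantitative side of the comparison; the qualitative differences the abstract emphasizes would need inspection of which symptoms co-occur.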
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces an aphasia-inspired lesioning technique for language models, zeroing out parameters in five 1B-scale LMs and applying the Text Aphasia Battery (TAB) to 112,426 generated outputs to diagnose symptom profiles. It reports that the full range of clinical symptoms appears but in distributions distinct from human aphasias, with broad differences between attention (query/key/value/output) and feed-forward (up/gate/down) components, weaker within-mechanism differences, and depth effects (early layers more syntactic/semantic deficits; late-middle layers more phonological/fluency deficits). The central claim is that qualitative mismatches with human profiles indicate aphasia syndromes are shaped by learning/processing details rather than domain-invariant consequences of disrupted language processing.
Significance. If the TAB application yields validly comparable symptom labels, the work offers a scalable empirical method for mapping LM internal components to functional language deficits, supported by a large sample and clear component- and depth-level contrasts. This could advance interpretability research by bridging clinical linguistics and neural network analysis, while providing evidence that aphasia profiles depend on architectural and training specifics.
major comments (2)
- [Abstract] The conclusion that qualitative differences imply aphasia syndromes are heavily influenced by details of learning and processing (rather than domain-invariant) is load-bearing on the assumption that TAB symptom labels applied to autoregressive LM text are functionally analogous to human clinical diagnoses. The manuscript provides no reported validation steps such as expert human concordance rates, explicit mapping of LM error types to clinical criteria, or controls for baseline generation artifacts from tokenization/sampling.
- [Methods] It is unclear how the TAB was adapted for model-generated text (zeroed parameters, lack of embodied production) or whether baseline error rates in unlesioned models were subtracted or controlled when computing symptom rates; this directly affects interpretation of the reported component- and depth-level differences as evidence against invariance.
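The expert concordance check requested in the first major comment could be quantified in several ways. The sketch below uses Cohen's kappa, a standard chance-corrected agreement statistic for two raters. Neither the manuscript nor this report specifies that statistic, so treat it as an assumption, and the label lists as placeholders.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Agreement expected if both raters labeled independently at their own rates.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Placeholder labels: automated TAB-style labels versus a hypothetical expert.
tab_labels = ["syntactic", "semantic", "syntactic", "semantic"]
expert_labels = ["syntactic", "semantic", "syntactic", "semantic"]
kappa = cohens_kappa(tab_labels, expert_labels)
```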
minor comments (1)
- [Abstract] The abstract states 'weaker evidence for differences among components within the same mechanism' without specifying the statistical criteria, p-value thresholds, or effect-size measures used to support this assessment.
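To make concrete what kind of reporting this comment asks for, the sketch below computes an effect size and a test statistic for the difference between two symptom rates. Cohen's h and the pooled two-proportion z statistic are standard choices, used here as assumptions; the paper's actual statistical procedure is not stated in this excerpt, and the counts are placeholders.

```python
import math


def cohens_h(p1, p2):
    """Effect size for the difference between two proportions."""
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie in [0, 1]")
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: both groups share one symptom rate."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0.0:
        return 0.0  # every output agrees, so there is no difference to test
    return (x1 / n1 - x2 / n2) / se


# Placeholder counts: outputs showing a symptom under two different lesions.
h = cohens_h(30 / 100, 20 / 100)
z = two_proportion_z(30, 100, 20, 100)
```

Reporting an effect size alongside any threshold helps here because, with over a hundred thousand outputs, very small rate differences can still clear a conventional significance cutoff.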
Simulated Author's Rebuttal
We thank the referee for their constructive comments. These have helped us clarify the methodological details and limitations of our approach. We address the major comments point by point below.
- Referee: [Abstract] The conclusion that qualitative differences imply aphasia syndromes are heavily influenced by details of learning and processing (rather than domain-invariant) is load-bearing on the assumption that TAB symptom labels applied to autoregressive LM text are functionally analogous to human clinical diagnoses. The manuscript provides no reported validation steps such as expert human concordance rates, explicit mapping of LM error types to clinical criteria, or controls for baseline generation artifacts from tokenization/sampling.
  Authors: We recognize that direct validation of the TAB on LM-generated text, such as through expert concordance, was not performed in the original manuscript. This is a valid concern for the strength of the analogy. In the revised version, we have added a new section discussing the adaptation process and providing an explicit mapping of observed LM errors to TAB criteria, along with illustrative examples. We also clarify that symptom rates were computed as differences from unlesioned baseline models to control for generation artifacts. While we cannot retroactively add human expert ratings without new data collection, we believe these additions strengthen the presentation of our results and support the claim of qualitative differences.
  revision: partial
- Referee: [Methods] It is unclear how the TAB was adapted for model-generated text (zeroed parameters, lack of embodied production) or whether baseline error rates in unlesioned models were subtracted or controlled when computing symptom rates; this directly affects interpretation of the reported component- and depth-level differences as evidence against invariance.
  Authors: The adaptation of the TAB for model-generated text is described in the Methods section, where we explain that we applied the battery's textual diagnostic criteria to the outputs, as the symptoms are primarily linguistic and do not depend on embodied aspects. We have expanded this description in the revision to include more details on handling zeroed parameters' effects on generation. Additionally, baseline error rates from unlesioned models were indeed subtracted to isolate lesion-induced symptoms; we will make this control more explicit and discuss its implications for interpreting the component and depth effects as evidence against domain-invariant profiles.
  revision: yes
- Not resolved by the revision: expert human concordance rates for the application of TAB to LM-generated text, since these would require new data collection not present in the original study.
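The baseline control the authors describe in their responses, subtracting unlesioned symptom rates from lesioned ones, can be sketched in a few lines. The function name, the dictionary format, and the rates below are assumptions for illustration. Whether negative differences were clipped or kept is not stated, so this sketch keeps them.

```python
def baseline_adjusted(lesioned_rates, baseline_rates):
    """Subtract the unlesioned model's symptom rates from the lesioned model's.

    A symptom missing from either dictionary is treated as having rate 0.
    Negative values (symptom rarer after lesioning) are kept as-is.
    """
    symptoms = sorted(set(lesioned_rates) | set(baseline_rates))
    return {
        s: lesioned_rates.get(s, 0.0) - baseline_rates.get(s, 0.0)
        for s in symptoms
    }


# Placeholder rates, not values from the paper.
lesioned = {"syntactic": 0.5, "fluency": 0.125}
baseline = {"syntactic": 0.25, "fluency": 0.125, "semantic": 0.0625}
adjusted = baseline_adjusted(lesioned, baseline)
```

With this adjustment, a symptom that the intact model already produces at the same rate contributes zero, so remaining differences are more plausibly attributable to the lesion than to sampling or tokenization artifacts.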
Circularity Check
No circularity: empirical symptom measurements after lesions
The paper reports direct empirical results from zeroing parameters in five 1B-scale LMs, generating 112,426 outputs, and scoring them with the Text Aphasia Battery for symptom rates. These rates are then compared distributionally to human aphasia profiles. No equations, fitted parameters, or derivations are invoked that would make any reported symptom frequency or qualitative difference equivalent to its own inputs by construction. The central claim about domain-invariance follows from the observed mismatches rather than from any self-definitional mapping or self-citation chain. The analysis is therefore self-contained as a set of measurements.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The Text Aphasia Battery produces symptom labels on model-generated text that are meaningfully comparable to clinical aphasia diagnoses in humans.