pith. machine review for the scientific record.

arxiv: 2506.02132 · v5 · submitted 2025-06-02 · 💻 cs.CL · cs.LG

Recognition: unknown

Model Internal Sleuthing: Finding Lexical Identity and Inflectional Features in Modern Language Models

Michael Li, Nishant Subramani

classification 💻 cs.CL cs.LG
keywords identity, inflectional, lexical, models, features, early, model, while
Original abstract

Large transformer-based language models dominate modern NLP, yet our understanding of how they encode linguistic information relies primarily on studies of early models like BERT and GPT-2. We systematically probe 25 models from BERT Base to Qwen2.5-7B focusing on two linguistic properties: lexical identity and inflectional features across 6 diverse languages. We find a consistent pattern: inflectional features are linearly decodable throughout the model, while lexical identity is prominent early but increasingly weakens with depth. Further analysis of the representation geometry reveals that models with aggressive mid-layer dimensionality compression show reduced steering effectiveness in those layers, despite probe accuracy remaining high. Pretraining analysis shows that inflectional structure stabilizes early while lexical identity representations continue evolving. Taken together, our findings suggest that transformers maintain inflectional features across layers, while trading off lexical identity for compact, predictive representations. Our code is available at https://github.com/ml5885/model_internal_sleuthing
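
The probing recipe the abstract describes comes down to three steps: collect hidden states at every layer, fit one linear classifier per layer, and track accuracy as a function of depth. Below is a minimal sketch of that recipe under stated assumptions: Hugging Face transformers plus scikit-learn, with the model choice, the toy phrases, and the singular/plural labels all illustrative placeholders. The authors' actual pipeline is the GitHub repository linked above.

    # Minimal layer-wise linear-probing sketch; not the authors' released code.
    import torch
    from transformers import AutoModel, AutoTokenizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    model_name = "bert-base-uncased"  # stand-in for any of the 25 probed models
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
    model.eval()

    # Toy supervision for one inflectional feature: noun number.
    phrases = ["the cat", "the cats", "a dog", "two dogs", "one book", "many books"]
    labels = [0, 1, 0, 1, 0, 1]  # 0 = singular, 1 = plural

    # Collect the noun's hidden state at every layer (position -2 skips [SEP]).
    per_layer = None
    with torch.no_grad():
        for p in phrases:
            hidden = model(**tokenizer(p, return_tensors="pt")).hidden_states
            if per_layer is None:
                per_layer = [[] for _ in hidden]  # embeddings + each layer
            for i, h in enumerate(hidden):
                per_layer[i].append(h[0, -2].numpy())

    # One linear probe per layer; accuracy vs. depth traces where the
    # feature is linearly decodable. Real experiments use far more data,
    # 6 languages, and controls; this only illustrates the analysis shape.
    for i, feats in enumerate(per_layer):
        X_tr, X_te, y_tr, y_te = train_test_split(
            feats, labels, test_size=0.5, random_state=0, stratify=labels
        )
        probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        print(f"layer {i:2d}: probe accuracy = {probe.score(X_te, y_te):.2f}")

On this reading, accuracy that stays high across layers for an inflectional label but decays with depth for a word-identity label would reproduce the paper's headline contrast.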

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work, sorted by Pith novelty score.

  1. Inference-Time Machine Unlearning via Gated Activation Redirection

    cs.LG · 2026-05 · conditional · novelty 8.0

    GUARD-IT performs machine unlearning in LLMs via inference-time gated activation redirection, matching or exceeding gradient-based baselines on TOFU and MUSE while preserving utility and working under quantization.

  2. Gradients with Respect to Semantics Preserving Embeddings Tell the Uncertainty of Large Language Models

    cs.CL · 2026-05 · unverdicted · novelty 7.0

    SemGrad is a gradient-based uncertainty quantification technique for free-form LLM generation that operates in semantic space using a Semantic Preservation Score to select stable embeddings.

  3. Defragmenting Language Models: An Interpretability-based Approach for Vocabulary Expansion

    cs.CL · 2026-04 · unverdicted · novelty 7.0

    Interpretability-based selection of vocabulary items plus FragMend initialization reduces token over-fragmentation and improves performance for non-Latin script languages by roughly 20 points over baselines.

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages · cited by 3 Pith papers · 7 internal anchors

  1. [1]

    Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, and Yoav Goldberg. 2017. https://openreview.net/forum?id=BJh6Ztuxl Fine-grained analysis of sentence embeddings using auxiliary prediction tasks. In 5th International Conference on Learning Representations (Conference Track)

  2. [2]

    Guillaume Alain and Yoshua Bengio. 2017. https://openreview.net/forum?id=ryF7rTqgl Understanding intermediate layers using linear classifier probes. In 5th International Conference on Learning Representations (Workshop Track)

  3. [3]

    Yonatan Belinkov and James Glass. 2019. https://doi.org/10.1162/tacl_a_00254 Analysis methods in neural language processing: A survey. Transactions of the Association for Computational Linguistics, 7:49–72

  4. [4]

    Leo Breiman. 2001. https://doi.org/10.1023/A:1010933404324 Random forests. Mach. Learn., 45(1):5–32

  5. [5]

    Trenton Bricken, Adly Templeton, Joshua Batson, Brian Chen, Adam Jermyn, Tom Conerly, Nick Turner, Cem Anil, Carson Denison, Amanda Askell, Robert Lasenby, Yifan Wu, Shauna Kravec, Nicholas Schiefer, Tim Maxwell, Nicholas Joseph, Zac Hatfield-Dodds, Alex Tamkin, Karina Nguyen, and 6 others. 2023. Towards monosemanticity: Decomposing language models with d...

  6. [6]

    Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, and 12 others. 2020. https://arxiv.org/abs/2005.14165 Lan...

  7. [7]

    Tyler A. Chang, Catherine Arnett, Zhuowen Tu, and Benjamin K. Bergen. 2024. https://arxiv.org/abs/2408.10441 Goldfish: Monolingual language models for 350 languages. Preprint, arXiv:2408.10441

  8. [8]

    Hoagy Cunningham, Aidan Ewart, Logan Riggs, Robert Huben, and Lee Sharkey. 2023. https://arxiv.org/abs/2309.08600 Sparse autoencoders find highly interpretable features in language models. Preprint, arXiv:2309.08600

  9. [9]

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. https://doi.org/10.18653/v1/N19-1423 BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long a...

  10. [10]

    Yanai Elazar, Shauli Ravfogel, Alon Jacovi, and Yoav Goldberg. 2021. https://doi.org/10.1162/tacl_a_00359 Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals. Transactions of the Association for Computational Linguistics, 9:160–175

  11. [11]

    Nelson Elhage, Neel Nanda, Catherine Olsson, Tom Henighan, Nicholas Joseph, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, and 6 others. 2021. A mathematical framework for transformer circuits. Transformer C...

  12. [12]

    Kawin Ethayarajh. 2019. https://doi.org/10.18653/v1/D19-1006 How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCN...

  13. [13]

    Atticus Geiger, Hanson Lu, Thomas F Icard, and Christopher Potts. 2021. https://openreview.net/forum?id=RmuXDtjDhG Causal abstractions of neural networks. In Advances in Neural Information Processing Systems

  14. [14]

    Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, and 542 others. 2024. https://arxiv.org/abs/2407.21783 The llama 3...

  15. [15]

    Dirk Groeneveld, Iz Beltagy, Evan Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, and 24 others. 2024. https://doi.org/10.18653/v1/2024.acl-long.841 OLMo: Acceleratin...

  16. [16]

    Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2009. https://web.stanford.edu/~hastie/ElemStatLearn/ The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edition. Springer, New York, NY, USA

  17. [17]

    John Hewitt and Percy Liang. 2019. https://doi.org/10.18653/v1/D19-1275 Designing and interpreting probes with control tasks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2733–2743, Hong Kong, China. Association fo...

  18. [18]

    Kurt Hornik, Maxwell Stinchcombe, and Halbert White. 1989. https://doi.org/10.1016/0893-6080(89)90020-8 Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366

  19. [19]

    Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Kai Dang, and 1 other. 2024. Qwen2.5-Coder technical report. arXiv preprint arXiv:2409.12186

  20. [20]

    Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Ludwig Schmidt, Hannaneh Hajishirzi, and Ali Farhadi. 2023. https://openreview.net/forum?id=6t0Kwf8-jrj Editing models with task arithmetic. In The Eleventh International Conference on Learning Representations

  21. [21]

    Ganesh Jawahar, Benoît Sagot, and Djamé Seddah. 2019. https://doi.org/10.18653/v1/P19-1356 What does BERT learn about the structure of language? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3651–3657, Florence, Italy. Association for Computational Linguistics

  22. [22]

    Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980

  23. [23]

    Nathan Lambert, Jacob Morrison, Valentina Pyatkin, Shengyi Huang, Hamish Ivison, Faeze Brahman, Lester James V. Miranda, Alisa Liu, Nouha Dziri, Shane Lyu, Yuling Gu, Saumya Malik, Victoria Graf, Jena D. Hwang, Jiangjiang Yang, Ronan Le Bras, Oyvind Tafjord, Chris Wilhelm, Luca Soldaini, and 4 others. 2025. https://arxiv.org/abs/2411.15124 Tulu 3: Pushing...

  24. [24]

    Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, and Martin Wattenberg. 2023. https://openreview.net/forum?id=aLLuYpn83y Inference-time intervention: Eliciting truthful answers from a language model. In Thirty-seventh Conference on Neural Information Processing Systems

  25. [25]

    Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, and Noah A. Smith. 2019. https://doi.org/10.18653/v1/N19-1112 Linguistic knowledge and transferability of contextual representations. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1...

  26. [26]

    Kevin Meng, David Bau, Alex J Andonian, and Yonatan Belinkov. 2022. https://openreview.net/forum?id=-h6WAS6eE4 Locating and editing factual associations in GPT. In Advances in Neural Information Processing Systems

  27. [27]

    nostalgebraist. 2020. https://www.lesswrong.com/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens Interpreting GPT: the logit lens

  28. [28]

    Nina Panickssery, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, and Alexander Matt Turner. 2024. https://arxiv.org/abs/2312.06681 Steering Llama 2 via contrastive activation addition. Preprint, arXiv:2312.06681

  29. [29]

    F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830

  30. [30]

    Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9

  31. [31]

    Anna Rogers, Olga Kovaleva, and Anna Rumshisky. 2020. https://doi.org/10.1162/tacl_a_00349 A primer in BERTology: What we know about how BERT works. Transactions of the Association for Computational Linguistics, 8:842–866

  32. [32]

    Frank Rosenblatt. 1958. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386

  33. [33]

    Nishant Subramani, Jason Eisner, Justin Svegliato, Benjamin Van Durme, Yu Su, and Sam Thomson. 2025. https://aclanthology.org/2025.naacl-long.615/ MICE for CATs: Model-internal confidence estimation for calibrating agents with tools. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguis...

  34. [34]

    Nishant Subramani, Nivedita Suresh, and Matthew Peters. 2022. https://doi.org/10.18653/v1/2022.findings-acl.48 Extracting latent steering vectors from pretrained language models. In Findings of the Association for Computational Linguistics: ACL 2022, pages 566–581, Dublin, Ireland. Association for Computational Linguistics

  35. [35]

    Ian Tenney, Dipanjan Das, and Ellie Pavlick. 2019. https://doi.org/10.18653/v1/P19-1452 BERT rediscovers the classical NLP pipeline. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4593–4601, Florence, Italy. Association for Computational Linguistics

  36. [36]

    Jesse Vig, Sebastian Gehrmann, Yonatan Belinkov, Sharon Qian, Daniel Nevo, Yaron Singer, and Stuart Shieber. 2020. https://proceedings.neurips.cc/paper_files/paper/2020/file/92650b2e92217715fe312e6fa7b90d82-Paper.pdf Investigating gender bias in language models using causal mediation analysis. In Advances in Neural Information Processing Systems, volume ...

  37. [37]

    Elena Voita and Ivan Titov. 2020. https://doi.org/10.18653/v1/2020.emnlp-main.14 Information-theoretic probing with minimum description length. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 183–196, Online. Association for Computational Linguistics

  38. [38]

    Ivan Vulić, Edoardo Maria Ponti, Robert Litschko, Goran Glavaš, and Anna Korhonen. 2020. https://doi.org/10.18653/v1/2020.emnlp-main.586 Probing pretrained language models for lexical semantics. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7222–7240, Online. Association for Computational ...

  39. [39]

    Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, and 3 others. 2020. https://doi.org/10.18653/v1/2020.emnlp-demos.6 Transformers: Sta...

  40. [40]

    BigScience Workshop. 2023. https://arxiv.org/abs/2211.05100 BLOOM: A 176B-parameter open-access multilingual language model. Preprint, arXiv:2211.05100

  41. [41]

    Amir Zeldes. 2017. https://doi.org/10.1007/s10579-016-9343-x The GUM corpus: Creating multilayer resources in the classroom. Language Resources and Evaluation, 51(3):581–612