Cross-LLM Consistency in Inference: Evidence from Shared Interactions

Quanshi Zhang; Siyu Lou; Yao Yan; Yuntian Chen

arxiv: 2606.08129 · v1 · pith:G6CDPSOPnew · submitted 2026-06-06 · 💻 cs.AI

Cross-LLM Consistency in Inference: Evidence from Shared Interactions

Siyu Lou , Yao Yan , Yuntian Chen , Quanshi Zhang This is my paper

Pith reviewed 2026-06-27 19:49 UTC · model grok-4.3

classification 💻 cs.AI

keywords large language modelsinteraction explanationscross-model consistencyinference patternsmodel interpretabilityshared interactionstoken prediction

0 comments

The pith

Large language models share interaction patterns in their internal inference when predicting the same tokens from the same prompts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether LLMs with different architectures and training histories still converge on similar internal inference steps. It applies interaction-based explanations to break down how each model arrives at a target token and measures overlap across models. The results indicate that shared patterns appear regularly, increase with model advancement, and favor lower-order interactions with reduced cancellation. A sympathetic reader would care because this points to an implicit convergence in how LLMs process language despite surface differences. The work therefore frames cross-model consistency as an observable property rather than an assumption.

Core claim

LLMs often share interaction patterns when predicting the same target token from the same prompt. This consistency is more pronounced among advanced LLMs. Shared interactions also tend to be lower-order and show weaker positive-negative cancellation than non-shared interactions. These results suggest that advanced LLMs may be implicitly optimized toward common inference patterns, even though the mechanisms that give rise to such cross-model consistency remain open.

What carries the argument

interaction-based explanations, which decompose each model's prediction into additive and higher-order interactions among input features to enable direct comparison of inference mechanisms across models

Load-bearing premise

Interaction-based explanations accurately capture and compare the internal inference mechanisms across different LLMs.

What would settle it

Applying the same interaction decomposition to a new collection of LLMs and prompts and finding no measurable overlap in the extracted interaction sets would falsify the reported consistency.

Figures

Figures reproduced from arXiv: 2606.08129 by Quanshi Zhang, Siyu Lou, Yao Yan, Yuntian Chen.

**Figure 2.** Figure 2: (a, b) Visualization of logical models extracted from LLMs. (c, d) The logical model [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: The distribution of all interactions (column (a)) can be decomposed into the distribution of shared interactions [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: Comparison of the number and average contribution effect of shared interactions and non-shared interactions. [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Interaction-level calibration κ among LLMs. Each node denotes an LLM, and each edge reports the fraction κ of prediction utility explained by shared interactions between two LLMs. Relatively advanced LLMs achieve higher calibration values (60.99% − 77.46%), indicating that more than 2/3 of their prediction scores are derived from shared interactions and suggesting a common optimization direction of LLMs. S… view at source ↗

**Figure 3.** Figure 3: Interaction-level calibration among LLMs on WikiText [PITH_FULL_IMAGE:figures/full_fig_p018_3.png] view at source ↗

**Figure 6.** Figure 6: Sparsity of extracted interactions across different model-dataset settings. For each setting, we merge the [PITH_FULL_IMAGE:figures/full_fig_p019_6.png] view at source ↗

**Figure 7.** Figure 7: Prompt-level interaction distributions for different LLM pairs. Rows (a,b) show examples from the AdaptLLM [PITH_FULL_IMAGE:figures/full_fig_p020_7.png] view at source ↗

**Figure 8.** Figure 8: Interaction-level calibration κ among LLMs. Each node denotes an LLM, and each edge represents the fraction of prediction utility explained by shared interactions between two models. Relatively advanced LLMs exhibit higher calibration, suggesting stronger cross-model consistency in their interaction patterns. The experiments are on WikiText dataset. 20 [PITH_FULL_IMAGE:figures/full_fig_p020_8.png] view at source ↗

read the original abstract

Large language models (LLMs) differ in architecture, training data, and optimization procedures, yet they may still develop similar internal inference patterns. In this paper, we examine this hypothesis using interaction-based explanations. We find that LLMs often share interaction patterns when predicting the same target token from the same prompt. This consistency is more pronounced among advanced LLMs. Shared interactions also tend to be lower-order and show weaker positive-negative cancellation than non-shared interactions. These results suggest that advanced LLMs may be implicitly optimized toward common inference patterns, even though the mechanisms that give rise to such cross-model consistency remain open.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper reports that LLMs share interaction patterns on the same prompts, stronger in advanced models, but the cross-model comparison rests on an untested assumption about explanation comparability.

read the letter

The central observation is that different LLMs often align on the same interaction patterns when predicting a target token from the same prompt, with the alignment stronger among more advanced models, and the shared patterns tending to be lower-order with less positive-negative cancellation.

The work applies interaction-based explanations to this cross-model question in a straightforward way. That application is the main new element; prior work on interactions has not focused on consistency across separate LLMs.

The soft spot is the missing justification for treating the extracted interactions as directly comparable. The stress-test note flags this correctly: models differ in scale, tokenizers, and layers, yet the abstract supplies no normalization steps, ablations, or controls that would rule out method-induced artifacts. Without those, the claim that shared interactions reflect common inference mechanisms stays provisional. The paper is observational, so there are no equations or derivations to inspect for internal consistency.

This is aimed at interpretability researchers who already use interaction explanations and want to explore convergence across models. A reader in that group could extract useful hypotheses, but would need the methods section to judge whether the results survive the comparability issue.

It deserves a serious referee because the question is relevant to alignment and interpretability work and the approach is distinct enough to merit checking the details. Recommendation: send to review rather than desk reject, with the expectation that reviewers will press on the cross-model validation.

Referee Report

2 major / 2 minor

Summary. The paper examines the hypothesis that LLMs develop similar internal inference patterns despite differences in architecture, training data, and optimization. Using interaction-based explanations, it finds that LLMs often share interaction patterns when predicting the same target token from the same prompt, with this consistency more pronounced among advanced LLMs. Shared interactions tend to be lower-order and show weaker positive-negative cancellation than non-shared interactions.

Significance. If these results hold after addressing methodological concerns, they would indicate convergence toward common inference patterns in advanced LLMs. This could advance interpretability research by providing evidence of implicit optimization across models, though the observational nature requires robust validation of the explanation method.

major comments (2)

[Abstract] Abstract: The abstract states findings but supplies no methods, sample sizes, statistical tests, or controls, so it is impossible to determine whether the data support the claim as stated.
[Methods] Methods: No evidence is provided that the interaction extraction procedure normalizes for differences in model scale, tokenizer, or layer structure; without such normalization, apparent sharing could arise from the explanation method itself rather than from shared inference. This assumption is load-bearing for the central claim.

minor comments (2)

Consider adding a table or section summarizing the specific LLMs, prompts, and sample sizes used to improve reproducibility.
[Results] Clarify the precise definition and measurement of 'lower-order' interactions and 'positive-negative cancellation' with reference to the relevant equations or algorithms.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive report. The comments highlight important points about clarity and methodological robustness. We address each major comment below, indicating planned revisions where appropriate. We believe these changes will strengthen the manuscript without altering its core claims.

read point-by-point responses

Referee: [Abstract] Abstract: The abstract states findings but supplies no methods, sample sizes, statistical tests, or controls, so it is impossible to determine whether the data support the claim as stated.

Authors: We agree that the abstract, as a concise summary, omits specific methodological details, sample sizes, and statistical tests. This is standard due to length limits, with full details provided in the Methods and Results sections of the manuscript (including prompt counts, interaction extraction steps, and consistency metrics with associated tests). To improve accessibility, we will revise the abstract to include a brief clause on the interaction-based method and mention of controls for prompt consistency. revision: partial
Referee: [Methods] Methods: No evidence is provided that the interaction extraction procedure normalizes for differences in model scale, tokenizer, or layer structure; without such normalization, apparent sharing could arise from the explanation method itself rather than from shared inference. This assumption is load-bearing for the central claim.

Authors: This is a substantive methodological concern. The current manuscript describes the interaction extraction procedure but does not explicitly demonstrate or justify normalization steps for model scale, tokenizer vocabulary differences, or layer alignment. We will add a dedicated subsection to the Methods section that provides this evidence, including any scaling factors, alignment techniques, or robustness analyses across architectures to rule out method-induced artifacts. revision: yes

Circularity Check

0 steps flagged

Observational study with no derivation chain or self-referential reductions

full rationale

The paper is an empirical observational study that reports findings on shared interaction patterns across LLMs using interaction-based explanations. The abstract and provided text contain no equations, derivations, fitted parameters presented as predictions, or self-citations that serve as load-bearing premises for a claimed result. The central claims are direct empirical observations (e.g., consistency more pronounced among advanced LLMs) rather than any mathematical reduction that collapses to the inputs by construction. This is the most common honest outcome for non-derivational work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no equations, parameters, or explicit assumptions to audit.

pith-pipeline@v0.9.1-grok · 5627 in / 882 out tokens · 17283 ms · 2026-06-27T19:49:23.318964+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

48 extracted references · 15 canonical work pages · 13 internal anchors

[1]

Phi-4 Technical Report

Marah Abdin, Jyoti Aneja, Harkirat Behl, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Michael Har- rison, Russell J Hewett, Mojan Javaheripi, Piero Kauffmann, et al. Phi-4 technical report.arXiv preprint arXiv:2412.08905, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[2]

Qwen 2.5: A comprehensive review of the leading resource-efficient llm with potentioal to surpass all competitors

Imtiaz Ahmed, Sadman Islam, Partha Protim Datta, Imran Kabir, Md Naseef Ur Rahman Chowdhury, and Ahshanul Haque. Qwen 2.5: A comprehensive review of the leading resource-efficient llm with potentioal to surpass all competitors. 2025

2025
[3]

The Falcon Series of Open Language Models

Ebtesam Almazrouei, Hamza Alobeidli, Abdulaziz Alshamsi, Alessandro Cappelli, Ruxandra Cojocaru, Mérouane Debbah, Étienne Goffinet, Daniel Hesslow, Julien Launay, Quentin Malartic, et al. The falcon series of open language models.arXiv preprint arXiv:2311.16867, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[4]

Qwen Technical Report

Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, et al. Qwen technical report.arXiv preprint arXiv:2309.16609, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[5]

On the dangers of stochastic parrots: Can language models be too big? InProceedings of the 2021 ACM conference on fairness, accountability, and transparency, pages 610–623, 2021

Emily M Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. On the dangers of stochastic parrots: Can language models be too big? InProceedings of the 2021 ACM conference on fairness, accountability, and transparency, pages 610–623, 2021

2021
[6]

Holistic evaluation of language models.Annals of the New York Academy of Sciences, 1525(1):140–146, 2023

Rishi Bommasani, Percy Liang, and Tony Lee. Holistic evaluation of language models.Annals of the New York Academy of Sciences, 1525(1):140–146, 2023

2023
[7]

Proxyspex: Inference-efficient interpretability via sparse feature interactions in llms.Advances in Neural Information Processing Systems, 38:72306–72340, 2026

Landon Butler, Abhineet Agarwal, Justin Kang, Yigit Efe Erginbas, Bin Yu, and Kannan Ramchandran. Proxyspex: Inference-efficient interpretability via sparse feature interactions in llms.Advances in Neural Information Processing Systems, 38:72306–72340, 2026

2026
[8]

Defining and extracting generalizable interaction primitives from dnns

Lu Chen, Siyu Lou, Benhao Huang, and Quanshi Zhang. Defining and extracting generalizable interaction primitives from dnns. InInternational Conference on Learning Representations, volume 2024, pages 23780– 23802, 2024

2024
[9]

Convfinqa: Exploring the chain of numerical reasoning in conversational finance question answering

Zhiyu Chen, Shiyang Li, Charese Smiley, Zhiqiang Ma, Sameena Shah, and William Yang Wang. Convfinqa: Exploring the chain of numerical reasoning in conversational finance question answering. InProceedings of the 2022 conference on empirical methods in natural language processing, pages 6279–6292, 2022

2022
[10]

Adapting large language models via reading comprehension

Daixuan Cheng, Shaohan Huang, and Furu Wei. Adapting large language models via reading comprehension. In International Conference on Learning Representations, volume 2024, pages 48624–48652, 2024. 10 APREPRINT- JUNE9, 2026

2024
[11]

Discovering and explaining the representation bottleneck of dnns

Huiqi Deng, Qihan Ren, Hao Zhang, and Quanshi Zhang. Discovering and explaining the representation bottleneck of dnns. InInternational Conference on Learning Representations, 2022

2022
[12]

Unifying fourteen post-hoc attribution methods with taylor interactions.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(7):4625–4640, 2024

Huiqi Deng, Na Zou, Mengnan Du, Weifu Chen, Guocan Feng, Ziwei Yang, Zheyang Li, and Quanshi Zhang. Unifying fourteen post-hoc attribution methods with taylor interactions.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(7):4625–4640, 2024

2024
[13]

Bert: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

2019
[14]

ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

Team Glm, Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Dan Zhang, Diego Rojas, Guanyu Feng, Hanlin Zhao, et al. Chatglm: A family of large language models from glm-130b to glm-4 all tools.arXiv preprint arXiv:2406.12793, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[15]

The Llama 3 Herd of Models

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models.arXiv preprint arXiv:2407.21783, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[16]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[17]

Towards the three-phase dynamics of generalization power of a dnn.arXiv e-prints, pages arXiv–2505, 2025

Yuxuan He, Junpeng Zhang, Hongyuan Zhang, and Quanshi Zhang. Towards the three-phase dynamics of generalization power of a dnn.arXiv e-prints, pages arXiv–2505, 2025

2025
[18]

Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Keming Lu, et al. Qwen2. 5-coder technical report.arXiv preprint arXiv:2409.12186, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[19]

Gemma 3 Technical Report

Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Mate- jovicova, Alexandre Ramé, Morgane Rivière, Louis Rouillard, et al. Gemma 3 technical report.arXiv preprint arXiv:2503.19786, 4, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[20]

Learning to understand: Identifying interactions via the möbius transform.Advances in Neural Information Processing Systems, 37:46160–46202, 2024

Justin S Kang, Yigit E Erginbas, Landon Butler, Ramtin Pedarsani, and Kannan Ramchandran. Learning to understand: Identifying interactions via the möbius transform.Advances in Neural Information Processing Systems, 37:46160–46202, 2024

2024
[21]

Spex: Scaling feature interaction explanations for llms.arXiv preprint arXiv:2502.13870, 2025

Justin Singh Kang, Landon Butler, Abhineet Agarwal, Yigit Efe Erginbas, Ramtin Pedarsani, Kannan Ramchan- dran, and Bin Yu. Spex: Scaling feature interaction explanations for llms.arXiv preprint arXiv:2502.13870, 2025

work page arXiv 2025
[22]

Does a neural network really encode symbolic concepts? InInternational conference on machine learning, pages 20452–20469

Mingjie Li and Quanshi Zhang. Does a neural network really encode symbolic concepts? InInternational conference on machine learning, pages 20452–20469. PMLR, 2023

2023
[23]

Towards the difficulty for a deep neural network to learn concepts of different complexities.Advances in Neural Information Processing Systems, 36:41283–41304, 2023

Dongrui Liu, Huiqi Deng, Xu Cheng, Qihan Ren, Kangrui Wang, and Quanshi Zhang. Towards the difficulty for a deep neural network to learn concepts of different complexities.Advances in Neural Information Processing Systems, 36:41283–41304, 2023

2023
[24]

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. Roberta: A robustly optimized bert pretraining approach.arXiv preprint arXiv:1907.11692, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1907
[25]

Www’18 open challenge: financial opinion mining and question answering

Macedo Maia, Siegfried Handschuh, André Freitas, Brian Davis, Ross McDermott, Manel Zarrouk, and Alexandra Balahur. Www’18 open challenge: financial opinion mining and question answering. InCompanion proceedings of the the web conference 2018, pages 1941–1942, 2018

2018
[26]

Good debt or bad debt: Detecting semantic orientations in economic texts.Journal of the Association for Information Science and Technology, 65 (4):782–796, 2014

Pekka Malo, Ankur Sinha, Pekka Korhonen, Jyrki Wallenius, and Pyry Takala. Good debt or bad debt: Detecting semantic orientations in economic texts.Journal of the Association for Information Science and Technology, 65 (4):782–796, 2014

2014
[27]

Pointer Sentinel Mixture Models

Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. Pointer sentinel mixture models.arXiv preprint arXiv:1609.07843, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[28]

Training language models to follow instructions with human feedback.Advances in neural information processing systems, 35:27730–27744, 2022

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback.Advances in neural information processing systems, 35:27730–27744, 2022

2022
[29]

Towards a unified game-theoretic view of adversarial perturbations and robustness.Advances in Neural Information Processing Systems, 34:3797–3810, 2021

Jie Ren, Die Zhang, Yisen Wang, Lu Chen, Zhanpeng Zhou, Yiting Chen, Xu Cheng, Xin Wang, Meng Zhou, Jie Shi, et al. Towards a unified game-theoretic view of adversarial perturbations and robustness.Advances in Neural Information Processing Systems, 34:3797–3810, 2021. 11 APREPRINT- JUNE9, 2026

2021
[30]

Defining and quantifying the emergence of sparse concepts in dnns

Jie Ren, Mingjie Li, Qirui Chen, Huiqi Deng, and Quanshi Zhang. Defining and quantifying the emergence of sparse concepts in dnns. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 20280–20289, 2023

2023
[31]

Bayesian neural networks avoid encoding complex and perturbation-sensitive concepts

Qihan Ren, Huiqi Deng, Yunuo Chen, Siyu Lou, and Quanshi Zhang. Bayesian neural networks avoid encoding complex and perturbation-sensitive concepts. InInternational Conference on Machine Learning, pages 28889– 28913. PMLR, 2023

2023
[32]

Where we have arrived in proving the emergence of sparse interaction primitives in dnns

Qihan Ren, Jiayang Gao, Wen Shen, and Quanshi Zhang. Where we have arrived in proving the emergence of sparse interaction primitives in dnns. InThe Twelfth International Conference on Learning Representations, 2024

2024
[33]

Towards the dynamics of a dnn learning symbolic interactions.Advances in Neural Information Processing Systems, 37:50653–50688, 2024

Qihan Ren, Junpeng Zhang, Yang Xu, Yue Xin, Dongrui Liu, and Quanshi Zhang. Towards the dynamics of a dnn learning symbolic interactions.Advances in Neural Information Processing Systems, 37:50653–50688, 2024

2024
[34]

On the foundations of combinatorial theory: I

Gian-Carlo Rota. On the foundations of combinatorial theory: I. theory of möbius functions. InClassic Papers in Combinatorics, pages 332–360. Springer, 1964

1964
[35]

Finer-ord: financial named entity recognition open research dataset.arXiv preprint arXiv:2302.11157, 2023

Agam Shah, Abhinav Gullapalli, Ruchit Vithani, Michael Galarnyk, and Sudheer Chava. Finer-ord: financial named entity recognition open research dataset.arXiv preprint arXiv:2302.11157, 2023

work page arXiv 2023
[36]

Talking about large language models.Communications of the ACM, 67(2):68–79, 2024

Murray Shanahan. Talking about large language models.Communications of the ACM, 67(2):68–79, 2024

2024
[37]

Sentfin 1.0: Entity-aware sentiment analysis for financial news.Journal of the Association for Information Science and Technology, 73(9):1314–1335, 2022

Ankur Sinha, Satishwar Kedas, Rishu Kumar, and Pekka Malo. Sentfin 1.0: Entity-aware sentiment analysis for financial news.Journal of the Association for Information Science and Technology, 73(9):1314–1335, 2022

2022
[38]

Understanding and sharing intentions: The origins of cultural cognition.Behavioral and brain sciences, 28(5):675–691, 2005

Michael Tomasello, Malinda Carpenter, Josep Call, Tanya Behne, and Henrike Moll. Understanding and sharing intentions: The origins of cultural cognition.Behavioral and brain sciences, 28(5):675–691, 2005

2005
[39]

LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. Llama: Open and efficient foundation language models.arXiv preprint arXiv:2302.13971, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[40]

Llama 2: Open Foundation and Fine-Tuned Chat Models

Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[41]

A unified approach to interpreting and boosting adversarial transferability

Xin Wang, Jie Ren, Shuyun Lin, Xiangming Zhu, Yisen Wang, and Quanshi Zhang. A unified approach to interpreting and boosting adversarial transferability. InInternational Conference on Learning Representations, 2021

2021
[42]

Qwen3 Technical Report

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report.arXiv preprint arXiv:2505.09388, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[43]

Explaining generalization power of a dnn using interactive concepts

Huilin Zhou, Hao Zhang, Huiqi Deng, Dongrui Liu, Wen Shen, Shih-Han Chan, and Quanshi Zhang. Explaining generalization power of a dnn using interactive concepts. InProceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 17105–17113, 2024

2024
[44]

Towards the first principles of explaining dnns: interactions explain the learning dynamics.Frontiers of Information Technology & Electronic Engineering, 26(7): 1017–1026, 2025

Huilin Zhou, Qihan Ren, Junpeng Zhang, and Quanshi Zhang. Towards the first principles of explaining dnns: interactions explain the learning dynamics.Frontiers of Information Technology & Electronic Engineering, 26(7): 1017–1026, 2025

2025
[45]

General scales unlock ai evaluation with explanatory and predictive power.Nature, 652(8108):58–67, 2026

Lexin Zhou, Lorenzo Pacchiardi, Fernando Martínez-Plumed, Katherine M Collins, Yael Moros-Daval, Seraphina Zhang, Qinlin Zhao, Yitian Huang, Luning Sun, Jonathan E Prunty, et al. General scales unlock ai evaluation with explanatory and predictive power.Nature, 652(8108):58–67, 2026. 12 APREPRINT- JUNE9, 2026 A Sparsity Property of Interactions Ren et al. ...

2026
[46]

That is, interactions involving too many input variables have zero or negligible contribution to the output

Bounded interaction order.The network does not rely on extremely high-order interactions. That is, interactions involving too many input variables have zero or negligible contribution to the output. Formally, there exists an order threshold M such that the interaction effect I(S) vanishes for any subset S with |S|> M
[47]

Specifically, let ¯u(m) denote the average output change when m variables are revealed, compared with the fully masked baseline: ¯u(m) =E S⊆N,|S|=m [v(xS)−v(x ∅)]

Monotonic response under masking.When more input variables are masked, the average network response decreases monotonically. Specifically, let ¯u(m) denote the average output change when m variables are revealed, compared with the fully masked baseline: ¯u(m) =E S⊆N,|S|=m [v(xS)−v(x ∅)]. Then, form ′ < m, the average response satisfies ¯u(m′) ≤¯u(m). This...
[48]

In particular, for anym ′ < m, there exists a positive constantp >0such that ¯u(m′) ≥ m′ m p ¯u(m)

Polynomial lower bound on average response.The average response does not decay too sharply when fewer variables are revealed. In particular, for anym ′ < m, there exists a positive constantp >0such that ¯u(m′) ≥ m′ m p ¯u(m). This polynomial lower bound rules out the case where the model output is dominated by dense, extremely high-order interactions. Tog...

2026

[1] [1]

Phi-4 Technical Report

Marah Abdin, Jyoti Aneja, Harkirat Behl, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Michael Har- rison, Russell J Hewett, Mojan Javaheripi, Piero Kauffmann, et al. Phi-4 technical report.arXiv preprint arXiv:2412.08905, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[2] [2]

Qwen 2.5: A comprehensive review of the leading resource-efficient llm with potentioal to surpass all competitors

Imtiaz Ahmed, Sadman Islam, Partha Protim Datta, Imran Kabir, Md Naseef Ur Rahman Chowdhury, and Ahshanul Haque. Qwen 2.5: A comprehensive review of the leading resource-efficient llm with potentioal to surpass all competitors. 2025

2025

[3] [3]

The Falcon Series of Open Language Models

Ebtesam Almazrouei, Hamza Alobeidli, Abdulaziz Alshamsi, Alessandro Cappelli, Ruxandra Cojocaru, Mérouane Debbah, Étienne Goffinet, Daniel Hesslow, Julien Launay, Quentin Malartic, et al. The falcon series of open language models.arXiv preprint arXiv:2311.16867, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[4] [4]

Qwen Technical Report

Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, et al. Qwen technical report.arXiv preprint arXiv:2309.16609, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[5] [5]

On the dangers of stochastic parrots: Can language models be too big? InProceedings of the 2021 ACM conference on fairness, accountability, and transparency, pages 610–623, 2021

Emily M Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. On the dangers of stochastic parrots: Can language models be too big? InProceedings of the 2021 ACM conference on fairness, accountability, and transparency, pages 610–623, 2021

2021

[6] [6]

Holistic evaluation of language models.Annals of the New York Academy of Sciences, 1525(1):140–146, 2023

Rishi Bommasani, Percy Liang, and Tony Lee. Holistic evaluation of language models.Annals of the New York Academy of Sciences, 1525(1):140–146, 2023

2023

[7] [7]

Proxyspex: Inference-efficient interpretability via sparse feature interactions in llms.Advances in Neural Information Processing Systems, 38:72306–72340, 2026

Landon Butler, Abhineet Agarwal, Justin Kang, Yigit Efe Erginbas, Bin Yu, and Kannan Ramchandran. Proxyspex: Inference-efficient interpretability via sparse feature interactions in llms.Advances in Neural Information Processing Systems, 38:72306–72340, 2026

2026

[8] [8]

Defining and extracting generalizable interaction primitives from dnns

Lu Chen, Siyu Lou, Benhao Huang, and Quanshi Zhang. Defining and extracting generalizable interaction primitives from dnns. InInternational Conference on Learning Representations, volume 2024, pages 23780– 23802, 2024

2024

[9] [9]

Convfinqa: Exploring the chain of numerical reasoning in conversational finance question answering

Zhiyu Chen, Shiyang Li, Charese Smiley, Zhiqiang Ma, Sameena Shah, and William Yang Wang. Convfinqa: Exploring the chain of numerical reasoning in conversational finance question answering. InProceedings of the 2022 conference on empirical methods in natural language processing, pages 6279–6292, 2022

2022

[10] [10]

Adapting large language models via reading comprehension

Daixuan Cheng, Shaohan Huang, and Furu Wei. Adapting large language models via reading comprehension. In International Conference on Learning Representations, volume 2024, pages 48624–48652, 2024. 10 APREPRINT- JUNE9, 2026

2024

[11] [11]

Discovering and explaining the representation bottleneck of dnns

Huiqi Deng, Qihan Ren, Hao Zhang, and Quanshi Zhang. Discovering and explaining the representation bottleneck of dnns. InInternational Conference on Learning Representations, 2022

2022

[12] [12]

Unifying fourteen post-hoc attribution methods with taylor interactions.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(7):4625–4640, 2024

Huiqi Deng, Na Zou, Mengnan Du, Weifu Chen, Guocan Feng, Ziwei Yang, Zheyang Li, and Quanshi Zhang. Unifying fourteen post-hoc attribution methods with taylor interactions.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(7):4625–4640, 2024

2024

[13] [13]

Bert: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

2019

[14] [14]

ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

Team Glm, Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Dan Zhang, Diego Rojas, Guanyu Feng, Hanlin Zhao, et al. Chatglm: A family of large language models from glm-130b to glm-4 all tools.arXiv preprint arXiv:2406.12793, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[15] [15]

The Llama 3 Herd of Models

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models.arXiv preprint arXiv:2407.21783, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[16] [16]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[17] [17]

Towards the three-phase dynamics of generalization power of a dnn.arXiv e-prints, pages arXiv–2505, 2025

Yuxuan He, Junpeng Zhang, Hongyuan Zhang, and Quanshi Zhang. Towards the three-phase dynamics of generalization power of a dnn.arXiv e-prints, pages arXiv–2505, 2025

2025

[18] [18]

Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Keming Lu, et al. Qwen2. 5-coder technical report.arXiv preprint arXiv:2409.12186, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[19] [19]

Gemma 3 Technical Report

Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Mate- jovicova, Alexandre Ramé, Morgane Rivière, Louis Rouillard, et al. Gemma 3 technical report.arXiv preprint arXiv:2503.19786, 4, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[20] [20]

Learning to understand: Identifying interactions via the möbius transform.Advances in Neural Information Processing Systems, 37:46160–46202, 2024

Justin S Kang, Yigit E Erginbas, Landon Butler, Ramtin Pedarsani, and Kannan Ramchandran. Learning to understand: Identifying interactions via the möbius transform.Advances in Neural Information Processing Systems, 37:46160–46202, 2024

2024

[21] [21]

Spex: Scaling feature interaction explanations for llms.arXiv preprint arXiv:2502.13870, 2025

Justin Singh Kang, Landon Butler, Abhineet Agarwal, Yigit Efe Erginbas, Ramtin Pedarsani, Kannan Ramchan- dran, and Bin Yu. Spex: Scaling feature interaction explanations for llms.arXiv preprint arXiv:2502.13870, 2025

work page arXiv 2025

[22] [22]

Does a neural network really encode symbolic concepts? InInternational conference on machine learning, pages 20452–20469

Mingjie Li and Quanshi Zhang. Does a neural network really encode symbolic concepts? InInternational conference on machine learning, pages 20452–20469. PMLR, 2023

2023

[23] [23]

Towards the difficulty for a deep neural network to learn concepts of different complexities.Advances in Neural Information Processing Systems, 36:41283–41304, 2023

Dongrui Liu, Huiqi Deng, Xu Cheng, Qihan Ren, Kangrui Wang, and Quanshi Zhang. Towards the difficulty for a deep neural network to learn concepts of different complexities.Advances in Neural Information Processing Systems, 36:41283–41304, 2023

2023

[24] [24]

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. Roberta: A robustly optimized bert pretraining approach.arXiv preprint arXiv:1907.11692, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1907

[25] [25]

Www’18 open challenge: financial opinion mining and question answering

Macedo Maia, Siegfried Handschuh, André Freitas, Brian Davis, Ross McDermott, Manel Zarrouk, and Alexandra Balahur. Www’18 open challenge: financial opinion mining and question answering. InCompanion proceedings of the the web conference 2018, pages 1941–1942, 2018

2018

[26] [26]

Good debt or bad debt: Detecting semantic orientations in economic texts.Journal of the Association for Information Science and Technology, 65 (4):782–796, 2014

Pekka Malo, Ankur Sinha, Pekka Korhonen, Jyrki Wallenius, and Pyry Takala. Good debt or bad debt: Detecting semantic orientations in economic texts.Journal of the Association for Information Science and Technology, 65 (4):782–796, 2014

2014

[27] [27]

Pointer Sentinel Mixture Models

Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. Pointer sentinel mixture models.arXiv preprint arXiv:1609.07843, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[28] [28]

Training language models to follow instructions with human feedback.Advances in neural information processing systems, 35:27730–27744, 2022

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback.Advances in neural information processing systems, 35:27730–27744, 2022

2022

[29] [29]

Towards a unified game-theoretic view of adversarial perturbations and robustness.Advances in Neural Information Processing Systems, 34:3797–3810, 2021

Jie Ren, Die Zhang, Yisen Wang, Lu Chen, Zhanpeng Zhou, Yiting Chen, Xu Cheng, Xin Wang, Meng Zhou, Jie Shi, et al. Towards a unified game-theoretic view of adversarial perturbations and robustness.Advances in Neural Information Processing Systems, 34:3797–3810, 2021. 11 APREPRINT- JUNE9, 2026

2021

[30] [30]

Defining and quantifying the emergence of sparse concepts in dnns

Jie Ren, Mingjie Li, Qirui Chen, Huiqi Deng, and Quanshi Zhang. Defining and quantifying the emergence of sparse concepts in dnns. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 20280–20289, 2023

2023

[31] [31]

Bayesian neural networks avoid encoding complex and perturbation-sensitive concepts

Qihan Ren, Huiqi Deng, Yunuo Chen, Siyu Lou, and Quanshi Zhang. Bayesian neural networks avoid encoding complex and perturbation-sensitive concepts. InInternational Conference on Machine Learning, pages 28889– 28913. PMLR, 2023

2023

[32] [32]

Where we have arrived in proving the emergence of sparse interaction primitives in dnns

Qihan Ren, Jiayang Gao, Wen Shen, and Quanshi Zhang. Where we have arrived in proving the emergence of sparse interaction primitives in dnns. InThe Twelfth International Conference on Learning Representations, 2024

2024

[33] [33]

Towards the dynamics of a dnn learning symbolic interactions.Advances in Neural Information Processing Systems, 37:50653–50688, 2024

Qihan Ren, Junpeng Zhang, Yang Xu, Yue Xin, Dongrui Liu, and Quanshi Zhang. Towards the dynamics of a dnn learning symbolic interactions.Advances in Neural Information Processing Systems, 37:50653–50688, 2024

2024

[34] [34]

On the foundations of combinatorial theory: I

Gian-Carlo Rota. On the foundations of combinatorial theory: I. theory of möbius functions. InClassic Papers in Combinatorics, pages 332–360. Springer, 1964

1964

[35] [35]

Finer-ord: financial named entity recognition open research dataset.arXiv preprint arXiv:2302.11157, 2023

Agam Shah, Abhinav Gullapalli, Ruchit Vithani, Michael Galarnyk, and Sudheer Chava. Finer-ord: financial named entity recognition open research dataset.arXiv preprint arXiv:2302.11157, 2023

work page arXiv 2023

[36] [36]

Talking about large language models.Communications of the ACM, 67(2):68–79, 2024

Murray Shanahan. Talking about large language models.Communications of the ACM, 67(2):68–79, 2024

2024

[37] [37]

Sentfin 1.0: Entity-aware sentiment analysis for financial news.Journal of the Association for Information Science and Technology, 73(9):1314–1335, 2022

Ankur Sinha, Satishwar Kedas, Rishu Kumar, and Pekka Malo. Sentfin 1.0: Entity-aware sentiment analysis for financial news.Journal of the Association for Information Science and Technology, 73(9):1314–1335, 2022

2022

[38] [38]

Understanding and sharing intentions: The origins of cultural cognition.Behavioral and brain sciences, 28(5):675–691, 2005

Michael Tomasello, Malinda Carpenter, Josep Call, Tanya Behne, and Henrike Moll. Understanding and sharing intentions: The origins of cultural cognition.Behavioral and brain sciences, 28(5):675–691, 2005

2005

[39] [39]

LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. Llama: Open and efficient foundation language models.arXiv preprint arXiv:2302.13971, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[40] [40]

Llama 2: Open Foundation and Fine-Tuned Chat Models

Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[41] [41]

A unified approach to interpreting and boosting adversarial transferability

Xin Wang, Jie Ren, Shuyun Lin, Xiangming Zhu, Yisen Wang, and Quanshi Zhang. A unified approach to interpreting and boosting adversarial transferability. InInternational Conference on Learning Representations, 2021

2021

[42] [42]

Qwen3 Technical Report

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report.arXiv preprint arXiv:2505.09388, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[43] [43]

Explaining generalization power of a dnn using interactive concepts

Huilin Zhou, Hao Zhang, Huiqi Deng, Dongrui Liu, Wen Shen, Shih-Han Chan, and Quanshi Zhang. Explaining generalization power of a dnn using interactive concepts. InProceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 17105–17113, 2024

2024

[44] [44]

Towards the first principles of explaining dnns: interactions explain the learning dynamics.Frontiers of Information Technology & Electronic Engineering, 26(7): 1017–1026, 2025

Huilin Zhou, Qihan Ren, Junpeng Zhang, and Quanshi Zhang. Towards the first principles of explaining dnns: interactions explain the learning dynamics.Frontiers of Information Technology & Electronic Engineering, 26(7): 1017–1026, 2025

2025

[45] [45]

General scales unlock ai evaluation with explanatory and predictive power.Nature, 652(8108):58–67, 2026

Lexin Zhou, Lorenzo Pacchiardi, Fernando Martínez-Plumed, Katherine M Collins, Yael Moros-Daval, Seraphina Zhang, Qinlin Zhao, Yitian Huang, Luning Sun, Jonathan E Prunty, et al. General scales unlock ai evaluation with explanatory and predictive power.Nature, 652(8108):58–67, 2026. 12 APREPRINT- JUNE9, 2026 A Sparsity Property of Interactions Ren et al. ...

2026

[46] [46]

That is, interactions involving too many input variables have zero or negligible contribution to the output

Bounded interaction order.The network does not rely on extremely high-order interactions. That is, interactions involving too many input variables have zero or negligible contribution to the output. Formally, there exists an order threshold M such that the interaction effect I(S) vanishes for any subset S with |S|> M

[47] [47]

Specifically, let ¯u(m) denote the average output change when m variables are revealed, compared with the fully masked baseline: ¯u(m) =E S⊆N,|S|=m [v(xS)−v(x ∅)]

Monotonic response under masking.When more input variables are masked, the average network response decreases monotonically. Specifically, let ¯u(m) denote the average output change when m variables are revealed, compared with the fully masked baseline: ¯u(m) =E S⊆N,|S|=m [v(xS)−v(x ∅)]. Then, form ′ < m, the average response satisfies ¯u(m′) ≤¯u(m). This...

[48] [48]

In particular, for anym ′ < m, there exists a positive constantp >0such that ¯u(m′) ≥ m′ m p ¯u(m)

Polynomial lower bound on average response.The average response does not decay too sharply when fewer variables are revealed. In particular, for anym ′ < m, there exists a positive constantp >0such that ¯u(m′) ≥ m′ m p ¯u(m). This polynomial lower bound rules out the case where the model output is dominated by dense, extremely high-order interactions. Tog...

2026