Recognition: unknown
LLMs Should Not Yet Be Credited with Decision Explanation
Pith reviewed 2026-05-09 18:38 UTC · model grok-4.3
The pith
LLMs should not yet be credited with explaining human decisions, as the available evidence supports only decision prediction and rationale generation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that LLMs should not yet be credited with decision explanation. Evidence most commonly offered for LLM-based decision accounts directly supports decision prediction and rationale generation, and sometimes explanatory hypothesis generation, but does not distinguish decision explanation from prediction-supportive rationalization. Stronger explanatory credit requires a bridge standard: claims must specify explanatory targets, discriminate against weaker rationalizer alternatives, use target-appropriate process- or intervention-sensitive validation, and bound their scope. Adopting a principle of credit calibration ensures LLMs are credited only for the strongest claim their evidence warrants, and no stronger.
What carries the argument
The three-claim distinction (decision prediction, rationale generation, decision explanation) and the bridge standard that sets conditions for granting explanatory credit.
If this is right
- LLMs can still be credited as predictors of decisions and generators of rationales without claiming explanatory power.
- Explanatory claims about LLM outputs must name specific targets and apply intervention-sensitive tests to be accepted.
- Adopting credit calibration preserves LLMs as instruments for hypothesis generation while avoiding premature redefinition of explanation.
- Related work in human decision modeling can continue using LLMs for prediction tasks while developing separate standards for explanation.
Where Pith is reading between the lines
- Researchers might design new experiments that directly test whether LLMs can meet the bridge standards under controlled process manipulations.
- The position could extend to other AI applications where prediction accuracy is conflated with explanatory insight, such as medical diagnosis or policy recommendation.
- If adopted, the calibration principle might encourage hybrid systems that pair LLMs with process-tracing methods from psychology.
Load-bearing premise
The distinction between generating rationales that support predictions and providing genuine explanations based on decision processes is meaningful and currently unbridged by available evidence.
What would settle it
A concrete demonstration in which an LLM specifies a clear explanatory target, rules out rationalization alternatives via process interventions, passes target-appropriate validation, and bounds its scope would establish that the evidence warrants explanatory credit.
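To make that demonstration concrete, here is a minimal, hypothetical sketch of one intervention-sensitive check: manipulate a single decision-relevant variable while holding surface wording fixed, and test whether the model's stated reason changes together with its predicted choice rather than merely restating the outcome. The `Scenario` fields, `stub_model`, and `intervention_sensitivity` are illustrative names introduced here, not the paper's protocol, and passing such a check would be necessary rather than sufficient for explanatory credit under the bridge standard.

```python
# Hypothetical sketch of an intervention-sensitive check; not the paper's method.
# Replace `stub_model` with a wrapper around a real LLM call.

from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Scenario:
    """A binary choice whose decision-relevant variable we manipulate directly."""
    risk_gap: float  # how much riskier option A is than option B (the intervention)
    framing: str     # surface wording, held constant across the intervention


def stub_model(scenario: Scenario) -> Tuple[str, str]:
    """Stand-in for an LLM: returns (predicted_choice, stated_reason)."""
    choice = "B" if scenario.risk_gap > 0.2 else "A"
    reason = "avoids risk" if choice == "B" else "higher expected payoff"
    return choice, reason


def intervention_sensitivity(
    model: Callable[[Scenario], Tuple[str, str]],
    gaps: Tuple[float, ...] = (0.0, 0.1, 0.3, 0.5),
) -> float:
    """Fraction of intervention steps where the stated reason and the predicted
    choice change together. A rationalizer that ignores the manipulated variable
    leaves the reason flat while the choice shifts (or vice versa)."""
    outputs = [model(Scenario(risk_gap=g, framing="neutral")) for g in gaps]
    steps = list(zip(outputs, outputs[1:]))
    coupled = sum(
        1 for (c1, r1), (c2, r2) in steps
        if (c1 != c2) == (r1 != r2)  # reason change tracks choice change
    )
    return coupled / len(steps)


if __name__ == "__main__":
    print(f"reason/choice coupling: {intervention_sensitivity(stub_model):.2f}")
```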
read the original abstract
This position paper argues that LLMs should not yet be credited with decision explanation. This matters because recent work increasingly treats accurate behavioral prediction, plausible rationales, and outcome-conditioned reasoning traces as evidence that LLMs explain why people decide as they do, risking a premature redefinition of what counts as explanatory progress in human decision modeling. We first distinguish three claims with different evidential burdens: decision prediction, rationale generation, and decision explanation. We then argue that the evidence most commonly offered for LLM-based decision accounts directly supports the first two claims, and sometimes explanatory hypothesis generation, but does not distinguish decision explanation from prediction-supportive rationalization. Next, we propose a bridge standard for decision-explanation credit: stronger claims should specify explanatory targets, discriminate against weaker rationalizer alternatives, use target-appropriate process- or intervention-sensitive validation, and bound their scope. We then situate this standard against competing views and related literatures, clarifying why it preserves the value of LLMs as predictors, narrators, and hypothesis generators while resisting premature explanatory credit. We conclude with a principle of credit calibration: LLMs should be credited for the strongest claim their evidence warrants, and no stronger; if adopted, this principle can help turn LLMs from persuasive narrators of decisions into more reliable instruments for discovering, testing, and communicating explanations of human behavior.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This position paper argues that LLMs should not yet be credited with decision explanation. It distinguishes three claims with distinct evidential burdens—decision prediction, rationale generation, and decision explanation—and contends that common evidence (accurate behavioral prediction, plausible rationales, outcome-conditioned traces) supports the first two and sometimes hypothesis generation but does not establish process-level explanation over prediction-supportive rationalization. The authors propose a four-part bridge standard (specify explanatory targets, discriminate against weaker alternatives, apply target-appropriate process- or intervention-sensitive validation, bound scope) drawn from philosophy of science and psychology, situate it against competing views, and conclude with a credit-calibration principle that LLMs should receive credit only for the strongest claim their evidence warrants.
Significance. If the distinctions and standards hold, the paper provides a useful conceptual framework for calibrating claims in LLM-assisted human decision modeling. It explicitly credits LLMs' strengths as predictors, narrators, and hypothesis generators while resisting over-attribution of explanatory power, which could help maintain rigor in cognitive modeling and AI applications. The argument is internally consistent, avoids circularity, and offers falsifiable criteria that future work must meet to earn explanatory credit.
minor comments (2)
- [Abstract] The abstract previews the four-part standard clearly but could name the four elements in a single sentence for quicker reader orientation.
- [Introduction] Section headings and subsection numbering are consistent, but a short table summarizing the three claims and their evidential burdens would improve scannability.
Simulated Author's Rebuttal
We thank the referee for their positive and accurate summary of the manuscript, their assessment of its significance for calibrating claims in LLM-assisted decision modeling, and their recommendation to accept. We are pleased that the distinctions, bridge standard, and credit-calibration principle were found internally consistent and useful.
Circularity Check
No significant circularity
full rationale
The paper is a conceptual position piece that distinguishes three claims (decision prediction, rationale generation, decision explanation) by their differing evidential requirements, then motivates a four-part bridge standard by reference to external philosophy-of-science and psychology literatures. No equations, fitted parameters, or self-referential definitions appear; the credit-calibration principle follows directly from the distinctions drawn rather than reducing to any input by construction. All load-bearing steps cite independent sources or logical analysis, satisfying the self-contained criterion.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Decision prediction, rationale generation, and decision explanation are distinct claims requiring different levels of evidence.
- domain assumption Current LLM evidence supports prediction and rationales but fails to rule out rationalization as an alternative to explanation.