Recognition: unknown
LLMs Should Not Yet Be Credited with Decision Explanation
Pith reviewed 2026-05-09 18:38 UTC · model grok-4.3
The pith
LLMs should not yet be credited with explaining human decisions, as the available evidence supports only decision prediction and rationale generation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that LLMs should not yet be credited with decision explanation. Evidence most commonly offered for LLM-based decision accounts directly supports decision prediction and rationale generation, and sometimes explanatory hypothesis generation, but does not distinguish decision explanation from prediction-supportive rationalization. Stronger explanatory credit requires a bridge standard: claims must specify explanatory targets, discriminate against weaker rationalizer alternatives, use target-appropriate process- or intervention-sensitive validation, and bound their scope. Adopting a principle of credit calibration ensures LLMs are credited only for the strongest claim their evidence warrants, and no stronger.
What carries the argument
The three-claim distinction (decision prediction, rationale generation, decision explanation) and the bridge standard that sets conditions for granting explanatory credit.
If this is right
- LLMs can still be credited as predictors of decisions and generators of rationales without claiming explanatory power.
- Explanatory claims about LLM outputs must name specific targets and apply intervention-sensitive tests to be accepted.
- Adopting credit calibration preserves LLMs as instruments for hypothesis generation while avoiding premature redefinition of explanation.
- Related work in human decision modeling can continue using LLMs for prediction tasks while developing separate standards for explanation.
Where Pith is reading between the lines
- Researchers might design new experiments that directly test whether LLMs can meet the bridge standards under controlled process manipulations.
- The position could extend to other AI applications where prediction accuracy is conflated with explanatory insight, such as medical diagnosis or policy recommendation.
- If adopted, the calibration principle might encourage hybrid systems that pair LLMs with process-tracing methods from psychology.
Load-bearing premise
The distinction between generating rationales that support predictions and providing genuine explanations based on decision processes is meaningful and currently unbridged by available evidence.
What would settle it
A concrete demonstration in which an LLM specifies a clear explanatory target, rules out rationalization alternatives via process interventions, passes target-appropriate validation, and bounds its scope would establish that the evidence warrants explanatory credit.
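To make that demonstration concrete, here is a minimal, hypothetical sketch of one intervention-sensitive check: manipulate a single decision-relevant variable while holding surface wording fixed, and test whether the model's stated reason changes together with its predicted choice rather than merely restating the outcome. The `Scenario` fields, `stub_model`, and `intervention_sensitivity` are illustrative names introduced here, not the paper's protocol, and passing such a check would be necessary rather than sufficient for explanatory credit under the bridge standard.

```python
# Hypothetical sketch of an intervention-sensitive check; not the paper's method.
# Replace `stub_model` with a wrapper around a real LLM call.

from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Scenario:
    """A binary choice whose decision-relevant variable we manipulate directly."""
    risk_gap: float  # how much riskier option A is than option B (the intervention)
    framing: str     # surface wording, held constant across the intervention


def stub_model(scenario: Scenario) -> Tuple[str, str]:
    """Stand-in for an LLM: returns (predicted_choice, stated_reason)."""
    choice = "B" if scenario.risk_gap > 0.2 else "A"
    reason = "avoids risk" if choice == "B" else "higher expected payoff"
    return choice, reason


def intervention_sensitivity(
    model: Callable[[Scenario], Tuple[str, str]],
    gaps: Tuple[float, ...] = (0.0, 0.1, 0.3, 0.5),
) -> float:
    """Fraction of intervention steps where the stated reason and the predicted
    choice change together. A rationalizer that ignores the manipulated variable
    leaves the reason flat while the choice shifts (or vice versa)."""
    outputs = [model(Scenario(risk_gap=g, framing="neutral")) for g in gaps]
    steps = list(zip(outputs, outputs[1:]))
    coupled = sum(
        1 for (c1, r1), (c2, r2) in steps
        if (c1 != c2) == (r1 != r2)  # reason change tracks choice change
    )
    return coupled / len(steps)


if __name__ == "__main__":
    print(f"reason/choice coupling: {intervention_sensitivity(stub_model):.2f}")
```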
read the original abstract
This position paper argues that LLMs should not yet be credited with decision explanation. This matters because recent work increasingly treats accurate behavioral prediction, plausible rationales, and outcome-conditioned reasoning traces as evidence that LLMs explain why people decide as they do, risking a premature redefinition of what counts as explanatory progress in human decision modeling. We first distinguish three claims with different evidential burdens: decision prediction, rationale generation, and decision explanation. We then argue that the evidence most commonly offered for LLM-based decision accounts directly supports the first two claims, and sometimes explanatory hypothesis generation, but does not distinguish decision explanation from prediction-supportive rationalization. Next, we propose a bridge standard for decision-explanation credit: stronger claims should specify explanatory targets, discriminate against weaker rationalizer alternatives, use target-appropriate process- or intervention-sensitive validation, and bound their scope. We then situate this standard against competing views and related literatures, clarifying why it preserves the value of LLMs as predictors, narrators, and hypothesis generators while resisting premature explanatory credit. We conclude with a principle of credit calibration: LLMs should be credited for the strongest claim their evidence warrants, and no stronger; if adopted, this principle can help turn LLMs from persuasive narrators of decisions into more reliable instruments for discovering, testing, and communicating explanations of human behavior.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This position paper argues that LLMs should not yet be credited with decision explanation. It distinguishes three claims with distinct evidential burdens—decision prediction, rationale generation, and decision explanation—and contends that common evidence (accurate behavioral prediction, plausible rationales, outcome-conditioned traces) supports the first two and sometimes hypothesis generation but does not establish process-level explanation over prediction-supportive rationalization. The authors propose a four-part bridge standard (specify explanatory targets, discriminate against weaker alternatives, apply target-appropriate process- or intervention-sensitive validation, bound scope) drawn from philosophy of science and psychology, situate it against competing views, and conclude with a credit-calibration principle that LLMs should receive credit only for the strongest claim their evidence warrants.
Significance. If the distinctions and standards hold, the paper provides a useful conceptual framework for calibrating claims in LLM-assisted human decision modeling. It explicitly credits LLMs' strengths as predictors, narrators, and hypothesis generators while resisting over-attribution of explanatory power, which could help maintain rigor in cognitive modeling and AI applications. The argument is internally consistent, avoids circularity, and offers falsifiable criteria that future work must meet to earn explanatory credit.
minor comments (2)
- [Abstract] The abstract previews the four-part standard clearly but could name the four elements in a single sentence for quicker reader orientation.
- [Introduction] Section headings and subsection numbering are consistent, but a short table summarizing the three claims and their evidential burdens would improve scannability.
Simulated Author's Rebuttal
We thank the referee for their positive and accurate summary of the manuscript, their assessment of its significance for calibrating claims in LLM-assisted decision modeling, and their recommendation to accept. We are pleased that the distinctions, bridge standard, and credit-calibration principle were found internally consistent and useful.
Circularity Check
No significant circularity
full rationale
The paper is a conceptual position piece that distinguishes three claims (decision prediction, rationale generation, decision explanation) by their differing evidential requirements, then motivates a four-part bridge standard by reference to external philosophy-of-science and psychology literatures. No equations, fitted parameters, or self-referential definitions appear; the credit-calibration principle follows directly from the distinctions drawn rather than reducing to any input by construction. All load-bearing steps cite independent sources or logical analysis, satisfying the self-contained criterion.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Decision prediction, rationale generation, and decision explanation are distinct claims requiring different levels of evidence.
- domain assumption Current LLM evidence supports prediction and rationales but fails to rule out rationalization as an alternative to explanation.