pith. machine review for the scientific record.

arxiv: 2604.08016 · v2 · submitted 2026-04-09 · 💻 cs.AI · cs.LG

Recognition: no theorem link

Wiring the 'Why': A Unified Taxonomy and Survey of Abductive Reasoning in LLMs

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 17:53 UTC · model grok-4.3

classification 💻 cs.AI cs.LG
keywords abductive reasoning · large language models · survey · taxonomy · hypothesis generation · hypothesis selection · benchmarking · reasoning capabilities

The pith

A unified two-stage framework organizes research on abductive reasoning in large language models into hypothesis generation and selection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper offers the first survey of how large language models handle abductive reasoning, the process of inferring the best explanation for an observation. It identifies confusion in the field due to varying definitions and tasks, and introduces a clear split: one stage where the model creates possible explanations to fill knowledge gaps, and another where it picks the best one from those options. This split allows the authors to build a taxonomy covering tasks, datasets, methods, and how performance is measured. They also test several LLMs on these tasks to see patterns across different model types and compare abduction to other forms of reasoning like deduction and induction. The work ends by noting shortcomings in current research that limit progress.
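The two-stage split the review describes can be sketched in code. This is a minimal illustration under my own assumptions, not the authors' implementation: the generator and scorer below are toy stand-ins for the LLM calls each stage would make.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    plausibility: float  # score assigned during selection

def generate_hypotheses(observation: str) -> list[str]:
    # Stage I (Hypothesis Generation): bridge the epistemic gap with a
    # candidate set H. A real system would prompt an LLM here; these
    # canned candidates are purely illustrative.
    return [
        f"{observation} because the power failed",
        f"{observation} because a breaker tripped",
        f"{observation} because the bulb burned out",
    ]

def score_hypothesis(observation: str, hypothesis: str) -> float:
    # Stage II scoring stand-in: a real system would ask an LLM or a
    # verifier for plausibility; here shorter explanations score higher,
    # a toy Occam's-razor proxy.
    return 1.0 / len(hypothesis)

def abduce(observation: str) -> Hypothesis:
    candidates = generate_hypotheses(observation)               # Stage I
    scored = [Hypothesis(h, score_hypothesis(observation, h))
              for h in candidates]
    return max(scored, key=lambda h: h.plausibility)            # Stage II: pick h*

print(abduce("The room went dark").text)
```

Even in this sketch the point of the split is visible: the two stages can be benchmarked, trained, or swapped out independently, which is the axis along which the survey organizes prior work.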

Core claim

This paper presents the first survey of abductive reasoning in LLMs, tracing its trajectory from philosophical foundations to contemporary AI implementations. To address the widespread conceptual confusion and disjointed task definitions prevalent in the field, we establish a unified two-stage definition that formally categorizes prior work. This definition disentangles abduction into Hypothesis Generation, where models bridge epistemic gaps to produce candidate explanations, and Hypothesis Selection, where the generated candidates are evaluated and the most plausible explanation is chosen. Building upon this foundation, we present a comprehensive taxonomy of the literature, categorizing prior work based on their abductive tasks, datasets, underlying methodologies, and evaluation strategies.

What carries the argument

The unified two-stage definition of abductive reasoning, separating Hypothesis Generation from Hypothesis Selection, which structures the taxonomy and benchmark analysis.

If this is right

  • Previous studies on abductive reasoning in LLMs can be systematically categorized by tasks, datasets, methodologies, and evaluation strategies.
  • LLMs demonstrate distinct performance patterns when generating candidate explanations versus selecting the most plausible one.
  • Abductive reasoning performance relates to deductive and inductive reasoning capabilities, offering broader insights into model reasoning.
  • Critical gaps exist in static benchmark designs, narrow domain coverage, limited training frameworks, and insufficient mechanistic understanding.
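As a concrete reading of what "systematically categorized" could mean, here is a hypothetical record type for one surveyed work along the taxonomy's axes. The field names and the example slotting are my assumptions, not the paper's schema.

```python
from dataclasses import dataclass
from enum import Enum

class Stage(Enum):
    GENERATION = "hypothesis generation"  # Stage I of the two-stage split
    SELECTION = "hypothesis selection"    # Stage II

@dataclass
class SurveyedWork:
    title: str
    stage: Stage      # which stage the work primarily targets
    task: str         # e.g. abductive NLI, visual abduction
    dataset: str      # benchmark the work introduces or uses
    methodology: str  # e.g. prompting, fine-tuning, multi-agent
    evaluation: str   # e.g. accuracy, human judgment

# Example slotting (assumed): the abductive NLI task of Bhagavatula et al.
# (2020) asks models to choose between candidate explanations, so it would
# land in the selection stage.
anli = SurveyedWork(
    title="Abductive commonsense reasoning",
    stage=Stage.SELECTION,
    task="abductive NLI",
    dataset="aNLI",
    methodology="supervised + prompting",
    evaluation="accuracy",
)
print(anli.stage.value)
```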

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Training methods could be designed to target hypothesis generation and selection as separate skills to enhance overall abductive performance.
  • Links between different reasoning types may support development of AI systems capable of multiple inference modes.
  • Addressing the noted gaps would require creating benchmarks with wider domains and more dynamic designs.
  • Techniques for understanding model internals could be focused on how explanations are formed during abductive tasks.

Load-bearing premise

The proposed two-stage split into Hypothesis Generation and Hypothesis Selection accurately and exhaustively organizes all prior abductive work without omitting important variants or forcing artificial boundaries on existing task definitions.

What would settle it

A significant collection of abductive reasoning research that cannot be classified into either the hypothesis generation stage or the hypothesis selection stage would falsify the completeness of the unified definition.

Figures

Figures reproduced from arXiv: 2604.08016 by Danial Parnian, Mahdi Jafari Siavoshani, Moein Salimi, Mohammad Hossein Rohban, Nima Alighardashi, Shaygan Adim.

Figure 1
Figure 1. Publication Trends of Abductive Reasoning Research in Computer Science. The stacked bar chart categorizes publications into LLM-based, NLP-based (non-LLM), and other computer science approaches. view at source ↗
Figure 2
Figure 2. The Abductive Reasoning Pipeline. As defined in Section 2.2.1, we conceptualize abduction as a two-stage process: (1) Hypothesis Generation, where an observation (O) triggers the retrieval or creation of a candidate set H to bridge an epistemic gap; and (2) Hypothesis Selection, where candidates are evaluated to identify the best explanation h*. This survey unites works that focus on either stage individ… view at source ↗
Figure 3
Figure 3. Categorization of literature on abductive reasoning in LLMs. The taxonomy is divided into four… view at source ↗
Figure 4
Figure 4. Macro-average performance on the Stage I generation benchmarks. This figure summarizes overall… view at source ↗
Figure 5
Figure 5. Macro-average performance on the Stage II selection benchmarks. This figure summarizes overall… view at source ↗
Figure 6
Figure 6. Scaling trends within model families on the Stage I generation and Stage II selection benchmarks. view at source ↗
Figure 8
Figure 8. Abductive accuracy plotted against deductive (blue) and inductive (green) accuracy. Each point represents a shared (model, method) configuration evaluated on an abductive dataset and a corresponding deductive or inductive dataset. view at source ↗
read the original abstract

Despite its foundational role in human discovery and sense-making, abductive reasoning--the inference of the most plausible explanation for an observation--has been relatively underexplored in Large Language Models (LLMs). Despite the rapid advancement of LLMs, the exploration of abductive reasoning and its diverse facets has thus far been disjointed rather than cohesive. This paper presents the first survey of abductive reasoning in LLMs, tracing its trajectory from philosophical foundations to contemporary AI implementations. To address the widespread conceptual confusion and disjointed task definitions prevalent in the field, we establish a unified two-stage definition that formally categorizes prior work. This definition disentangles abduction into Hypothesis Generation, where models bridge epistemic gaps to produce candidate explanations, and Hypothesis Selection, where the generated candidates are evaluated and the most plausible explanation is chosen. Building upon this foundation, we present a comprehensive taxonomy of the literature, categorizing prior work based on their abductive tasks, datasets, underlying methodologies, and evaluation strategies. In order to ground our framework empirically, we conduct a compact benchmark study of current LLMs on abductive tasks, together with targeted comparative analyses across model sizes, model families, evaluation styles, and the distinct generation-versus-selection task typologies. Moreover, by synthesizing recent empirical results, we examine how LLM performance on abductive reasoning relates to deductive and inductive tasks, providing insights into their broader reasoning capabilities. Our analysis reveals critical gaps in current approaches--from static benchmark design and narrow domain coverage to narrow training frameworks and limited mechanistic understanding of abductive processes...

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript claims to provide the first comprehensive survey of abductive reasoning in large language models (LLMs). It traces the concept from its philosophical origins to modern AI applications and introduces a unified two-stage definition, consisting of Hypothesis Generation and Hypothesis Selection, to resolve conceptual confusion and disjointed task definitions. On this basis it develops a taxonomy organizing the literature by abductive tasks, datasets, methodologies, and evaluation strategies, and performs a compact benchmark study comparing LLMs on these tasks, with analyses across model sizes, families, and task types. Finally, it synthesizes results to relate abductive reasoning to deductive and inductive capabilities, identifying gaps in benchmarks, domains, training, and mechanistic understanding.

Significance. If the proposed taxonomy and two-stage definition prove to be comprehensive and accurate, this work will serve as a foundational reference for standardizing research on abductive reasoning in LLMs. The empirical benchmark, despite being compact, offers valuable comparative insights, and the examination of relations to other reasoning types contributes to understanding broader LLM reasoning. The identification of critical gaps provides clear directions for future work. The synthesis of external literature is a strength, though the empirical component requires more detail for full impact.

minor comments (3)
  1. [Benchmark Study] The compact benchmark study lacks detailed methods, specific dataset lists, or statistical controls, limiting verifiability of the comparative analyses across model sizes, families, and generation-versus-selection typologies.
  2. A summary table mapping surveyed works to the two-stage taxonomy categories would improve readability and allow readers to quickly assess coverage.
  3. [Abstract] The abstract describes 'targeted comparative analyses' and 'synthesizing recent empirical results' without naming the models, metrics, or key quantitative findings, reducing standalone clarity.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive and constructive review, which recognizes the paper's role as a foundational survey and the value of the unified taxonomy, benchmark, and gap analysis. We appreciate the recommendation for minor revision and address the single point raised regarding the empirical component below.

read point-by-point responses
  1. Referee: The empirical component requires more detail for full impact.

    Authors: We agree that expanding the description of the compact benchmark would strengthen the manuscript. In the revision, we will add: (1) explicit details on task selection criteria and prompt templates used for generation vs. selection stages; (2) full per-model performance tables with standard deviations across runs; (3) a brief error analysis categorizing failure modes by hypothesis quality; and (4) justification for the benchmark's scope relative to the taxonomy. These additions will be placed in an expanded Section 5 without altering the compact nature of the study. revision: yes

Circularity Check

0 steps flagged

No significant circularity in survey and taxonomy synthesis

full rationale

This paper is a literature survey that proposes a two-stage taxonomy (Hypothesis Generation followed by Hypothesis Selection) to organize existing abductive reasoning work in LLMs. No equations, fitted parameters, predictions, or derivations appear in the provided text or abstract. The central claims rest on synthesis and categorization of external prior literature rather than any internal reduction to the paper's own inputs or self-citations. The two-stage split is presented as an organizing framework, not as a result derived from data or prior self-work within the manuscript. This is the expected non-circular outcome for a survey paper whose contributions are classificatory rather than predictive or deductive.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey and taxonomy paper that synthesizes existing literature; no free parameters are fitted, no new axioms are postulated, and no invented entities are introduced beyond the descriptive two-stage framework.

pith-pipeline@v0.9.0 · 5608 in / 1207 out tokens · 35471 ms · 2026-05-10T17:53:36.313197+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

115 extracted references · 76 canonical work pages · 3 internal anchors

  1. [1]

    Leveraging symbolic knowledge bases for commonsense natural language inference using pattern theory

    Sathyanarayanan N. Aakur and Sudeep Sarkar. Leveraging symbolic knowledge bases for commonsense natural language inference using pattern theory. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(11): 13185--13202, 2023. doi:10.1109/TPAMI.2023.3287837

  2. [2]

    Abductive Reasoning: Logical Investigations into Discovery and Explanation, volume 330 of Synthese Library

    Atocha Aliseda. Abductive Reasoning: Logical Investigations into Discovery and Explanation, volume 330 of Synthese Library. Springer Dordrecht, 2006. ISBN 978-1-4020-3907-2. doi:10.1007/1-4020-3907-7

  3. [3]

    A survey on hypothesis generation for scientific discovery in the era of large language models

    A. Alkan, Shashwat Sourav, Maja Jabłońska, Simone Astarita, Rishabh Chakrabarty, N. Garuda, P. Khetarpal, Maciej Pióro, Dimitrios Tanoglidis, Kartheik G. Iyer, M. Polimera, Michael J. Smith, Tirthankar Ghosal, M. Huertas-Company, Sandor Kruk, Kevin Schawinski, and Ioana Ciucua. A survey on hypothesis generation for scientific discovery in the era of larg...

  4. [4]

    Advancing abductive reasoning in knowledge graphs through complex logical hypothesis generation

    Jiaxin Bai, Yicheng Wang, Tianshi Zheng, Yue Guo, Xin Liu, and Yangqiu Song. Advancing abductive reasoning in knowledge graphs through complex logical hypothesis generation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.\ 1312--1329, Bangkok, Thailand, 2024. Association for Computati...

  5. [5]

    Steering large language model activations in sparse spaces

    Reza Bayat, Ali Rahimi-Kalahroudi, Mohammad Pezeshki, Sarath Chandar, and Pascal Vincent. Steering large language model activations in sparse spaces. In Proceedings of the Conference on Language Modeling (COLM), 2025

  6. [6]

    On relationships between induction and abduction: A logical point of view

    Brigitte Bessant. On relationships between induction and abduction: A logical point of view. In Peter A. Flach and Antonis C. Kakas (eds.), Abduction and Induction: Essays on their Relation and Integration, volume 18 of Applied Logic Series, pp.\ 77--87. Kluwer Academic Publishers, Dordrecht, 2000. doi:10.1007/978-94-017-0606-3_5

  7. [7]

    Abductive commonsense reasoning

    Chandra Bhagavatula, Ronan Le Bras, Chaitanya Malaviya, Keisuke Sakaguchi, Ari Holtzman, Hannah Rashkin, Doug Downey, Wen-tau Yih, and Yejin Choi. Abductive commonsense reasoning. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=Byg1v1HKDB

  8. [8]

    A large annotated corpus for learning natural language inference

    Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp.\ 632--642, Lisbon, Portugal, 2015. Association for Computational Linguistics. doi:10.18653/v1/D15-1075. URL https://a...

  9. [9]

    In situ graph reasoning and knowledge expansion using graph-preflexor

    Markus J. Buehler. In situ graph reasoning and knowledge expansion using graph-preflexor. Advanced Intelligent Discovery, 1(3): e202500006, 2025. doi:10.1002/aidi.202500006. URL https://doi.org/10.1002/aidi.202500006

  10. [10]

    e-snli: Natural language inference with natural language explanations

    Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, and Phil Blunsom. e-snli: Natural language inference with natural language explanations. In Advances in Neural Information Processing Systems, volume 31, 2018. URL https://papers.nips.cc/paper/8163-e-snli-natural-language-inference-with-natural-language-explanations

  11. [11]

    On the distinction between Peirce's abduction and Lipton's inference to the best explanation

    Daniel G. Campos. On the distinction between Peirce's abduction and Lipton's inference to the best explanation. Synthese, 180: 419--442, 2011. doi:10.1007/s11229-009-9709-3. URL https://doi.org/10.1007/s11229-009-9709-3

  12. [12]

    Self-consistent narrative prompts on abductive natural language inference

    Chunkit Chan, Xin Liu, Tsz Ho Chan, Jiayang Cheng, Yangqiu Song, Ginny Wong, and Simon See. Self-consistent narrative prompts on abductive natural language inference. In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (...

  13. [13]

    Abductivemllm: Boosting visual abductive reasoning within mllms

    Boyu Chang, Qi Wang, Xi Guo, Zhixiong Nan, Yazhou Yao, and Tianfei Zhou. Abductivemllm: Boosting visual abductive reasoning within mllms. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pp.\ 2698--2706, 2026. doi:10.1609/aaai.v40i4.37258. URL https://ojs.aaai.org/index.php/AAAI/article/view/37258

  14. [14]

    Beyond brainstorming: What drives high-quality scientific ideas? lessons from multi-agent collaboration

    Nuo Chen, Yicheng Tong, Jiaying Wu, Minh Duc Duong, Qian Wang, Qingyun Zou, Bryan Hooi, and Bingsheng He. Beyond brainstorming: What drives high-quality scientific ideas? lessons from multi-agent collaboration. arXiv preprint arXiv:2508.04575, 2025. doi:10.48550/arXiv.2508.04575. URL https://arxiv.org/abs/2508.04575

  15. [15]

    Black swan: Abductive and defeasible video reasoning in unpredictable events

    Aditya Chinchure, Sahithya Ravi, Raymond Ng, Vered Shwartz, Boyang Li, and Leonid Sigal. Black swan: Abductive and defeasible video reasoning in unpredictable events. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.\ 24201--24210, 2025. doi:10.1109/CVPR52734.2025.02254. URL https://blackswan-video.github.io/

  16. [16]

    On the Measure of Intelligence

    François Chollet. On the measure of intelligence, 2019. URL https://arxiv.org/abs/1911.01547

  17. [17]

    Transformers as soft reasoners over language

    Peter Clark, Oyvind Tafjord, and Kyle Richardson. Transformers as soft reasoners over language. In Christian Bessiere (ed.), Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20 , pp.\ 3882--3890. International Joint Conferences on Artificial Intelligence Organization, 7 2020. doi:10.24963/ijcai.2020/537. URL...

  18. [18]

    Towards automated circuit discovery for mechanistic interpretability

    Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, and Adrià Garriga-Alonso. Towards automated circuit discovery for mechanistic interpretability. In Advances in Neural Information Processing Systems, volume 36, 2023

  19. [19]

    Inference to the best explanation in large language models

    Dhairya Dalal, Marco Valentino, Andre Freitas, and Paul Buitelaar. Inference to the best explanation in large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.\ 217--235, Bangkok, Thailand, 2024. Association for Computational Linguistics. doi:10.18653/v1/2024.acl-long.1...

  20. [20]

    True detective: A deep abductive reasoning benchmark undoable for GPT-3 and challenging for GPT-4

    Maksym Del and Mark Fishel. True detective: A deep abductive reasoning benchmark undoable for GPT-3 and challenging for GPT-4. In Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023), pp.\ 314--322, Toronto, Canada, 2023. Association for Computational Linguistics. doi:10.18653/v1/2023.starsem-1.28. URL https://acla...

  21. [21]

    Abductive Reasoning in Science

    Finnur Dellsén. Abductive Reasoning in Science. Elements in Philosophy of Science. Cambridge University Press, June 2024. ISBN 9781009500524. doi:10.1017/9781009353199

  22. [22]

    Abductive inference in defeasible reasoning: a model for research programmes

    C. Delrieux. Abductive inference in defeasible reasoning: a model for research programmes. Journal of Applied Logic, 2(4): 409--437, 2004. doi:10.1016/j.jal.2004.07.003

  23. [23]

    Assessing the reasoning capabilities of LLMs in the context of evidence-based claim verification

    John Dougrez-Lewis, Mahmud Elahi Akhter, Federico Ruggeri, Sebastian Löbbers, Yulan He, and Maria Liakata. Assessing the reasoning capabilities of LLMs in the context of evidence-based claim verification. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar (eds.), Findings of the Association for Computational Linguistics: A...

  24. [24]

    Abduction

    Igor Douven. Abduction. In Edward N. Zalta and Uri Nodelman (eds.), The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, Winter 2025 edition, 2025

  25. [25]

    Validation of growing knowledge graphs by abductive text evidences

    Jianfeng Du, Jeff Z. Pan, Sylvia Wang, Kunxun Qi, Yuming Shen, and Yu Deng. Validation of growing knowledge graphs by abductive text evidences. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01): 2784--2791, Jul. 2019. doi:10.1609/aaai.v33i01.33012784. URL https://ojs.aaai.org/index.php/AAAI/article/view/4130

  26. [26]

    e-CARE: a new dataset for exploring explainable causal reasoning

    Li Du, Xiao Ding, Kai Xiong, Ting Liu, and Bing Qin. e-CARE: a new dataset for exploring explainable causal reasoning. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.\ 432--446, Dublin, Ireland, 2022. Association for Computational Linguistics. doi:10.18653/v1/2022.acl-long.33. URL h...

  27. [27]

    Moral stories: Situated reasoning about norms, intents, actions, and their consequences

    Denis Emelin, Ronan Le Bras, Jena D. Hwang, Maxwell Forbes, and Yejin Choi. Moral stories: Situated reasoning about norms, intents, actions, and their consequences. In Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (eds.), Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp.\ 698--718, Onli...

  28. [28]

    Abductive and inductive reasoning: Background and issues

    Peter A. Flach and Antonis C. Kakas. Abductive and inductive reasoning: Background and issues. In Peter A. Flach and Antonis C. Kakas (eds.), Abduction and Induction: Essays on their Relation and Integration, volume 18 of Applied Logic Series, pp.\ 1--27. Kluwer Academic Publishers, Dordrecht, 2000. doi:10.1007/978-94-017-0606-3_1

  29. [29]

    Peirce's notion of abduction

    Harry G. Frankfurt. Peirce's notion of abduction. The Journal of Philosophy, 55(14): 593--597, 1958. doi:10.2307/2021966. URL https://doi.org/10.2307/2021966

  30. [30]

    Leveraging medical knowledge graphs into large language models for diagnosis prediction: Design and application study

    Yanjun Gao, Ruizhe Li, Emma Croxford, John Caskey, Brian W Patterson, Matthew Churpek, Timothy Miller, Dmitriy Dligach, and Majid Afshar. Leveraging medical knowledge graphs into large language models for diagnosis prediction: Design and application study. JMIR AI, 4: e58670, February 2025. ISSN 2817-1705. doi:10.2196/58670. URL http://dx.doi.org/10.2196/58670

  31. [31]

    Unifying deductive and abductive reasoning in knowledge graphs with masked diffusion model

    Yisen Gao, Jiaxin Bai, Yi Huang, Xingcheng Fu, Qingyun Sun, and Yangqiu Song. Unifying deductive and abductive reasoning in knowledge graphs with masked diffusion model. In Proceedings of the ACM Web Conference 2026 (WWW '26), Dubai, United Arab Emirates, 2026a. Association for Computing Machinery. doi:10.1145/3774904.3792133. URL https://doi.org/10.114...

  32. [32]

    Controllable logical hypothesis generation for abductive reasoning in knowledge graphs

    Yisen Gao, Jiaxin Bai, Tianshi Zheng, Ziwei Zhang, Qingyun Sun, Xingcheng Fu, Jianxin Li, and Yangqiu Song. Controllable logical hypothesis generation for abductive reasoning in knowledge graphs. In International Conference on Learning Representations, 2026b. URL https://openreview.net/forum?id=oTgJg0M9kY. Poster

  33. [33]

    The third PASCAL recognizing textual entailment challenge

    Danilo Giampiccolo, Bernardo Magnini, Ido Dagan, and Bill Dolan. The third PASCAL recognizing textual entailment challenge. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, pp.\ 1--9, Prague, 2007. Association for Computational Linguistics. URL https://aclanthology.org/W07-1401/

  34. [34]

    Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning

    Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. Nature, 2025. doi:10.1038/s41586-025-09422-z. URL https://www.nature.com/articles/s41586-025-09422-z

  35. [35]

    Whodunit: Evaluation benchmark for culprit detection in mystery stories

    Kshitij Gupta. Whodunit: Evaluation benchmark for culprit detection in mystery stories, 2025. URL https://arxiv.org/abs/2502.07747

  36. [36]

    Patterns of Discovery: An Inquiry into the Conceptual Foundations of Science

    Norwood Russell Hanson. Patterns of Discovery: An Inquiry into the Conceptual Foundations of Science. Cambridge University Press, Cambridge, 1958

  37. [37]

    The inference to the best explanation

    Gilbert Harman. The inference to the best explanation. Philosophical Review, 74(1): 88--95, 1965. doi:10.2307/2183532

  38. [38]

    Causejudger: Identifying the cause with llms for abductive logical reasoning

    Jinwei He and Feng Lu. Causejudger: Identifying the cause with llms for abductive logical reasoning, 2024. URL https://arxiv.org/abs/2409.05559

  39. [39]

    From reasoning to learning: A survey on hypothesis discovery and rule learning with large language models

    Kaiyu He and Zhiyu Chen. From reasoning to learning: A survey on hypothesis discovery and rule learning with large language models. Transactions on Machine Learning Research, 2025. URL https://openreview.net/forum?id=d7W38UzUg0

  40. [40]

    Gear: A general evaluation framework for abductive reasoning

    Kaiyu He, Peilin Wu, Mian Zhang, Kun Wan, Wentian Zhao, Xinya Du, and Zhiyu Chen. Gear: A general evaluation framework for abductive reasoning, 2025a. URL https://arxiv.org/abs/2509.24096

  41. [41]

    Idea: Enhancing the rule learning ability of large language model agents through induction, deduction, and abduction

    Kaiyu He, Mian Zhang, Shuo Yan, Peilin Wu, and Zhiyu Chen. Idea: Enhancing the rule learning ability of large language model agents through induction, deduction, and abduction. In Findings of the Association for Computational Linguistics: ACL 2025, pp.\ 13563--13597, Vienna, Austria, 2025b. Association for Computational Linguistics. doi:10.18653/v1/2025...

  42. [42]

    LEGO : A multi-agent collaborative framework with role-playing and iterative feedback for causality explanation generation

    Zhitao He, Pengfei Cao, Yubo Chen, Kang Liu, Ruopeng Li, Mengshu Sun, and Jun Zhao. LEGO : A multi-agent collaborative framework with role-playing and iterative feedback for causality explanation generation. In Findings of the Association for Computational Linguistics: EMNLP 2023, pp.\ 9142--9163, Singapore, 2023. Association for Computational Linguistics...

  43. [43]

    The abduction of sherlock holmes: A dataset for visual abductive reasoning

    Jack Hessel, Jena D. Hwang, Jae Sung Park, Rowan Zellers, Chandra Bhagavatula, Anna Rohrbach, Kate Saenko, and Yejin Choi. The abduction of sherlock holmes: A dataset for visual abductive reasoning. In Computer Vision -- ECCV 2022, volume 13696 of Lecture Notes in Computer Science, pp.\ 558--575. Springer, Cham, 2022. doi:10.1007/978-3-031-20059-5_32. URL...

  44. [44]

    A implies b: Circuit analysis in llms for propositional logical reasoning

    Guan Zhe Hong, Nishanth Dikkala, Enming Luo, Cyrus Rashtchian, Xin Wang, and Rina Panigrahy. A implies b: Circuit analysis in llms for propositional logical reasoning. In Advances in Neural Information Processing Systems, 2025

  45. [45]

    Argmed-agents: Explainable clinical decision reasoning with large language models via argumentation schemes

    Shengxin Hong, Liang Xiao, Xin Zhang, and Jianxia Chen. Argmed-agents: Explainable clinical decision reasoning with large language models via argumentation schemes. In 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp.\ 1989--1996. IEEE, 2024. doi:10.1109/BIBM62325.2024.10822109. Also available as arXiv:2403.06294

  46. [46]

    Disentangling logic: The role of context in large language models' formal reasoning capabilities

    Wenyue Hua, Kaijie Zhu, Lingyao Li, Lizhou Fan, Mingyu Jin, Shuhang Lin, Haochen Xue, Zelong Li, Jindong Wang, and Yongfeng Zhang. Disentangling logic: The role of context in large language models' formal reasoning capabilities. In Findings of the Association for Computational Linguistics: ACL 2025, pp.\ 19219--19242, Vienna, Austria, 2025. Association fo...

  47. [47]

    The relation of Peirce's abduction to inference to the best explanation

    Yi Jiang. The relation of Peirce's abduction to inference to the best explanation. Chinese Semiotic Studies, 20(3): 485--496, 2024. doi:10.1515/css-2024-2022

  48. [48]

    Abduction and argumentation for explainable machine learning: A position survey

    A. Kakas and Loizos Michael. Abduction and argumentation for explainable machine learning: A position survey, 2020

  49. [49]

    Peirce and the autonomy of abductive reasoning

    Tomis Kapitan. Peirce and the autonomy of abductive reasoning. Erkenntnis, 37: 1--26, 1992

  50. [50]

    Epistemology of language models: Do language models have holistic knowledge?

    Minsu Kim and James Thorne. Epistemology of language models: Do language models have holistic knowledge? In Lun-Wei Ku, Andre Martins, and Vivek Srikumar (eds.), Findings of the Association for Computational Linguistics: ACL 2024, pp.\ 12644--12669, Bangkok, Thailand, August 2024. Association for Computational Linguistics. doi:10.18653/v1/2024.findings-ac...

  51. [51]

    Playgrounds for abstraction and reasoning

    Subin Kim, Prin Phunyaphibarn, Donghyun Ahn, and Sundong Kim. Playgrounds for abstraction and reasoning. In NeurIPS 2022 Workshop on Neuro Causal and Symbolic AI (nCSI), 2022. URL https://openreview.net/forum?id=F4RNpByoqP

  52. [52]

    Atp*: An efficient and scalable method for localizing llm behaviour to components

    János Kramár, Tom Lieberum, Rohin Shah, and Neel Nanda. Atp*: An efficient and scalable method for localizing llm behaviour to components. arXiv preprint arXiv:2403.00745, 2024

  53. [53]

    Multi-modal action chain abductive reasoning (mar)

    Mengze Li, Tianbao Wang, Jiahe Xu, Kairong Han, Shengyu Zhang, Zhou Zhao, Jiaxu Miao, Wenqiao Zhang, Shiliang Pu, and Fei Wu. Multi-modal action chain abductive reasoning (mar). In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), pp.\ 4617--4628, Toronto, Canada, 2023. Association for Computational Lingui...

  54. [54]

    From hypothesis to premises: Llm-based backward logical reasoning with selective symbolic translation

    Qingchuan Li, Mingyue Cheng, Zirui Liu, Daoyu Wang, Yuting Zeng, and Tongxuan Liu. From hypothesis to premises: Llm-based backward logical reasoning with selective symbolic translation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(37): 31671--31679, 2026. doi:10.1609/aaai.v40i37.40434. URL https://ojs.aaai.org/index.php/AAAI/arti...

  55. [55]

    Visual abductive reasoning

    Chen Liang, Wenguan Wang, Tianfei Zhou, and Yi Yang. Visual abductive reasoning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.\ 15544--15554, 2022. doi:10.1109/CVPR52688.2022.01512. URL https://openaccess.thecvf.com/content/CVPR2022/html/Liang_Visual_Abductive_Reasoning_CVPR_2022_paper.html

  56. [56]

    Encouraging divergent thinking in large language models through multi-agent debate

    Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Shuming Shi, and Zhaopeng Tu. Encouraging divergent thinking in large language models through multi-agent debate. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pp.\ 17889--17904, Miami, Florida, USA, 2024. Association for Computati...

  57. [57]

    Let's verify step by step

    Hunter Lightman, Vineet Kosaraju, Yuri Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let's verify step by step. In International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=v8L0pN6EOi. Poster

  58. [58]

    Abductive inference in retrieval-augmented language models: Generating and validating missing premises, 2025

    Shiyin Lin. Abductive inference in retrieval-augmented language models: Generating and validating missing premises, 2025. URL https://arxiv.org/abs/2511.04020

  59. [59]

    Inference to the Best Explanation

    Peter Lipton. Inference to the Best Explanation. Routledge, London, 2nd edition, 2004

  60. [60]

    Inference to the best explanation

    Peter Lipton. Inference to the best explanation. In Stathis Psillos and Martin Curd (eds.), The Routledge Companion to Philosophy of Science, pp.\ 193--202. Routledge, Abingdon, 2008

  61. [61]

    An incomplete loop: Instruction inference, instruction following, and in-context learning in language models

    Emmy Liu, Graham Neubig, and Jacob Andreas. An incomplete loop: Instruction inference, instruction following, and in-context learning in language models. In Conference on Language Modeling, 2024. URL https://openreview.net/forum?id=nUNbjMDBWC

  62. [62]

    Evaluating the logical reasoning abilities of large reasoning models, 2025

    Hanmeng Liu, Yiran Ding, Zhizhang Fu, Chaoli Zhang, Xiaozhang Liu, and Yue Zhang. Evaluating the logical reasoning abilities of large reasoning models, 2025

  63. [63]

    The magic of IF: Investigating causal reasoning abilities in large language models of code

    Xiao Liu, Da Yin, Chen Zhang, Yansong Feng, and Dongyan Zhao. The magic of IF: Investigating causal reasoning abilities in large language models of code. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki (eds.), Findings of the Association for Computational Linguistics: ACL 2023, pp.\ 9009--9022, Toronto, Canada, July 2023. Association for Computati...

  64. [64]

    Llm discussion: Enhancing the creativity of large language models via discussion framework and role-play

    Li-Chun Lu, Shou-Jen Chen, Tsung-Min Pai, Chan-Hung Yu, Hung yi Lee, and Shao-Hua Sun. Llm discussion: Enhancing the creativity of large language models via discussion framework and role-play. In Conference on Language Modeling, 2024. URL https://openreview.net/forum?id=ybaK4asBT2

  65. [65]

    Towards logiglue: A brief survey and a benchmark for analyzing logical reasoning capabilities of language models

    Man Luo, Shrinidhi Kumbhar, Ming Shen, Mihir Parmar, Neeraj Varshney, Pratyay Banerjee, Somak Aditya, and Chitta Baral. Towards logiglue: A brief survey and a benchmark for analyzing logical reasoning capabilities of language models. arXiv preprint arXiv:2310.00836, 2023. doi:10.48550/arXiv.2310.00836. URL https://arxiv.org/abs/2310.00836

  66. [66]

    Natural logic for textual inference

    Bill MacCartney and Christopher D. Manning. Natural logic for textual inference. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, pp.\ 193--200, Prague, 2007. Association for Computational Linguistics. URL https://aclanthology.org/W07-1431/

  67. [67]

    Toward mechanistic explanation of deductive reasoning in language models

    Davide Maltoni and Matteo Ferrara. Toward mechanistic explanation of deductive reasoning in language models. arXiv preprint arXiv:2510.09340, 2025

  68. [68]

    ER-Reason: A Benchmark Dataset for LLM Clinical Reasoning in the Emergency Room

    Nikita Mehandru, Niloufar Golchini, David Bamman, Travis Zack, Melanie F. Molina, and Ahmed Alaa. Er-reason: A benchmark dataset for llm-based clinical reasoning in the emergency room, 2025. URL https://arxiv.org/abs/2505.22919

  69. [69]

    Locating and editing factual associations in gpt

    Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in gpt. In Advances in Neural Information Processing Systems, volume 35, 2022. ROME

  70. [70]

    A System of Logic

    John Stuart Mill. A System of Logic. Harper & brothers, New York, 1858

  71. [71]

    Peirce-suit of truth - why inference to the best explanation and abduction ought not to be confused

    Gerhard Minnameier. Peirce-suit of truth - why inference to the best explanation and abduction ought not to be confused. Erkenntnis, 60: 75--105, 2004

  72. [72]

    Dixitworld: Evaluating multimodal abductive reasoning in vision-language models with multi-agent dixit gameplay, 2025

    Yunxiang Mo, Tianshi Zheng, Qing Zong, Jiayu Liu, Baixuan Xu, Yauwai Yim, Chunkit Chan, Jiaxin Bai, and Yangqiu Song. Dixitworld: Evaluating multimodal abductive reasoning in vision-language models with multi-agent dixit gameplay, 2025. URL https://arxiv.org/abs/2510.10117

  73. [73]

    How well do sota legal reasoning models support abductive reasoning?

    Ha Thanh Nguyen, Randy Goebel, Francesca Toni, Kostas Stathis, and Ken Satoh. How well do sota legal reasoning models support abductive reasoning? In Proceedings of the International Conference on Logic Programming 2023 Workshops, volume 3437 of CEUR Workshop Proceedings, London, United Kingdom, 2023. URL https://ceur-ws.org/Vol-3437/paper1LPLR.pdf. Logic...

  74. [74]

    In-context Learning and Induction Heads

    Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, a...

  75. [75]

    From we to me: Theory informed narrative shift with abductive reasoning

    Jaikrishna Manojkumar Patil, Divyagna Bavikadi, Kaustuv Mukherji, Ashby Steward-Nolan, Peggy-Jean Allin, Tumininu Awonuga, Joshua Garland, and Paulo Shakarian. From we to me: Theory informed narrative shift with abductive reasoning. arXiv preprint arXiv:2603.03320, 2026. doi:10.48550/arXiv.2603.03320. URL https://arxiv.org/abs/2603.03320

  76. [76]

    Social commonsense reasoning with multi-head knowledge attention

    Debjit Paul and Anette Frank. Social commonsense reasoning with multi-head knowledge attention. In Findings of the Association for Computational Linguistics: EMNLP 2020, pp.\ 2969--2980, Online, 2020. Association for Computational Linguistics. doi:10.18653/v1/2020.findings-emnlp.267. URL https://aclanthology.org/2020.findings-emnlp.267/

  77. [77]

    Approaches to abductive reasoning: An overview

    Gabriele Paul. Approaches to abductive reasoning: An overview. Artificial Intelligence Review, 7: 109--152, 1993. doi:10.1007/BF00849080. URL https://doi.org/10.1007/BF00849080

  78. [78]

    Collected Papers of Charles Sanders Peirce

    Charles Sanders Peirce. Collected Papers of Charles Sanders Peirce. Harvard University Press, Cambridge, MA, 1931--1958. Volumes 1--6 edited by C. Hartshorne and P. Weiss (1931--1935); Volumes 7--8 edited by A.W. Burks (1958)

  79. [79]

    Abduction as deductive saturation: A proof-theoretic inquiry

    Mario Piazza, Gabriele Pulcini, and Andrea Sabatini. Abduction as deductive saturation: A proof-theoretic inquiry. Journal of Philosophical Logic, 52(6): 1575--1602, 2023. doi:10.1007/s10992-023-09718-3

  80. [80]

    Doing experiments and revising rules with natural language and probabilistic reasoning

    Wasu Top Piriyakulkij, Cassidy Langenfeld, Tuan Anh Le, and Kevin Ellis. Doing experiments and revising rules with natural language and probabilistic reasoning. In Advances in Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=HXdAfK488A. Poster
