pith. machine review for the scientific record.

arxiv: 2604.08016 · v2 · submitted 2026-04-09 · 💻 cs.AI · cs.LG

Recognition: no theorem link

Wiring the 'Why': A Unified Taxonomy and Survey of Abductive Reasoning in LLMs

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 17:53 UTC · model grok-4.3

classification 💻 cs.AI cs.LG
keywords abductive reasoning · large language models · survey · taxonomy · hypothesis generation · hypothesis selection · benchmarking · reasoning capabilities

The pith

A unified two-stage framework organizes research on abductive reasoning in large language models into hypothesis generation and selection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper offers the first survey of how large language models handle abductive reasoning, the process of inferring the best explanation for an observation. It identifies confusion in the field due to varying definitions and tasks, and introduces a clear split: one stage where the model creates possible explanations to fill knowledge gaps, and another where it picks the best one from those options. This split allows the authors to build a taxonomy covering tasks, datasets, methods, and how performance is measured. They also test several LLMs on these tasks to see patterns across different model types and compare abduction to other forms of reasoning like deduction and induction. The work ends by noting shortcomings in current research that limit progress.
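The two-stage split the review describes can be sketched in code. This is a minimal illustration under my own assumptions, not the authors' implementation: the generator and scorer below are toy stand-ins for the LLM calls each stage would make.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    plausibility: float  # score assigned during selection

def generate_hypotheses(observation: str) -> list[str]:
    # Stage I (Hypothesis Generation): bridge the epistemic gap with a
    # candidate set H. A real system would prompt an LLM here; these
    # canned candidates are purely illustrative.
    return [
        f"{observation} because the power failed",
        f"{observation} because a breaker tripped",
        f"{observation} because the bulb burned out",
    ]

def score_hypothesis(observation: str, hypothesis: str) -> float:
    # Stage II scoring stand-in: a real system would ask an LLM or a
    # verifier for plausibility; here shorter explanations score higher,
    # a toy Occam's-razor proxy.
    return 1.0 / len(hypothesis)

def abduce(observation: str) -> Hypothesis:
    candidates = generate_hypotheses(observation)               # Stage I
    scored = [Hypothesis(h, score_hypothesis(observation, h))
              for h in candidates]
    return max(scored, key=lambda h: h.plausibility)            # Stage II: pick h*

print(abduce("The room went dark").text)
```

Even in this sketch the point of the split is visible: the two stages can be benchmarked, trained, or swapped out independently, which is the axis along which the survey organizes prior work.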

Core claim

This paper presents the first survey of abductive reasoning in LLMs, tracing its trajectory from philosophical foundations to contemporary AI implementations. To address the widespread conceptual confusion and disjointed task definitions prevalent in the field, we establish a unified two-stage definition that formally categorizes prior work. This definition disentangles abduction into Hypothesis Generation, where models bridge epistemic gaps to produce candidate explanations, and Hypothesis Selection, where the generated candidates are evaluated and the most plausible explanation is chosen. Building upon this foundation, we present a comprehensive taxonomy of the literature, categorizing prior work based on their abductive tasks, datasets, underlying methodologies, and evaluation strategies.

What carries the argument

The unified two-stage definition of abductive reasoning, separating Hypothesis Generation from Hypothesis Selection, which structures the taxonomy and benchmark analysis.

If this is right

  • Previous studies on abductive reasoning in LLMs can be systematically categorized by tasks, datasets, methodologies, and evaluation strategies.
  • LLMs demonstrate distinct performance patterns when generating candidate explanations versus selecting the most plausible one.
  • Abductive reasoning performance relates to deductive and inductive reasoning capabilities, offering broader insights into model reasoning.
  • Critical gaps exist in static benchmark designs, narrow domain coverage, limited training frameworks, and insufficient mechanistic understanding.
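As a concrete reading of what "systematically categorized" could mean, here is a hypothetical record type for one surveyed work along the taxonomy's axes. The field names and the example slotting are my assumptions, not the paper's schema.

```python
from dataclasses import dataclass
from enum import Enum

class Stage(Enum):
    GENERATION = "hypothesis generation"  # Stage I of the two-stage split
    SELECTION = "hypothesis selection"    # Stage II

@dataclass
class SurveyedWork:
    title: str
    stage: Stage      # which stage the work primarily targets
    task: str         # e.g. abductive NLI, visual abduction
    dataset: str      # benchmark the work introduces or uses
    methodology: str  # e.g. prompting, fine-tuning, multi-agent
    evaluation: str   # e.g. accuracy, human judgment

# Example slotting (assumed): the abductive NLI task of Bhagavatula et al.
# (2020) asks models to choose between candidate explanations, so it would
# land in the selection stage.
anli = SurveyedWork(
    title="Abductive commonsense reasoning",
    stage=Stage.SELECTION,
    task="abductive NLI",
    dataset="aNLI",
    methodology="supervised + prompting",
    evaluation="accuracy",
)
print(anli.stage.value)
```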

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Training methods could be designed to target hypothesis generation and selection as separate skills to enhance overall abductive performance.
  • Links between different reasoning types may support development of AI systems capable of multiple inference modes.
  • Addressing the noted gaps would require creating benchmarks with wider domains and more dynamic designs.
  • Techniques for understanding model internals could be focused on how explanations are formed during abductive tasks.

Load-bearing premise

The proposed two-stage split into Hypothesis Generation and Hypothesis Selection accurately and exhaustively organizes all prior abductive work without omitting important variants or forcing artificial boundaries on existing task definitions.

What would settle it

A significant collection of abductive reasoning research that cannot be classified into either the hypothesis generation stage or the hypothesis selection stage would falsify the completeness of the unified definition.

Figures

Figures reproduced from arXiv: 2604.08016 by Danial Parnian, Mahdi Jafari Siavoshani, Moein Salimi, Mohammad Hossein Rohban, Nima Alighardashi, Shaygan Adim.

Figure 1
Figure 1. Publication Trends of Abductive Reasoning Research in Computer Science. The stacked bar chart categorizes publications into LLM-based, NLP-based (non-LLM), and other computer science approaches. view at source ↗
Figure 2
Figure 2. The Abductive Reasoning Pipeline. As defined in Section 2.2.1, we conceptualize abduction as a two-stage process: (1) Hypothesis Generation, where an observation (O) triggers the retrieval or creation of a candidate set H to bridge an epistemic gap; and (2) Hypothesis Selection, where candidates are evaluated to identify the best explanation h*. This survey unites works that focus on either stage individ… view at source ↗
Figure 3
Figure 3. Categorization of literature on abductive reasoning in LLMs. The taxonomy is divided into four… view at source ↗
Figure 4
Figure 4. Macro-average performance on the Stage I generation benchmarks. This figure summarizes overall… view at source ↗
Figure 5
Figure 5. Macro-average performance on the Stage II selection benchmarks. This figure summarizes overall… view at source ↗
Figure 6
Figure 6. Scaling trends within model families on the Stage I generation and Stage II selection benchmarks. view at source ↗
Figure 8
Figure 8. Abductive accuracy plotted against deductive (blue) and inductive (green) accuracy. Each point represents a shared (model, method) configuration evaluated on an abductive dataset and a corresponding deductive or inductive dataset. view at source ↗
read the original abstract

Despite its foundational role in human discovery and sense-making, abductive reasoning--the inference of the most plausible explanation for an observation--has been relatively underexplored in Large Language Models (LLMs). Despite the rapid advancement of LLMs, the exploration of abductive reasoning and its diverse facets has thus far been disjointed rather than cohesive. This paper presents the first survey of abductive reasoning in LLMs, tracing its trajectory from philosophical foundations to contemporary AI implementations. To address the widespread conceptual confusion and disjointed task definitions prevalent in the field, we establish a unified two-stage definition that formally categorizes prior work. This definition disentangles abduction into Hypothesis Generation, where models bridge epistemic gaps to produce candidate explanations, and Hypothesis Selection, where the generated candidates are evaluated and the most plausible explanation is chosen. Building upon this foundation, we present a comprehensive taxonomy of the literature, categorizing prior work based on their abductive tasks, datasets, underlying methodologies, and evaluation strategies. In order to ground our framework empirically, we conduct a compact benchmark study of current LLMs on abductive tasks, together with targeted comparative analyses across model sizes, model families, evaluation styles, and the distinct generation-versus-selection task typologies. Moreover, by synthesizing recent empirical results, we examine how LLM performance on abductive reasoning relates to deductive and inductive tasks, providing insights into their broader reasoning capabilities. Our analysis reveals critical gaps in current approaches--from static benchmark design and narrow domain coverage to narrow training frameworks and limited mechanistic understanding of abductive processes...

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript claims to provide the first comprehensive survey of abductive reasoning in large language models (LLMs). It traces the concept from its philosophical origins to modern AI applications and introduces a unified two-stage definition, consisting of Hypothesis Generation and Hypothesis Selection, to resolve conceptual confusion and disjointed task definitions. On this basis it develops a taxonomy organizing the literature by abductive tasks, datasets, methodologies, and evaluation strategies, and performs a compact benchmark study comparing LLMs on these tasks, with analyses across model sizes, families, and task types. Finally, it synthesizes results to relate abductive reasoning to deductive and inductive capabilities, identifying gaps in benchmarks, domains, training, and mechanistic understanding.

Significance. If the proposed taxonomy and two-stage definition prove to be comprehensive and accurate, this work will serve as a foundational reference for standardizing research on abductive reasoning in LLMs. The empirical benchmark, despite being compact, offers valuable comparative insights, and the examination of relations to other reasoning types contributes to understanding broader LLM reasoning. The identification of critical gaps provides clear directions for future work. The synthesis of external literature is a strength, though the empirical component requires more detail for full impact.

minor comments (3)
  1. [Benchmark Study] The compact benchmark study lacks detailed methods, specific dataset lists, or statistical controls, limiting verifiability of the comparative analyses across model sizes, families, and generation-versus-selection typologies.
  2. A summary table mapping surveyed works to the two-stage taxonomy categories would improve readability and allow readers to quickly assess coverage.
  3. [Abstract] The abstract describes 'targeted comparative analyses' and 'synthesizing recent empirical results' without naming the models, metrics, or key quantitative findings, reducing standalone clarity.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive and constructive review, which recognizes the paper's role as a foundational survey and the value of the unified taxonomy, benchmark, and gap analysis. We appreciate the recommendation for minor revision and address the single point raised regarding the empirical component below.

read point-by-point responses
  1. Referee: The empirical component requires more detail for full impact.

    Authors: We agree that expanding the description of the compact benchmark would strengthen the manuscript. In the revision, we will add: (1) explicit details on task selection criteria and prompt templates used for generation vs. selection stages; (2) full per-model performance tables with standard deviations across runs; (3) a brief error analysis categorizing failure modes by hypothesis quality; and (4) justification for the benchmark's scope relative to the taxonomy. These additions will be placed in an expanded Section 5 without altering the compact nature of the study. revision: yes

Circularity Check

0 steps flagged

No significant circularity in survey and taxonomy synthesis

full rationale

This paper is a literature survey that proposes a two-stage taxonomy (Hypothesis Generation followed by Hypothesis Selection) to organize existing abductive reasoning work in LLMs. No equations, fitted parameters, predictions, or derivations appear in the provided text or abstract. The central claims rest on synthesis and categorization of external prior literature rather than any internal reduction to the paper's own inputs or self-citations. The two-stage split is presented as an organizing framework, not as a result derived from data or prior self-work within the manuscript. This is the expected non-circular outcome for a survey paper whose contributions are classificatory rather than predictive or deductive.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey and taxonomy paper that synthesizes existing literature; no free parameters are fitted, no new axioms are postulated, and no invented entities are introduced beyond the descriptive two-stage framework.

pith-pipeline@v0.9.0 · 5608 in / 1207 out tokens · 35471 ms · 2026-05-10T17:53:36.313197+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

115 extracted references · 76 canonical work pages · 3 internal anchors

  1. [1]

    Leveraging symbolic knowledge bases for commonsense natural language inference using pattern theory

    Sathyanarayanan N. Aakur and Sudeep Sarkar. Leveraging symbolic knowledge bases for commonsense natural language inference using pattern theory. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(11): 13185--13202, 2023. doi:10.1109/TPAMI.2023.3287837

  2. [2]

    Abductive Reasoning: Logical Investigations into Discovery and Explanation, volume 330 of Synthese Library

    Atocha Aliseda. Abductive Reasoning: Logical Investigations into Discovery and Explanation, volume 330 of Synthese Library. Springer Dordrecht, 2006. ISBN 978-1-4020-3907-2. doi:10.1007/1-4020-3907-7

  3. [3]

    A survey on hypothesis generation for scientific discovery in the era of large language models

    A. Alkan, Shashwat Sourav, Maja Jabłońska, Simone Astarita, Rishabh Chakrabarty, N. Garuda, P. Khetarpal, Maciej Pióro, Dimitrios Tanoglidis, Kartheik G. Iyer, M. Polimera, Michael J. Smith, Tirthankar Ghosal, M. Huertas-Company, Sandor Kruk, Kevin Schawinski, and Ioana Ciucua. A survey on hypothesis generation for scientific discovery in the era of larg...

  4. [4]

    Advancing abductive reasoning in knowledge graphs through complex logical hypothesis generation

    Jiaxin Bai, Yicheng Wang, Tianshi Zheng, Yue Guo, Xin Liu, and Yangqiu Song. Advancing abductive reasoning in knowledge graphs through complex logical hypothesis generation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.\ 1312--1329, Bangkok, Thailand, 2024. Association for Computati...

  5. [5]

    Steering large language model activations in sparse spaces

    Reza Bayat, Ali Rahimi-Kalahroudi, Mohammad Pezeshki, Sarath Chandar, and Pascal Vincent. Steering large language model activations in sparse spaces. In Proceedings of the Conference on Language Modeling (COLM), 2025

  6. [6]

    On relationships between induction and abduction: A logical point of view

    Brigitte Bessant. On relationships between induction and abduction: A logical point of view. In Peter A. Flach and Antonis C. Kakas (eds.), Abduction and Induction: Essays on their Relation and Integration, volume 18 of Applied Logic Series, pp.\ 77--87. Kluwer Academic Publishers, Dordrecht, 2000. doi:10.1007/978-94-017-0606-3_5

  7. [7]

    Abductive commonsense reasoning

    Chandra Bhagavatula, Ronan Le Bras, Chaitanya Malaviya, Keisuke Sakaguchi, Ari Holtzman, Hannah Rashkin, Doug Downey, Wen-tau Yih, and Yejin Choi. Abductive commonsense reasoning. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=Byg1v1HKDB

  8. [8]

    A large annotated corpus for learning natural language inference

    Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp.\ 632--642, Lisbon, Portugal, 2015. Association for Computational Linguistics. doi:10.18653/v1/D15-1075. URL https://a...

  9. [9]

    In situ graph reasoning and knowledge expansion using graph-preflexor

    Markus J. Buehler. In situ graph reasoning and knowledge expansion using graph-preflexor. Advanced Intelligent Discovery, 1(3): e202500006, 2025. doi:10.1002/aidi.202500006. URL https://doi.org/10.1002/aidi.202500006

  10. [10]

    e-snli: Natural language inference with natural language explanations

    Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, and Phil Blunsom. e-snli: Natural language inference with natural language explanations. In Advances in Neural Information Processing Systems, volume 31, 2018. URL https://papers.nips.cc/paper/8163-e-snli-natural-language-inference-with-natural-language-explanations

  11. [11]

    On the distinction between Peirce's abduction and Lipton's inference to the best explanation

    Daniel G. Campos. On the distinction between Peirce's abduction and Lipton's inference to the best explanation. Synthese, 180: 419--442, 2011. doi:10.1007/s11229-009-9709-3. URL https://doi.org/10.1007/s11229-009-9709-3

  12. [12]

    Self-consistent narrative prompts on abductive natural language inference

    Chunkit Chan, Xin Liu, Tsz Ho Chan, Jiayang Cheng, Yangqiu Song, Ginny Wong, and Simon See. Self-consistent narrative prompts on abductive natural language inference. In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (...

  13. [13]

    Abductivemllm: Boosting visual abductive reasoning within mllms

    Boyu Chang, Qi Wang, Xi Guo, Zhixiong Nan, Yazhou Yao, and Tianfei Zhou. Abductivemllm: Boosting visual abductive reasoning within mllms. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pp.\ 2698--2706, 2026. doi:10.1609/aaai.v40i4.37258. URL https://ojs.aaai.org/index.php/AAAI/article/view/37258

  14. [14]

    Beyond brainstorming: What drives high-quality scientific ideas? lessons from multi-agent collaboration

    Nuo Chen, Yicheng Tong, Jiaying Wu, Minh Duc Duong, Qian Wang, Qingyun Zou, Bryan Hooi, and Bingsheng He. Beyond brainstorming: What drives high-quality scientific ideas? lessons from multi-agent collaboration. arXiv preprint arXiv:2508.04575, 2025. doi:10.48550/arXiv.2508.04575. URL https://arxiv.org/abs/2508.04575

  15. [15]

    Black swan: Abductive and defeasible video reasoning in unpredictable events

    Aditya Chinchure, Sahithya Ravi, Raymond Ng, Vered Shwartz, Boyang Li, and Leonid Sigal. Black swan: Abductive and defeasible video reasoning in unpredictable events. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.\ 24201--24210, 2025. doi:10.1109/CVPR52734.2025.02254. URL https://blackswan-video.github.io/

  16. [16]

    On the Measure of Intelligence

    François Chollet. On the measure of intelligence, 2019. URL https://arxiv.org/abs/1911.01547

  17. [17]

    Transformers as soft reasoners over language

    Peter Clark, Oyvind Tafjord, and Kyle Richardson. Transformers as soft reasoners over language. In Christian Bessiere (ed.), Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20 , pp.\ 3882--3890. International Joint Conferences on Artificial Intelligence Organization, 7 2020. doi:10.24963/ijcai.2020/537. URL...

  18. [18]

    Towards automated circuit discovery for mechanistic interpretability

    Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, and Adrià Garriga-Alonso. Towards automated circuit discovery for mechanistic interpretability. In Advances in Neural Information Processing Systems, volume 36, 2023

  19. [19]

    Inference to the best explanation in large language models

    Dhairya Dalal, Marco Valentino, Andre Freitas, and Paul Buitelaar. Inference to the best explanation in large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.\ 217--235, Bangkok, Thailand, 2024. Association for Computational Linguistics. doi:10.18653/v1/2024.acl-long.1...

  20. [20]

    True detective: A deep abductive reasoning benchmark undoable for GPT-3 and challenging for GPT-4

    Maksym Del and Mark Fishel. True detective: A deep abductive reasoning benchmark undoable for GPT-3 and challenging for GPT-4. In Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023), pp.\ 314--322, Toronto, Canada, 2023. Association for Computational Linguistics. doi:10.18653/v1/2023.starsem-1.28. URL https://acla...

  21. [21]

    Abductive Reasoning in Science

    Finnur Dellsén. Abductive Reasoning in Science. Elements in Philosophy of Science. Cambridge University Press, June 2024. ISBN 9781009500524. doi:10.1017/9781009353199

  22. [22]

    Abductive inference in defeasible reasoning: a model for research programmes

    C. Delrieux. Abductive inference in defeasible reasoning: a model for research programmes. Journal of Applied Logic, 2(4): 409--437, 2004. doi:10.1016/j.jal.2004.07.003

  23. [23]

    Assessing the reasoning capabilities of LLMs in the context of evidence-based claim verification

    John Dougrez-Lewis, Mahmud Elahi Akhter, Federico Ruggeri, Sebastian Löbbers, Yulan He, and Maria Liakata. Assessing the reasoning capabilities of LLMs in the context of evidence-based claim verification. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar (eds.), Findings of the Association for Computational Linguistics: A...

  24. [24]

    Abduction

    Igor Douven. Abduction. In Edward N. Zalta and Uri Nodelman (eds.), The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, Winter 2025 edition, 2025

  25. [25]

    Validation of growing knowledge graphs by abductive text evidences

    Jianfeng Du, Jeff Z. Pan, Sylvia Wang, Kunxun Qi, Yuming Shen, and Yu Deng. Validation of growing knowledge graphs by abductive text evidences. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01): 2784--2791, Jul. 2019. doi:10.1609/aaai.v33i01.33012784. URL https://ojs.aaai.org/index.php/AAAI/article/view/4130

  26. [26]

    e-CARE: a new dataset for exploring explainable causal reasoning

    Li Du, Xiao Ding, Kai Xiong, Ting Liu, and Bing Qin. e-CARE: a new dataset for exploring explainable causal reasoning. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.\ 432--446, Dublin, Ireland, 2022. Association for Computational Linguistics. doi:10.18653/v1/2022.acl-long.33. URL h...

  27. [27]

    Moral stories: Situated reasoning about norms, intents, actions, and their consequences

    Denis Emelin, Ronan Le Bras, Jena D. Hwang, Maxwell Forbes, and Yejin Choi. Moral stories: Situated reasoning about norms, intents, actions, and their consequences. In Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (eds.), Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp.\ 698--718, Onli...

  28. [28]

    Abductive and inductive reasoning: Background and issues

    Peter A. Flach and Antonis C. Kakas. Abductive and inductive reasoning: Background and issues. In Peter A. Flach and Antonis C. Kakas (eds.), Abduction and Induction: Essays on their Relation and Integration, volume 18 of Applied Logic Series, pp.\ 1--27. Kluwer Academic Publishers, Dordrecht, 2000. doi:10.1007/978-94-017-0606-3_1

  29. [29]

    Peirce's notion of abduction

    Harry G. Frankfurt. Peirce's notion of abduction. The Journal of Philosophy, 55(14): 593--597, 1958. doi:10.2307/2021966. URL https://doi.org/10.2307/2021966

  30. [30]

    Leveraging medical knowledge graphs into large language models for diagnosis prediction: Design and application study

    Yanjun Gao, Ruizhe Li, Emma Croxford, John Caskey, Brian W Patterson, Matthew Churpek, Timothy Miller, Dmitriy Dligach, and Majid Afshar. Leveraging medical knowledge graphs into large language models for diagnosis prediction: Design and application study. JMIR AI, 4: e58670, February 2025. ISSN 2817-1705. doi:10.2196/58670. URL http://dx.doi.org/10.2196/58670

  31. [31]

    Unifying deductive and abductive reasoning in knowledge graphs with masked diffusion model

    Yisen Gao, Jiaxin Bai, Yi Huang, Xingcheng Fu, Qingyun Sun, and Yangqiu Song. Unifying deductive and abductive reasoning in knowledge graphs with masked diffusion model. In Proceedings of the ACM Web Conference 2026 (WWW '26), Dubai, United Arab Emirates, 2026a. Association for Computing Machinery. doi:10.1145/3774904.3792133. URL https://doi.org/10.114...

  32. [32]

    Controllable logical hypothesis generation for abductive reasoning in knowledge graphs

    Yisen Gao, Jiaxin Bai, Tianshi Zheng, Ziwei Zhang, Qingyun Sun, Xingcheng Fu, Jianxin Li, and Yangqiu Song. Controllable logical hypothesis generation for abductive reasoning in knowledge graphs. In International Conference on Learning Representations, 2026b. URL https://openreview.net/forum?id=oTgJg0M9kY. Poster

  33. [33]

    The third PASCAL recognizing textual entailment challenge

    Danilo Giampiccolo, Bernardo Magnini, Ido Dagan, and Bill Dolan. The third PASCAL recognizing textual entailment challenge. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, pp.\ 1--9, Prague, 2007. Association for Computational Linguistics. URL https://aclanthology.org/W07-1401/

  34. [34]

    Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning

    Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. Nature, 2025. doi:10.1038/s41586-025-09422-z. URL https://www.nature.com/articles/s41586-025-09422-z

  35. [35]

    Whodunit: Evaluation benchmark for culprit detection in mystery stories

    Kshitij Gupta. Whodunit: Evaluation benchmark for culprit detection in mystery stories, 2025. URL https://arxiv.org/abs/2502.07747

  36. [36]

    Patterns of Discovery: An Inquiry into the Conceptual Foundations of Science

    Norwood Russell Hanson. Patterns of Discovery: An Inquiry into the Conceptual Foundations of Science. Cambridge University Press, Cambridge, 1958

  37. [37]

    The inference to the best explanation

    Gilbert Harman. The inference to the best explanation. Philosophical Review, 74(1): 88--95, 1965. doi:10.2307/2183532

  38. [38]

    Causejudger: Identifying the cause with llms for abductive logical reasoning

    Jinwei He and Feng Lu. Causejudger: Identifying the cause with llms for abductive logical reasoning, 2024. URL https://arxiv.org/abs/2409.05559

  39. [39]

    From reasoning to learning: A survey on hypothesis discovery and rule learning with large language models

    Kaiyu He and Zhiyu Chen. From reasoning to learning: A survey on hypothesis discovery and rule learning with large language models. Transactions on Machine Learning Research, 2025. URL https://openreview.net/forum?id=d7W38UzUg0

  40. [40]

    Gear: A general evaluation framework for abductive reasoning

    Kaiyu He, Peilin Wu, Mian Zhang, Kun Wan, Wentian Zhao, Xinya Du, and Zhiyu Chen. Gear: A general evaluation framework for abductive reasoning, 2025a. URL https://arxiv.org/abs/2509.24096

  41. [41]

    Idea: Enhancing the rule learning ability of large language model agents through induction, deduction, and abduction

    Kaiyu He, Mian Zhang, Shuo Yan, Peilin Wu, and Zhiyu Chen. Idea: Enhancing the rule learning ability of large language model agents through induction, deduction, and abduction. In Findings of the Association for Computational Linguistics: ACL 2025, pp.\ 13563--13597, Vienna, Austria, 2025b. Association for Computational Linguistics. doi:10.18653/v1/2025...

  42. [42]

    LEGO : A multi-agent collaborative framework with role-playing and iterative feedback for causality explanation generation

    Zhitao He, Pengfei Cao, Yubo Chen, Kang Liu, Ruopeng Li, Mengshu Sun, and Jun Zhao. LEGO : A multi-agent collaborative framework with role-playing and iterative feedback for causality explanation generation. In Findings of the Association for Computational Linguistics: EMNLP 2023, pp.\ 9142--9163, Singapore, 2023. Association for Computational Linguistics...

  43. [43]

    The abduction of sherlock holmes: A dataset for visual abductive reasoning

    Jack Hessel, Jena D. Hwang, Jae Sung Park, Rowan Zellers, Chandra Bhagavatula, Anna Rohrbach, Kate Saenko, and Yejin Choi. The abduction of sherlock holmes: A dataset for visual abductive reasoning. In Computer Vision -- ECCV 2022, volume 13696 of Lecture Notes in Computer Science, pp.\ 558--575. Springer, Cham, 2022. doi:10.1007/978-3-031-20059-5_32. URL...

  44. [44]

    A implies b: Circuit analysis in llms for propositional logical reasoning

    Guan Zhe Hong, Nishanth Dikkala, Enming Luo, Cyrus Rashtchian, Xin Wang, and Rina Panigrahy. A implies b: Circuit analysis in llms for propositional logical reasoning. In Advances in Neural Information Processing Systems, 2025

  45. [45]

    Argmed-agents: Explainable clinical decision reasoning with large language models via argumentation schemes

    Shengxin Hong, Liang Xiao, Xin Zhang, and Jianxia Chen. Argmed-agents: Explainable clinical decision reasoning with large language models via argumentation schemes. In 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp.\ 1989--1996. IEEE, 2024. doi:10.1109/BIBM62325.2024.10822109. Also available as arXiv:2403.06294

  46. [46]

    Disentangling logic: The role of context in large language models' formal reasoning capabilities

    Wenyue Hua, Kaijie Zhu, Lingyao Li, Lizhou Fan, Mingyu Jin, Shuhang Lin, Haochen Xue, Zelong Li, Jindong Wang, and Yongfeng Zhang. Disentangling logic: The role of context in large language models' formal reasoning capabilities. In Findings of the Association for Computational Linguistics: ACL 2025, pp.\ 19219--19242, Vienna, Austria, 2025. Association fo...

  47. [47]

    The relation of Peirce's abduction to inference to the best explanation

    Yi Jiang. The relation of Peirce's abduction to inference to the best explanation. Chinese Semiotic Studies, 20(3): 485--496, 2024. doi:10.1515/css-2024-2022

  48. [48]

    Abduction and argumentation for explainable machine learning: A position survey

    A. Kakas and Loizos Michael. Abduction and argumentation for explainable machine learning: A position survey, 2020

  49. [49]

    Peirce and the autonomy of abductive reasoning

    Tomis Kapitan. Peirce and the autonomy of abductive reasoning. Erkenntnis, 37: 1--26, 1992

  50. [50]

    Epistemology of language models: Do language models have holistic knowledge?

    Minsu Kim and James Thorne. Epistemology of language models: Do language models have holistic knowledge? In Lun-Wei Ku, Andre Martins, and Vivek Srikumar (eds.), Findings of the Association for Computational Linguistics: ACL 2024, pp.\ 12644--12669, Bangkok, Thailand, August 2024. Association for Computational Linguistics. doi:10.18653/v1/2024.findings-ac...

  51. [51]

    Playgrounds for abstraction and reasoning

    Subin Kim, Prin Phunyaphibarn, Donghyun Ahn, and Sundong Kim. Playgrounds for abstraction and reasoning. In NeurIPS 2022 Workshop on Neuro Causal and Symbolic AI (nCSI), 2022. URL https://openreview.net/forum?id=F4RNpByoqP

  52. [52]

    Atp*: An efficient and scalable method for localizing llm behaviour to components

    János Kramár, Tom Lieberum, Rohin Shah, and Neel Nanda. Atp*: An efficient and scalable method for localizing llm behaviour to components. arXiv preprint arXiv:2403.00745, 2024

  53. [53]

    Multi-modal action chain abductive reasoning (mar)

    Mengze Li, Tianbao Wang, Jiahe Xu, Kairong Han, Shengyu Zhang, Zhou Zhao, Jiaxu Miao, Wenqiao Zhang, Shiliang Pu, and Fei Wu. Multi-modal action chain abductive reasoning (mar). In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), pp.\ 4617--4628, Toronto, Canada, 2023. Association for Computational Lingui...

  54. [54]

    From hypothesis to premises: Llm-based backward logical reasoning with selective symbolic translation

    Qingchuan Li, Mingyue Cheng, Zirui Liu, Daoyu Wang, Yuting Zeng, and Tongxuan Liu. From hypothesis to premises: Llm-based backward logical reasoning with selective symbolic translation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(37): 31671--31679, 2026. doi:10.1609/aaai.v40i37.40434. URL https://ojs.aaai.org/index.php/AAAI/arti...

  55. [55]

    Visual abductive reasoning

    Chen Liang, Wenguan Wang, Tianfei Zhou, and Yi Yang. Visual abductive reasoning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.\ 15544--15554, 2022. doi:10.1109/CVPR52688.2022.01512. URL https://openaccess.thecvf.com/content/CVPR2022/html/Liang_Visual_Abductive_Reasoning_CVPR_2022_paper.html

  56. [56]

    Encouraging divergent thinking in large language models through multi-agent debate

    Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Shuming Shi, and Zhaopeng Tu. Encouraging divergent thinking in large language models through multi-agent debate. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pp.\ 17889--17904, Miami, Florida, USA, 2024. Association for Computati...

  57. [57]

    Let's verify step by step

    Hunter Lightman, Vineet Kosaraju, Yuri Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let's verify step by step. In International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=v8L0pN6EOi. Poster

  58. [58]

    Abductive inference in retrieval-augmented language models: Generating and validating missing premises, 2025

    Shiyin Lin. Abductive inference in retrieval-augmented language models: Generating and validating missing premises, 2025. URL https://arxiv.org/abs/2511.04020

  59. [59]

    Inference to the Best Explanation

    Peter Lipton. Inference to the Best Explanation. Routledge, London, 2nd edition, 2004

  60. [60]

    Inference to the best explanation

    Peter Lipton. Inference to the best explanation. In Stathis Psillos and Martin Curd (eds.), The Routledge Companion to Philosophy of Science, pp.\ 193--202. Routledge, Abingdon, 2008

  61. [61]

    An incomplete loop: Instruction inference, instruction following, and in-context learning in language models

    Emmy Liu, Graham Neubig, and Jacob Andreas. An incomplete loop: Instruction inference, instruction following, and in-context learning in language models. In Conference on Language Modeling, 2024. URL https://openreview.net/forum?id=nUNbjMDBWC

  62. [62]

    Evaluating the logical reasoning abilities of large reasoning models, 2025

    Hanmeng Liu, Yiran Ding, Zhizhang Fu, Chaoli Zhang, Xiaozhang Liu, and Yue Zhang. Evaluating the logical reasoning abilities of large reasoning models, 2025

  63. [63]

    The magic of IF: Investigating causal reasoning abilities in large language models of code

    Xiao Liu, Da Yin, Chen Zhang, Yansong Feng, and Dongyan Zhao. The magic of IF: Investigating causal reasoning abilities in large language models of code. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki (eds.), Findings of the Association for Computational Linguistics: ACL 2023, pp.\ 9009--9022, Toronto, Canada, July 2023. Association for Computati...

  64. [64]

    Llm discussion: Enhancing the creativity of large language models via discussion framework and role-play

    Li-Chun Lu, Shou-Jen Chen, Tsung-Min Pai, Chan-Hung Yu, Hung yi Lee, and Shao-Hua Sun. Llm discussion: Enhancing the creativity of large language models via discussion framework and role-play. In Conference on Language Modeling, 2024. URL https://openreview.net/forum?id=ybaK4asBT2

  65. [65]

    Towards logiglue: A brief survey and a benchmark for analyzing logical reasoning capabilities of language models

    Man Luo, Shrinidhi Kumbhar, Ming Shen, Mihir Parmar, Neeraj Varshney, Pratyay Banerjee, Somak Aditya, and Chitta Baral. Towards logiglue: A brief survey and a benchmark for analyzing logical reasoning capabilities of language models. arXiv preprint arXiv:2310.00836, 2023. doi:10.48550/arXiv.2310.00836. URL https://arxiv.org/abs/2310.00836

  66. [66]

    Natural logic for textual inference

    Bill MacCartney and Christopher D. Manning. Natural logic for textual inference. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, pp.\ 193--200, Prague, 2007. Association for Computational Linguistics. URL https://aclanthology.org/W07-1431/

  67. [67]

    Toward mechanistic explanation of deductive reasoning in language models

    Davide Maltoni and Matteo Ferrara. Toward mechanistic explanation of deductive reasoning in language models. arXiv preprint arXiv:2510.09340, 2025

  68. [68]

    ER-Reason: A Benchmark Dataset for LLM Clinical Reasoning in the Emergency Room

    Nikita Mehandru, Niloufar Golchini, David Bamman, Travis Zack, Melanie F. Molina, and Ahmed Alaa. Er-reason: A benchmark dataset for llm-based clinical reasoning in the emergency room, 2025. URL https://arxiv.org/abs/2505.22919

  69. [69]

    Locating and editing factual associations in gpt

    Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in gpt. In Advances in Neural Information Processing Systems, volume 35, 2022. ROME

  70. [70]

    A System of Logic

    John Stuart Mill. A System of Logic. Harper & brothers, New York, 1858

  71. [71]

    Peirce-suit of truth - why inference to the best explanation and abduction ought not to be confused

    Gerhard Minnameier. Peirce-suit of truth - why inference to the best explanation and abduction ought not to be confused. Erkenntnis, 60: 75--105, 2004

  72. [72]

    Dixitworld: Evaluating multimodal abductive reasoning in vision-language models with multi-agent dixit gameplay, 2025

    Yunxiang Mo, Tianshi Zheng, Qing Zong, Jiayu Liu, Baixuan Xu, Yauwai Yim, Chunkit Chan, Jiaxin Bai, and Yangqiu Song. Dixitworld: Evaluating multimodal abductive reasoning in vision-language models with multi-agent dixit gameplay, 2025. URL https://arxiv.org/abs/2510.10117

  73. [73]

    How well do sota legal reasoning models support abductive reasoning?

    Ha Thanh Nguyen, Randy Goebel, Francesca Toni, Kostas Stathis, and Ken Satoh. How well do sota legal reasoning models support abductive reasoning? In Proceedings of the International Conference on Logic Programming 2023 Workshops, volume 3437 of CEUR Workshop Proceedings, London, United Kingdom, 2023. URL https://ceur-ws.org/Vol-3437/paper1LPLR.pdf. Logic...

  74. [74]

    In-context Learning and Induction Heads

    Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, a...

  75. [75]

    From we to me: Theory informed narrative shift with abductive reasoning

    Jaikrishna Manojkumar Patil, Divyagna Bavikadi, Kaustuv Mukherji, Ashby Steward-Nolan, Peggy-Jean Allin, Tumininu Awonuga, Joshua Garland, and Paulo Shakarian. From we to me: Theory informed narrative shift with abductive reasoning. arXiv preprint arXiv:2603.03320, 2026. doi:10.48550/arXiv.2603.03320. URL https://arxiv.org/abs/2603.03320

  76. [76]

    Social commonsense reasoning with multi-head knowledge attention

    Debjit Paul and Anette Frank. Social commonsense reasoning with multi-head knowledge attention. In Findings of the Association for Computational Linguistics: EMNLP 2020, pp.\ 2969--2980, Online, 2020. Association for Computational Linguistics. doi:10.18653/v1/2020.findings-emnlp.267. URL https://aclanthology.org/2020.findings-emnlp.267/

  77. [77]

    Approaches to abductive reasoning: An overview

    Gabriele Paul. Approaches to abductive reasoning: An overview. Artificial Intelligence Review, 7: 109--152, 1993. doi:10.1007/BF00849080. URL https://doi.org/10.1007/BF00849080

  78. [78]

    Collected Papers of Charles Sanders Peirce

    Charles Sanders Peirce. Collected Papers of Charles Sanders Peirce. Harvard University Press, Cambridge, MA, 1931--1958. Volumes 1--6 edited by C. Hartshorne and P. Weiss (1931--1935); Volumes 7--8 edited by A.W. Burks (1958)

  79. [79]

    Abduction as deductive saturation: A proof-theoretic inquiry

    Mario Piazza, Gabriele Pulcini, and Andrea Sabatini. Abduction as deductive saturation: A proof-theoretic inquiry. Journal of Philosophical Logic, 52(6): 1575--1602, 2023. doi:10.1007/s10992-023-09718-3

  80. [80]

    Doing experiments and revising rules with natural language and probabilistic reasoning

    Wasu Top Piriyakulkij, Cassidy Langenfeld, Tuan Anh Le, and Kevin Ellis. Doing experiments and revising rules with natural language and probabilistic reasoning. In Advances in Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=HXdAfK488A. Poster
