In Christodoulopoulos, C.; Chakraborty, T.; Rose, C.; and Peng, V., eds., Findings of the Association for Computational Linguistics: EMNLP 2025, 12376–12394

Stress-testing the reasoning competence of language models with formal proofs · 2025

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

ABD: Default Exception Abduction in Finite First Order Worlds

cs.AI · 2026-02-21 · unverdicted · novelty 7.0

The ABD benchmark evaluates LLMs on producing parsimonious first-order exception formulas under three observation regimes, using SMT verification. It finds high validity but persistent gaps in parsimony and generalization.
