Diagnosing the First-Order Logical Reasoning Ability Through LogicNLI

Jidong Tian, Yitian Li, Wenqing Chen, Liqiang Xiao, Hao He, Yaohui Jin · 2021 · DOI 10.18653/v1/2021.emnlp-main.303

6 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

HOLMES: Evaluating Higher-Order Logical Reasoning in LLMs

cs.AI · 2026-06-22 · unverdicted · novelty 7.0

HOLMES is the first real-world benchmark for higher-order symbolic reasoning in LLMs, where models average 50.64% accuracy and the best reaches 59.54%.

Fixing FOLIO and MALLS: Verified Annotations and an LLM-assisted Framework to Focus Human Relabeling

cs.CL · 2026-06-01 · unverdicted · novelty 7.0

Audit finds 36-39% incorrect FOL labels in FOLIO and MALLS; corrections raise LLM accuracy 9-22 points and an LLM-guided review framework achieves 90% dataset quality after checking fewer than 24% of examples.

QMFOL: Benchmarking Large Language Model Reasoning via Quantifiable Monadic First-Order Logic Test Case Generation

cs.AI · 2026-06-18 · unverdicted · novelty 6.0

QMFOL generates monadic first-order logic tasks with controllable complexity via pattern-based structures and round-trip prover verification, then evaluates six LRMs showing performance drops as logical depth and width increase.

ChLogic: Evaluating Robustness of Logical Reasoning in Chinese Expressions

cs.CL · 2026-06-16 · unverdicted · novelty 6.0

ChLogic benchmark shows persistent English-Chinese gaps in LLM logical reasoning performance, with back-translation effects varying by model and difficulty.

Scaling with Confidence: Calibrating Confidence of LLMs for Adaptive Test Time Scaling

cs.AI · 2026-07-02 · unverdicted · novelty 5.0

C3RL is a new RL algorithm combining correctness, calibration, and reference accuracy rewards to improve LLM confidence calibration, enabling CAS to outperform majority voting with up to 12.33x lower inference cost.

LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs

cs.AI · 2026-05-12

citing papers explorer

HOLMES: Evaluating Higher-Order Logical Reasoning in LLMs cs.AI · 2026-06-22 · unverdicted · none · ref 8
Fixing FOLIO and MALLS: Verified Annotations and an LLM-assisted Framework to Focus Human Relabeling cs.CL · 2026-06-01 · unverdicted · none · ref 42
QMFOL: Benchmarking Large Language Model Reasoning via Quantifiable Monadic First-Order Logic Test Case Generation cs.AI · 2026-06-18 · unverdicted · none · ref 31
ChLogic: Evaluating Robustness of Logical Reasoning in Chinese Expressions cs.CL · 2026-06-16 · unverdicted · none · ref 19
Scaling with Confidence: Calibrating Confidence of LLMs for Adaptive Test Time Scaling cs.AI · 2026-07-02 · unverdicted · none · ref 39
LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs cs.AI · 2026-05-12 · unreviewed · ref 47
