LLMs cannot find reasoning errors, but can correct them given the error location

Gladys Tyen, Hassan Mansoor, Victor Carbune, Peter Chen, Tony Mak · 2024 · DOI 10.18653/v1/2024.findings-acl.826

2 Pith papers cite this work. Polarity classification is still indexing.



representative citing papers

Don't Blindly Trust It: How Unreliable Feedback Breaks Tool-Using LLM Agents

cs.AI · 2026-06-19 · unverdicted · novelty 6.0

Misleading tool feedback produces value inversion in LLM agents, with performance dropping below matched no-feedback baselines on HotpotQA and similar tasks.

Answer Engineering: Local Trajectory Editing for Protocol-Constrained Decision Making in Large Language Models

cs.AI · 2026-06-19 · unverdicted · novelty 6.0

Answer Engineering uses local trajectory editing during autoregressive generation to raise protocol compliance on a clinical SSNHL benchmark from 25.1% to 83.5% and balanced accuracy from 42.0% to 80.7%.

citing papers explorer


Don't Blindly Trust It: How Unreliable Feedback Breaks Tool-Using LLM Agents

cs.AI · 2026-06-19 · unverdicted · none · ref 50

Answer Engineering: Local Trajectory Editing for Protocol-Constrained Decision Making in Large Language Models

cs.AI · 2026-06-19 · unverdicted · none · ref 4
