LLMs cannot find reasoning errors, but can correct them given the error location
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.AI (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- Don't Blindly Trust It: How Unreliable Feedback Breaks Tool-Using LLM Agents
Misleading tool feedback produces value inversion in LLM agents, with performance dropping below matched no-feedback baselines on HotpotQA and similar tasks.
- Answer Engineering: Local Trajectory Editing for Protocol-Constrained Decision Making in Large Language Models
Answer Engineering uses local trajectory editing during autoregressive generation to raise protocol compliance on a clinical SSNHL benchmark from 25.1% to 83.5% and balanced accuracy from 42.0% to 80.7%.