When can LLMs actually correct their own mistakes? A critical survey of self-correction of LLMs
6 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (6)
citing papers explorer
- RubricRefine: Improving Tool-Use Agent Reliability with Training-Free Pre-Execution Refinement
  RubricRefine is a training-free pre-execution method that creates rubrics to score and fix inter-tool contract violations in agent code, reaching 0.86 average on M3ToolEval across seven models with zero executions and lower latency.
- Guiding LLM-based Loop Invariant Synthesis via Feedback on Local Reasoning Errors
  LORIS detects local reasoning errors in LLM-generated proofs for loop invariants by translating natural-language steps to first-order logic implications and using invalid implications to refine the invariants, achieving 93.1% success on 460 C programs.
- Weighted Rules under the Stable Model Semantics
  Weighted rules extend stable model semantics to support probabilistic reasoning, model ranking, and statistical inference in answer set programs.
- ReflectCAP: Detailed Image Captioning with Reflective Memory
  ReflectCAP distills model-specific hallucination and oversight patterns into Structured Reflection Notes that steer LVLMs toward more factual and complete image captions, reaching the Pareto frontier on factuality-coverage trade-offs.
- Correction and Corruption: A Two-Rate View of Error Flow in LLM Protocols
  A two-rate measurement (correction c and corruption γ) for LLM protocol steps predicts accuracy changes from paired correctness bits and flags three failure modes including mixture shift on GSM8K.
- sciwrite-lint: Verification Infrastructure for the Age of Science Vibe-Writing