Controlled student-teacher experiments across four benchmarks show interactive gains are driven more by the student's ability to use feedback than by teacher quality, with self-feedback adding little beyond unguided retries.
Better than your teacher: Llm agents that learn from privileged ai feedback
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
ADWM learns a latent diffusion world model with per-transition independent denoising and policy-conditioned guidance to enable accurate offline evaluation of LLM agent policies.
citing papers explorer
-
What Drives Interactive Improvement from Feedback?
Controlled student-teacher experiments across four benchmarks show interactive gains are driven more by the student's ability to use feedback than by teacher quality, with self-feedback adding little beyond unguided retries.