Better than Your Teacher: LLM Agents that Learn from Privileged AI Feedback

Sanjiban Choudhury, Paloma Sodhi · arXiv 2410.05434

2 Pith papers cite this work.

Representative citing papers

What Drives Interactive Improvement from Feedback?

cs.AI · 2026-06-29 · unverdicted · novelty 7.0

Controlled student-teacher experiments across four benchmarks show interactive gains are driven more by the student's ability to use feedback than by teacher quality, with self-feedback adding little beyond unguided retries.

Autoregressive Diffusion World Models for Off-Policy Evaluation of LLM Agents

cs.LG · 2026-06-04 · unverdicted · novelty 6.0

ADWM learns a latent diffusion world model with per-transition independent denoising and policy-conditioned guidance to enable accurate offline evaluation of LLM agent policies.

