pith. machine review for the scientific record.

arxiv: 1406.5979 · v1 · submitted 2014-06-23 · 💻 cs.LG · stat.ML

Recognition: unknown

Reinforcement and Imitation Learning via Interactive No-Regret Learning

J. Andrew Bagnell, Stephane Ross

Authors on Pith: no claims yet
classification: 💻 cs.LG · stat.ML
keywords: learning, imitation, approach, interactive, reinforcement, cost, existing, extend
read the original abstract

Recent work has demonstrated that problems -- particularly imitation learning and structured prediction -- where a learner's predictions influence the input-distribution it is tested on can be naturally addressed by an interactive approach and analyzed using no-regret online learning. These approaches to imitation learning, however, neither require nor benefit from information about the cost of actions. We extend existing results in two directions: first, we develop an interactive imitation learning approach that leverages cost information; second, we extend the technique to address reinforcement learning. The results provide theoretical support to the commonly observed successes of online approximate policy iteration. Our approach suggests a broad new family of algorithms and provides a unifying view of existing techniques for imitation and reinforcement learning.
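The interactive, cost-aware loop the abstract describes (in the spirit of DAgger extended with cost-to-go information, i.e. AggreVaTe) can be sketched as follows. This is an illustrative toy, not the authors' code: the function names, the tabular cost-sensitive "learner", and the chain environment in the usage note are all assumptions made for the example.

```python
import random
from collections import defaultdict

def interactive_cost_aware_loop(env_step, init_state, expert_cost_to_go,
                                actions, n_iters=10, horizon=5, seed=0):
    """Illustrative DAgger/AggreVaTe-style loop (toy sketch, not the paper's code).

    Rolls in with the learner's own policy (so training states match the
    induced state distribution), queries the expert's cost-to-go for every
    action at each visited state, aggregates all data, and refits a
    cost-sensitive policy -- here a simple tabular argmin over average costs.
    """
    rng = random.Random(seed)
    data = defaultdict(lambda: defaultdict(list))  # state -> action -> cost samples
    policy = {}                                    # state -> chosen action

    def act(s):
        # fall back to a random action in states the learner has never seen
        return policy.get(s, rng.choice(actions))

    for _ in range(n_iters):
        s = init_state
        for _t in range(horizon):
            # record the expert's cost-to-go for each action at this state
            for a in actions:
                data[s][a].append(expert_cost_to_go(s, a))
            # roll in with the *learner's* policy, not the expert's
            s = env_step(s, act(s))
        # no-regret-style update: refit on the aggregated dataset
        policy = {st: min(costs, key=lambda a: sum(costs[a]) / len(costs[a]))
                  for st, costs in data.items()}
    return policy
```

On a toy 10-state chain (move left or right, cost = distance from the goal state 9 after the move), the aggregated policy converges to always stepping right, illustrating how cost information sharpens plain imitation.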

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Revisiting DAgger in the Era of LLM-Agents

    cs.LG 2026-05 conditional novelty 6.0

    DAgger-style training with turn-level policy interpolation raises 4B and 8B LLM agents to 27.3% and 29.8% on SWE-bench Verified, beating several larger published systems.

  2. Provable imitation learning for control of instability in partially-observed Vlasov--Poisson equations

    cs.LG 2026-05 unverdicted novelty 6.0

    Imitation learning yields provably stable feedback policies for partially observed Vlasov-Poisson instabilities, with error floors determined by minimal behavior-cloning loss characterized via entropy of the initial d...

  3. BridgeSim: Unveiling the OL-CL Gap in End-to-End Autonomous Driving

    cs.RO 2026-04 unverdicted novelty 6.0

    The primary OL-CL gap in end-to-end autonomous driving arises from objective mismatch creating structural inability to model reactive behaviors, which a test-time adaptation method can mitigate.

  4. The Essence of Balance for Self-Improving Agents in Vision-and-Language Navigation

    cs.CV 2026-04 unverdicted novelty 5.0

    SDB balances behavioral diversity and learning stability in VLN self-improvement by expanding decisions into latent hypotheses, performing reliability-aware aggregation, and applying a regularizer, yielding gains such...

  5. Jump-Start Reinforcement Learning with Vision-Language-Action Regularization

    cs.LG 2026-04 unverdicted novelty 5.0

    VLAJS augments PPO with sparse annealed VLA guidance through directional regularization to cut required interactions by over 50% on manipulation tasks and enable zero-shot sim-to-real transfer.