From Action Labels to Sets: Rethinking Action Supervision for Imitation Learning from Corrective Feedback

2025 · cs.RO · arXiv 2502.07645

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

Behavior cloning (BC) optimizes policies by treating human demonstrations as pointwise action labels. While effective with accurate action labels, this formulation is brittle in practice: when human-provided actions are imperfect, treating each label as an exact target can steer the policy away from the underlying desired behavior, particularly when expressive models are used (e.g., energy-based models). As a result, we propose a human-in-the-loop alternative that replaces pointwise supervision with set-valued action targets. We introduce Contrastive policy Learning from Interactive Corrections (CLIC). CLIC leverages human corrections to construct and refine sets of desired actions, and optimizes a policy to place probability mass over these sets rather than over a single action target. This formulation naturally accommodates both absolute and relative corrections and can represent complex multi-modal behaviors. Extensive simulation and real-robot experiments show that the proposed approach leads to effective policy learning across diverse settings: CLIC remains competitive with the state of the art under accurate data while being substantially more robust under noisy, relative, and partial feedback. Our implementation is publicly available at https://clic-webpage.github.io/.
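The contrast between pointwise and set-valued supervision described in the abstract can be illustrated with a small sketch. The abstract does not specify how a desired action set is built, so the half-space rule below (an action counts as desired if it lies closer to the human-corrected action than to the robot's original action) is an assumption made only for illustration. The discrete candidate actions and the names `in_desired_set`, `set_mass`, and `set_loss` are likewise hypothetical and do not come from the CLIC implementation.

```python
import math


def in_desired_set(action, robot_action, corrected_action):
    """Assumed membership rule: the action is strictly closer to the
    corrected action than to the robot's original action."""
    return math.dist(action, corrected_action) < math.dist(action, robot_action)


def set_mass(candidates, probs, robot_action, corrected_action):
    """Total probability the policy assigns to candidates inside the set."""
    return sum(
        p
        for a, p in zip(candidates, probs)
        if in_desired_set(a, robot_action, corrected_action)
    )


def set_loss(candidates, probs, robot_action, corrected_action, eps=1e-8):
    """Negative log of the mass inside the set. Unlike a pointwise loss,
    it does not care which action inside the set receives the mass."""
    return -math.log(set_mass(candidates, probs, robot_action, corrected_action) + eps)


# Toy 2-D example: the robot executed (0, 0) and the human corrected toward (2, 0).
candidates = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.5)]
robot_action = (0.0, 0.0)
corrected_action = (2.0, 0.0)

policy_a = [0.1, 0.1, 0.7, 0.1]  # mass concentrated on the corrected action
policy_b = [0.1, 0.1, 0.1, 0.7]  # mass on a different action inside the set

loss_a = set_loss(candidates, policy_a, robot_action, corrected_action)
loss_b = set_loss(candidates, policy_b, robot_action, corrected_action)
```

Both toy policies place the same total mass inside the assumed set, so they incur the same set loss. A pointwise BC loss on the corrected action alone would penalize `policy_b` much more heavily. That tolerance is the property the abstract credits for robustness to imperfect labels.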

representative citing papers

Set-Supervised Diffusion Policy: Learning Action-Chunking Diffusion through Corrections

cs.RO · 2026-06-01 · unverdicted · novelty 6.0

SDP constructs sets of desired action-chunks from human correction pairs and trains diffusion policies to align with those sets, yielding better performance and robustness than standard behavior cloning on robotic tasks.

Wavelet Policy: Imitation Learning in the Scale Domain with World Prior Memory

cs.RO · 2025-04-07

citing papers explorer

Showing 2 of 2 citing papers.

Set-Supervised Diffusion Policy: Learning Action-Chunking Diffusion through Corrections

cs.RO · 2026-06-01 · unverdicted · none · ref 20

SDP constructs sets of desired action-chunks from human correction pairs and trains diffusion policies to align with those sets, yielding better performance and robustness than standard behavior cloning on robotic tasks.

Wavelet Policy: Imitation Learning in the Scale Domain with World Prior Memory

cs.RO · 2025-04-07 · unreviewed · ref 15
