Maximizing alignment with minimal feedback: Efficiently learning rewards for visuomotor robot policy alignment

3 Pith papers cite this work. Polarity classification is still indexing.

fields

cs.RO (2), cs.LG (1)

years

2026 (2), 2025 (1)

representative citing papers

Position: Good Embodied Reward Models Need Bad Behavior Data

cs.RO · 2026-05-31 · unverdicted · novelty 4.0

Embodied reward models systematically over-reward unsafe, suboptimal, and shortcut robot behaviors due to training on successful data only, and modest inclusion of bad behavior data improves alignment with human preferences.
