Maximizing alignment with minimal feedback: Efficiently learning rewards for visuomotor robot policy alignment

Tian, R · 2024 · arXiv 2412.04835

3 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail

cs.RO · 2025-10-30 · conditional · novelty 6.0

Alpamayo-R1 introduces a VLA model with a Chain of Causation dataset and multi-stage SFT-plus-RL training that reports 12% better planning accuracy and 35% fewer close encounters versus trajectory-only baselines in driving tasks.

Position: Good Embodied Reward Models Need Bad Behavior Data

cs.RO · 2026-05-31 · unverdicted · novelty 4.0

Embodied reward models systematically over-reward unsafe, suboptimal, and shortcut robot behaviors because they are trained only on successful data; modestly including bad-behavior data improves alignment with human preferences.

Efficient Preference Poisoning Attack on Offline RLHF

cs.LG · 2026-05-04

citing papers explorer


Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail
cs.RO · 2025-10-30 · conditional · none · ref 88
Position: Good Embodied Reward Models Need Bad Behavior Data
cs.RO · 2026-05-31 · unverdicted · none · ref 27
Efficient Preference Poisoning Attack on Offline RLHF
cs.LG · 2026-05-04 · unreviewed · ref 105
