Latent Policy Barrier: Learning Robust Visuomotor Policies by Staying In-Distribution

Shuran Song; Zhanyi Sun

arxiv: 2508.05941 · v2 · pith:JXWGK2RAnew · submitted 2025-08-08 · 💻 cs.RO

classification 💻 cs.RO

keywords expert, policy, data, barrier, latent, visuomotor, distribution, dynamics

Abstract

Visuomotor policies trained via behavior cloning are vulnerable to covariate shift, where small deviations from expert trajectories can compound into failure. Common strategies to mitigate this issue involve expanding the training distribution through human-in-the-loop corrections or synthetic data augmentation. However, these approaches are often labor-intensive, rely on strong task assumptions, or compromise the quality of imitation. We introduce Latent Policy Barrier (LPB), a framework for robust visuomotor policy learning. Inspired by Control Barrier Functions, LPB treats the latent embeddings of expert demonstrations as an implicit barrier separating safe, in-distribution states from unsafe, out-of-distribution (OOD) ones. Our approach decouples the roles of precise expert imitation and OOD recovery into two separate modules: a base diffusion policy trained solely on expert data, and a dynamics model trained on both expert and suboptimal policy rollout data. At inference time, the dynamics model predicts future latent states and optimizes them to stay within the expert distribution. Both simulated and real-world experiments show that LPB improves policy robustness and data efficiency, enabling reliable manipulation from limited expert data without additional human correction or annotation.
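The inference-time idea in the abstract (predict the next latent state, then adjust so it stays close to the expert latents) can be illustrated with a toy sketch. This is not the authors' implementation. The additive dynamics in `predict_next_latent`, the nearest-neighbor squared distance used as the barrier signal, and the `threshold`, `step_size`, and `max_steps` parameters are all assumptions chosen for illustration; the paper's learned dynamics model and diffusion policy are not reproduced here.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def predict_next_latent(z: Sequence[float], action: Sequence[float]) -> Vector:
    """Toy stand-in for a learned latent dynamics model (assumption):
    the action simply displaces the current latent."""
    return [zi + ai for zi, ai in zip(z, action)]


def nearest_expert(z: Sequence[float],
                   expert_latents: Sequence[Sequence[float]]) -> Tuple[float, Vector]:
    """Return the squared distance to, and a copy of, the closest expert latent."""
    best_d2, best = float("inf"), []
    for e in expert_latents:
        d2 = sum((zi - ei) ** 2 for zi, ei in zip(z, e))
        if d2 < best_d2:
            best_d2, best = d2, list(e)
    return best_d2, best


def steer_action(z: Sequence[float],
                 action: Sequence[float],
                 expert_latents: Sequence[Sequence[float]],
                 threshold: float = 0.05,
                 step_size: float = 0.1,
                 max_steps: int = 20) -> Vector:
    """Refine a base-policy action by gradient descent so that the predicted
    next latent moves toward the nearest expert latent.

    The action is left untouched when the prediction is already within
    `threshold` (squared distance) of the expert set.
    """
    a = list(action)
    for _ in range(max_steps):
        z_next = predict_next_latent(z, a)
        d2, nearest = nearest_expert(z_next, expert_latents)
        if d2 <= threshold:
            break  # predicted latent is treated as in-distribution
        # Gradient of ||z_next - nearest||^2 w.r.t. the action under the
        # additive toy dynamics is 2 * (z_next - nearest).
        a = [ai - step_size * 2.0 * (zi - ei)
             for ai, zi, ei in zip(a, z_next, nearest)]
    return a


if __name__ == "__main__":
    expert = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]  # hypothetical expert latents
    z_now = [1.0, 0.0]
    drifting = [0.0, 1.0]  # base action that would leave the expert set
    print(steer_action(z_now, drifting, expert))
```

In this sketch the base action is only modified when the predicted latent drifts away from the expert set, which mirrors the decoupling described in the abstract: imitation quality is left alone in-distribution, and correction is applied only near the barrier.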

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance
cs.RO 2026-01 unverdicted novelty 7.0

TouchGuide improves contact-rich robot manipulation by steering diffusion or flow-matching visuomotor policies with tactile feasibility scores from a contrastively trained Contact Physical Model.

VLA-Corrector: Lightweight Detect-and-Correct Inference for Adaptive Action Horizon
cs.RO 2026-07 unverdicted novelty 6.0

VLA-Corrector adds a detect-and-correct inference layer using a latent vision monitor and online gradient guidance to enable adaptive action horizons in chunked VLA policies.

ReGuide: From Test-Time Guidance to Self-Improving Diffusion Policies
cs.LG 2026-06 unverdicted novelty 6.0

ReGuide is a self-improving framework that uses phase-conditioned guidance to generate corrective rollouts and absorbs successful ones back into diffusion policy training, yielding 1.3-7.7x success gains on Robomimic tasks.

Breaking Lock-In: Preserving Steerability under Low-Data VLA Post-Training
cs.RO 2026-04 unverdicted novelty 6.0

DeLock mitigates lock-in in low-data VLA post-training via visual grounding preservation and test-time contrastive prompt guidance, outperforming baselines across eight evaluations while matching data-heavy generalist...

CMP: Robust Whole-Body Tracking for Loco-Manipulation via Competence Manifold Projection
cs.RO 2026-04 unverdicted novelty 6.0

CMP projects actions onto a learned competence manifold using a frame-wise safety scheme and isomorphic latent space to achieve up to 10x better survival in out-of-distribution scenarios with under 10% tracking loss.