A Mechanistic Analysis of Sim-and-Real Co-Training in Generative Robot Policies

· 2026 · cs.RO · arXiv 2604.13645

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

open full Pith review browse 3 citing papers arXiv PDF

abstract

Co-training, which combines limited in-domain real-world data with abundant surrogate data such as simulation or cross-embodiment robot data, is widely used for training generative robot policies. Despite its empirical success, the mechanisms that determine when and why co-training is effective remain poorly understood. We investigate the mechanism of sim-and-real co-training through theoretical analysis and empirical study, and identify two intrinsic effects governing performance. The first, \textbf{``structured representation alignment"}, reflects a balance between cross-domain representation alignment and domain discernibility, and plays a primary role in downstream performance. The second, the \textbf{``importance reweighting effect"}, arises from domain-dependent modulation of action weighting and operates at a secondary level. We validate these effects with controlled experiments on a toy model and extensive sim-and-sim and sim-and-real robot manipulation experiments. Our analysis offers a unified interpretation of recent co-training techniques and motivates a simple method that consistently improves upon prior approaches. More broadly, our aim is to examine the inner workings of co-training and to facilitate research in this direction.

representative citing papers

FADA: Few-Shot Domain Adaptation via Dynamics Alignment for Humanoid Control

cs.RO · 2026-06-26 · unverdicted · novelty 6.0

FADA is a three-stage Planner-IDM method that achieves few-shot domain adaptation for humanoid control by distilling an oracle policy then finetuning only the IDM on short target-domain rollouts via supervised learning.

LaST-HD: Learning Latent Physical Reasoning from Scalable Human Data for Robot Manipulation

cs.RO · 2026-06-22 · unverdicted · novelty 5.0

LaST-HD creates a shared latent dynamics space via a world model to transfer physical reasoning from scalable human-hand demonstrations to robots, achieving over 90% accuracy with 20 minutes of new data after mixed training.

TacCoRL: Integrating Tactile Feedback into VLA via Simulation

cs.RO · 2026-06-10 · unverdicted · novelty 5.0

TacCoRL integrates tactile feedback into VLA policies via real-aligned simulation co-training and RL, raising average success from 50% to 72.5% on four bimanual contact-rich tasks with direct real-robot transfer.

citing papers explorer

Showing 3 of 3 citing papers after filters.

FADA: Few-Shot Domain Adaptation via Dynamics Alignment for Humanoid Control cs.RO · 2026-06-26 · unverdicted · none · ref 11 · internal anchor
FADA is a three-stage Planner-IDM method that achieves few-shot domain adaptation for humanoid control by distilling an oracle policy then finetuning only the IDM on short target-domain rollouts via supervised learning.
LaST-HD: Learning Latent Physical Reasoning from Scalable Human Data for Robot Manipulation cs.RO · 2026-06-22 · unverdicted · none · ref 95 · internal anchor
LaST-HD creates a shared latent dynamics space via a world model to transfer physical reasoning from scalable human-hand demonstrations to robots, achieving over 90% accuracy with 20 minutes of new data after mixed training.
TacCoRL: Integrating Tactile Feedback into VLA via Simulation cs.RO · 2026-06-10 · unverdicted · none · ref 24 · internal anchor
TacCoRL integrates tactile feedback into VLA policies via real-aligned simulation co-training and RL, raising average success from 50% to 72.5% on four bimanual contact-rich tasks with direct real-robot transfer.

A Mechanistic Analysis of Sim-and-Real Co-Training in Generative Robot Policies

fields

years

verdicts

representative citing papers

citing papers explorer