DeformGen uses dynamics-based state expansion via localized disturbances and deformation-field warping for trajectory transfer to improve policy learning on deformable manipulation benchmarks.
hub Baseline reference
Interndata-a1: Pioneering high-fidelity synthetic data for pre-training generalist policy
Baseline reference. 60% of citing Pith papers use this work as a benchmark or comparison.
hub tools
citation-role summary
citation-polarity summary
years
2026 20representative citing papers
AutoScale is a closed-loop data engine using Graph-RAE for scene representation and Cluster-GA for importance-based retrieval to improve real-synthetic co-training for autonomous driving.
Introduces H-Tac human tactile-action dataset and TTP pre-training that unifies spaces and predicts future tactile signals to improve robotic dexterous manipulation transfer.
ABot-M0.5 proposes a unified mobility-and-manipulation world action model using three alignment strategies that achieves state-of-the-art performance on mobile and fine-grained manipulation benchmarks.
SimFoundry automates zero-shot real-to-sim scene generation from video, producing digital twins and cousins that enable policy training with 0.911 mean Pearson correlation to real-world results and 17-40% success gains from variations.
RepWAM introduces representation visual-action tokenizers to pretrain world action models that jointly model future visual states and latent actions under instructions for improved robot manipulation.
AffordanceVLA proposes a VLA model with affordance-aware modules (Which2Act, Where2Act, How2Act) in a Mixture-of-Transformer trained in three stages to improve robotic manipulation.
OSCAR finetunes Cosmos-Predict2.5-2B on a deduplicated multi-embodiment robotics dataset with kinematic skeleton conditioning, claiming better action following and significant correlation between virtual and real robot policy evaluations.
STABLE generates simulation-ready tabletop scenes by alternating a semantic LLM reasoner for task-aligned coarse layouts with a physics corrector for physical plausibility using progressive scene expansion.
Embody4D generates novel-view videos from monocular robot videos via a 3D-aware synthesis pipeline, confidence-aware expert modulation, and interaction-aware attention for embodied 4D world modeling.
X-WAM unifies robotic action execution and 4D world synthesis by adapting video diffusion priors with a lightweight depth branch and asynchronous noise sampling, achieving 79-91% success on robot benchmarks.
SIM1 converts sparse real demonstrations into high-fidelity synthetic data through physics-aligned simulation, yielding policies that match real-data performance at a 1:15 ratio with 90% zero-shot success on deformable manipulation.
ExpertGen generates high-success expert policies in simulation from imperfect priors by freezing a diffusion behavior model and optimizing its initial noise via RL, then distills them for real-robot deployment.
LabVLA uses RoboGenesis simulation data and a two-stage FAST pretraining plus flow matching recipe on a Qwen3-VL backbone to achieve the highest success rates on LabUtopia under in- and out-of-distribution conditions.
HumanoidMimicGen automatically generates large loco-manipulation datasets from few source demonstrations using whole-body planning, enabling visuomotor policies that outperform real-data-only training by 20% on a new nine-task benchmark.
HyperSim reports 80% and 95% sim-to-real success on two manipulation policies across 400 real executions by combining synthetic environment synthesis, adversarial trajectories, and co-training.
CoEnv introduces a compositional environment that integrates real and simulated spaces for multi-agent robotic collaboration, using real-to-sim reconstruction, VLM action synthesis, and validated sim-to-real transfer to achieve high success rates on multi-arm manipulation tasks.
LingBot-VA combines video world modeling with policy learning via Mixture-of-Transformers, closed-loop rollouts, and asynchronous inference to improve robot manipulation in simulation and real settings.
Authors perform a cross-simulator, cross-policy empirical study of sim-to-real correlation for VLA policies and distill guidance on using simulation for policy improvement.
JoyAI-RA is a multi-source pretrained VLA model that claims to bridge human-to-robot embodiment gaps via data unification and outperforms prior methods on generalization-heavy robotic tasks.
citing papers explorer
-
Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising
X-WAM unifies robotic action execution and 4D world synthesis by adapting video diffusion priors with a lightweight depth branch and asynchronous noise sampling, achieving 79-91% success on robot benchmarks.
-
CoEnv: Driving Embodied Multi-Agent Collaboration via Compositional Environment
CoEnv introduces a compositional environment that integrates real and simulated spaces for multi-agent robotic collaboration, using real-to-sim reconstruction, VLM action synthesis, and validated sim-to-real transfer to achieve high success rates on multi-arm manipulation tasks.