RoboCasa365: A large-scale simulation framework for training and benchmarking generalist robots

2026 · arXiv 2603.04356

16 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

dataset: 2 · background: 1

citation-polarity summary

use dataset: 2 · unclear: 1

representative citing papers

ENPIRE: Agentic Robot Policy Self-Improvement in the Real World

cs.AI · 2026-06-18 · unverdicted · novelty 7.0

ENPIRE supplies four modules (Environment, Policy Improvement, Rollout, Evolution) that turn real-world robot training into an autonomous optimization loop driven by coding agents.

Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?

cs.CV · 2026-06-03 · unverdicted · novelty 7.0

Dream.exe evaluates 8 video generation models on 101 manipulation tasks by converting generated videos into executable robot trajectories in a simulator, finding measurable success rates that visual metrics do not predict.

SafeManip: A Property-Driven Benchmark for Temporal Safety Evaluation in Robotic Manipulation

cs.RO · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

SafeManip is a benchmark applying reusable LTLf templates across eight safety categories to evaluate temporal properties in robotic manipulation on VLA policies.

MemCompiler: Compile, Don't Inject -- State-Conditioned Memory for Embodied Agents

cs.RO · 2026-05-08 · unverdicted · novelty 7.0 · 2 refs

MemCompiler reframes memory use as state-conditioned compilation, delivering relevant guidance via text and latent channels to improve embodied agent performance by up to 129% and cut latency by 60% versus static injection.

ABot-M0.5: Unified Mobility-and-Manipulation World Action Model

cs.CV · 2026-07-01 · unverdicted · novelty 6.0

ABot-M0.5 proposes a unified mobility-and-manipulation world action model using three alignment strategies that achieves state-of-the-art performance on mobile and fine-grained manipulation benchmarks.

Robot Critics that Sweat the Small Stuff

cs.RO · 2026-06-19 · unverdicted · novelty 6.0

Fine-tuning VLMs with pairwise progress supervision from policy rollouts improves fine-grained failure detection and boosts robot manipulation success by 11% in the real world and 5.9% in simulation.

Inductive Generalization for Robotic Manipulation

cs.RO · 2026-06-19 · unverdicted · novelty 6.0

The paper introduces an inductive generalization evaluation protocol for manipulation policies and shows that SOTA vision-language-action models fail on progressively harder task variants.

Pose6DAug: Physically Plausible Multi-view Object Swapping for Robot Data Augmentation

cs.RO · 2026-06-18 · unverdicted · novelty 6.0

Pose6DAug performs 3D multi-view object swapping via temporally coherent 6D pose trajectories to augment VLA data, reporting 16.5% relative success improvement on novel objects.

Geometric Action Model for Robot Policy Learning

cs.RO · 2026-06-15 · unverdicted · novelty 6.0

GAM splits a geometric foundation model to enable language-conditioned future geometry prediction and action decoding for robot policies, claiming superior performance on manipulation benchmarks.

How to Instruct Your Robot: Dense Language Annotations Power Robot Policy Learning

cs.RO · 2026-05-16 · unverdicted · novelty 6.0

DeMiAn re-annotates robot and egocentric videos with VLM-generated dense labels across motion, scene, pose, and reasoning aspects, then uses a learned instructor to boost policy success by 5 points on RoboCasa over task-only baselines.

LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories

cs.CL · 2026-06-11 · unverdicted · novelty 5.0

LabVLA uses RoboGenesis simulation data and a two-stage FAST pretraining plus flow matching recipe on a Qwen3-VL backbone to achieve the highest success rates on LabUtopia under in- and out-of-distribution conditions.

Robots Need More than VLA and World Models

cs.RO · 2026-06-04 · unverdicted · novelty 5.0

The paper identifies four missing interfaces (data autolabelling, embodiment retargeting, physics-grounded world models, and video-based reward inference) as the central bottleneck beyond VLA scaling for robot intelligence.

A Practical Recipe Towards Improving Sim-and-Real Correlation for VLA Evaluation

cs.RO · 2026-06-09 · unverdicted · novelty 4.0

The authors perform a cross-simulator, cross-policy empirical study of sim-to-real correlation for VLA policies and distill guidance on using simulation for policy improvement.

RLDX-1 Technical Report

cs.RO · 2026-05-05 · unverdicted · novelty 4.0 · 2 refs

RLDX-1 outperforms frontier VLAs such as π0.5 and GR00T N1.6 on dexterous manipulation benchmarks, reaching 86.8% success on ALLEX humanoid tasks versus around 40% for the baselines.

Intelligent Automation for Embodied Benchmark Construction: Pipelines, Embodiments, Simulators, and Trends

cs.RO · 2026-06-10 · unverdicted · novelty 3.0

Automation in embodied benchmark construction shifts costs from acquisition toward validation, auditability, version control, and long-term governance instead of simply lowering total cost.

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

cs.AI · 2026-04-24

citing papers explorer

Showing 11 of 11 citing papers after filters.

SafeManip: A Property-Driven Benchmark for Temporal Safety Evaluation in Robotic Manipulation

cs.RO · 2026-05-12 · unverdicted · none · ref 14 · 2 links

MemCompiler: Compile, Don't Inject -- State-Conditioned Memory for Embodied Agents

cs.RO · 2026-05-08 · unverdicted · none · ref 22 · 2 links

Robot Critics that Sweat the Small Stuff

cs.RO · 2026-06-19 · unverdicted · none · ref 63

Inductive Generalization for Robotic Manipulation

cs.RO · 2026-06-19 · unverdicted · none · ref 58

Pose6DAug: Physically Plausible Multi-view Object Swapping for Robot Data Augmentation

cs.RO · 2026-06-18 · unverdicted · none · ref 23

Geometric Action Model for Robot Policy Learning

cs.RO · 2026-06-15 · unverdicted · none · ref 49

How to Instruct Your Robot: Dense Language Annotations Power Robot Policy Learning

cs.RO · 2026-05-16 · unverdicted · none · ref 28

Robots Need More than VLA and World Models

cs.RO · 2026-06-04 · unverdicted · none · ref 87

A Practical Recipe Towards Improving Sim-and-Real Correlation for VLA Evaluation

cs.RO · 2026-06-09 · unverdicted · none · ref 31

RLDX-1 Technical Report

cs.RO · 2026-05-05 · unverdicted · none · ref 81 · 2 links

Intelligent Automation for Embodied Benchmark Construction: Pipelines, Embodiments, Simulators, and Trends

cs.RO · 2026-06-10 · unverdicted · none · ref 39
