RH20T: A comprehensive robotic dataset for learning diverse skills in one-shot

URL: https://arxiv · 2023 · arXiv 2307.00595

Canonical reference. Cited by 18 Pith papers; 73% of classified citations cite this work as background.

citation-role summary

background 6 · dataset 4 · method 1

citation-polarity summary

background 8 · use dataset 2 · use method 1

representative citing papers

RoboJailBench: Benchmarking Adversarial Attacks and Defenses in Embodied Robotic Agents

cs.CR · 2026-05-19 · unverdicted · novelty 7.0

RoboJailBench creates a taxonomy-based benchmark, intent-contrast datasets, and evaluation framework for jailbreak attacks and defenses in embodied robotic AI systems.

Beyond Isolation: A Unified Benchmark for General-Purpose Navigation

cs.RO · 2026-05-10 · unverdicted · novelty 7.0

OmniNavBench is a unified benchmark for general-purpose navigation featuring composite multi-skill instructions, support for humanoid, quadrupedal, and wheeled robots, and 1,779 human-teleoperated trajectories across 170 environments.

OmniRobotHome: A Multi-Camera Platform for Real-Time Multiadic Human-Robot Interaction

cs.RO · 2026-04-30 · unverdicted · novelty 7.0

A 48-camera residential platform delivers real-time occlusion-robust 3D perception and coordinated actuation for multi-human multi-robot interaction in a shared home workspace.

Being-H0.7: A Latent World-Action Model from Egocentric Videos

cs.RO · 2026-04-30 · unverdicted · novelty 7.0

Being-H0.7 adds future-aware latent reasoning to direct VLA policies via dual-branch alignment on latent queries, matching world-model benefits at VLA efficiency.

VistaBot: View-Robust Robot Manipulation via Spatiotemporal-Aware View Synthesis

cs.RO · 2026-04-23 · unverdicted · novelty 7.0

VistaBot integrates 4D geometry estimation and spatiotemporal view synthesis into action policies to improve cross-view generalization by 2.6-2.8x on a new VGS metric in simulation and real tasks.

RoboCOIN: An Open-Sourced Bimanual Robotic Data Collection for Integrated Manipulation

cs.RO · 2025-11-21 · accept · novelty 7.0

RoboCOIN is a large multi-embodiment bimanual manipulation dataset with hierarchical annotations and an open processing pipeline that improves model performance across robotic platforms.

3D-VLA: A 3D Vision-Language-Action Generative World Model

cs.CV · 2024-03-14 · unverdicted · novelty 7.0

3D-VLA is a new embodied foundation model that uses a 3D LLM plus aligned diffusion models to generate future images and point clouds for improved reasoning and action planning in 3D environments.

FineVLA: Fine-Grained Instruction Alignment for Steerable Vision-Language-Action Policies

cs.RO · 2026-05-26 · unverdicted · novelty 6.0

FineVLA unifies robot datasets into 47k fine-grained trajectories, adds a VLM annotator and benchmark, and shows that mixing fine-grained and goal-level instructions improves steerable control without hurting task success.

Spacetime Optimal-Transport Attention for Visuo-Haptic Imitation Learning of Contact-Rich Manipulation

cs.RO · 2026-05-19 · unverdicted · novelty 6.0

SO-TA replaces standard attention with optimal-transport alignment across vision, force/torque, and proprioception to improve diffusion-policy performance on real-robot insertion and wiping tasks.

HumanNet: Scaling Human-centric Video Learning to One Million Hours

cs.CV · 2026-05-07 · unverdicted · novelty 6.0

HumanNet is a 1M-hour human-centric video dataset with interaction annotations that enables better vision-language-action model performance than equivalent robot data in a controlled test.

MolmoAct2: Action Reasoning Models for Real-world Deployment

cs.RO · 2026-05-04 · unverdicted · novelty 6.0 · 2 refs

MolmoAct2 is an open VLA model that outperforms baselines like Pi-05 on 7 benchmarks and whose backbone surpasses GPT-5 on 13 embodied-reasoning tasks through new datasets, specialized training, and architecture changes for lower latency.

EgoVerse: An Egocentric Human Dataset for Robot Learning from Around the World

cs.RO · 2026-04-08 · unverdicted · novelty 6.0

EgoVerse releases 1,362 hours of standardized egocentric human data across 1,965 tasks and shows via multi-lab experiments that robot policy performance scales with human data volume when the data aligns with robot objectives.

Robometer: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons

cs.RO · 2026-03-02 · unverdicted · novelty 6.0

Robometer combines intra-trajectory progress supervision with inter-trajectory preference supervision on a 1M-trajectory dataset to learn more generalizable robotic reward functions than prior methods.

IGen: Scalable Data Generation for Robot Learning from Open-World Images

cs.RO · 2025-12-01 · unverdicted · novelty 6.0

IGen generates realistic visuomotor training data including actions and temporally coherent visuals from unstructured open-world images via 3D reconstruction and VLM reasoning.

ProcVLM: Learning Procedure-Grounded Progress Rewards for Robotic Manipulation

cs.RO · 2026-05-09 · unverdicted · novelty 5.0

ProcVLM learns procedure-grounded dense progress rewards for robotic manipulation via a reasoning-before-estimation VLM trained on a 60M-frame synthesized corpus from 30 embodied datasets.

World Action Models: The Next Frontier in Embodied AI

cs.RO · 2026-05-12 · unverdicted · novelty 4.0

The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

Embody4D: A Generalist Data Engine for Embodied 4D World Modeling

cs.CV · 2026-05-03

From Video to Control: A Survey of Learning Manipulation Interfaces from Temporal Visual Data

cs.RO · 2026-04-04
