3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations
20 papers cite this work.
2026: 20 representative citing papers
Citing papers explorer
-
Test-time Sparsity for Extreme Fast Action Diffusion
Test-time sparsity with a parallel pipeline and omnidirectional feature reuse accelerates action diffusion by 5x to 47.5 Hz while cutting FLOPs by 92% with no performance loss.
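The summary does not spell out the paper's pipeline, but the core idea of feature reuse across denoising steps can be sketched roughly as follows; every function and constant here is illustrative, not the authors' implementation:

```python
# Hypothetical sketch: amortize an expensive observation encoder over
# many diffusion denoising steps by recomputing it only occasionally.
def expensive_encode(obs):
    # Stand-in for a heavy visual backbone (e.g., a point-cloud encoder).
    return [x * 2.0 for x in obs]

def denoise_step(action, features):
    # Stand-in for one denoising update conditioned on cached features.
    return [a - 0.1 * (a - f) for a, f in zip(action, features)]

def fast_action_diffusion(obs, steps=10, reuse_every=5):
    """Recompute features every `reuse_every` steps; reuse them otherwise."""
    action = [0.0] * len(obs)
    features = None
    encoder_calls = 0
    for t in range(steps):
        if t % reuse_every == 0:      # sparse recomputation
            features = expensive_encode(obs)
            encoder_calls += 1
        action = denoise_step(action, features)
    return action, encoder_calls

action, calls = fast_action_diffusion([1.0, -1.0], steps=10, reuse_every=5)
```

With 10 denoising steps and `reuse_every=5`, the encoder runs only twice, which is the kind of FLOP saving test-time sparsity targets.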
-
VistaBot: View-Robust Robot Manipulation via Spatiotemporal-Aware View Synthesis
VistaBot integrates 4D geometry estimation and spatiotemporal view synthesis into action policies to improve cross-view generalization by 2.6-2.8x on a new VGS metric in simulation and real tasks.
-
BiCoord: A Bimanual Manipulation Benchmark towards Long-Horizon Spatial-Temporal Coordination
BiCoord is a new benchmark for long-horizon, tightly coordinated bimanual manipulation that includes quantitative metrics and shows that existing policies such as DP, RDT, Pi0, and OpenVLA-OFT struggle on such tasks.
-
Diffusion Policy with Bayesian Expert Selection for Active Multi-Target Tracking
A Bayesian expert selection framework with variational Bayesian last layers and lower confidence bounds improves diffusion policies for active multi-target tracking.
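Lower confidence bounds are a standard acquisition rule, so the selection step can be sketched generically; the cost posteriors and `beta` below are made-up numbers, not values from the paper:

```python
import math

def lcb_select(expert_means, expert_vars, beta=1.0):
    """Pick the expert minimizing a lower confidence bound on predicted
    tracking cost: LCB_i = mean_i - beta * sqrt(var_i). High-variance
    experts get optimistic scores, encouraging exploration."""
    scores = [m - beta * math.sqrt(v) for m, v in zip(expert_means, expert_vars)]
    return min(range(len(scores)), key=lambda i: scores[i])

# Three hypothetical experts: posterior mean and variance of their cost
# (in the paper these would come from variational Bayesian last layers).
means = [0.5, 0.4, 0.45]
vars_ = [0.01, 0.01, 0.25]
best = lcb_select(means, vars_, beta=1.0)
```

Here the third expert wins despite a slightly worse mean, because its large uncertainty pulls its lower bound down.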
-
StereoPolicy: Improving Robotic Manipulation Policies via Stereo Perception
StereoPolicy fuses stereo image pairs via a Stereo Transformer on pretrained 2D encoders to boost robotic manipulation policies, showing gains over monocular, RGB-D, point cloud, and multi-view methods in simulation and real-robot tests.
-
From Pixels to Tokens: A Systematic Study of Latent Action Supervision for Vision-Language-Action Models
A unified comparison of latent action supervision strategies for VLA models reveals task-specific benefits, with image-based approaches aiding reasoning and generalization, action-based aiding motor control, and discrete tokens proving most effective.
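Discrete action tokens are typically produced by binning continuous actions; a minimal sketch of that discretization (bin count and ranges are assumptions, not the study's settings):

```python
def tokenize_actions(actions, low=-1.0, high=1.0, n_bins=256):
    """Map continuous actions to discrete token ids by uniform binning,
    the kind of discretization used for token-based action supervision."""
    tokens = []
    for a in actions:
        a = min(max(a, low), high)                      # clamp to range
        tokens.append(int((a - low) / (high - low) * (n_bins - 1) + 0.5))
    return tokens

def detokenize(tokens, low=-1.0, high=1.0, n_bins=256):
    """Invert the binning up to quantization error."""
    return [low + t / (n_bins - 1) * (high - low) for t in tokens]

toks = tokenize_actions([-1.0, 0.0, 1.0])
recon = detokenize(toks)
```

The round trip loses at most half a bin width, which for 256 bins over [-1, 1] is under 0.004 per dimension.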
-
FingerViP: Learning Real-World Dexterous Manipulation with Fingertip Visual Perception
FingerViP equips each finger with a miniature camera and trains a multi-view diffusion policy that achieves 80.8% success on real-world dexterous tasks previously limited by wrist-camera occlusion.
-
ShapeGen: Robotic Data Generation for Category-Level Manipulation
ShapeGen generates shape-diverse 3D robotic manipulation demonstrations without simulators by curating a functional shape library and applying a minimal-annotation pipeline for novel, physically plausible data.
-
Robotic Manipulation is Vision-to-Geometry Mapping ($f(v) \rightarrow G$): Vision-Geometry Backbones over Language and Video Models
Vision-geometry backbones using pretrained 3D world models outperform vision-language and video models for robotic manipulation by enabling direct mapping from visual input to geometric actions.
-
A1: A Fully Transparent Open-Source, Adaptive and Efficient Truncated Vision-Language-Action Model
A1 is a transparent VLA framework achieving state-of-the-art robot manipulation success with up to 72% lower latency via adaptive layer truncation and inter-layer flow matching.
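Adaptive layer truncation is a form of early exit; a toy sketch of the control flow (the exit criterion, layers, and threshold here are illustrative assumptions, not A1's actual mechanism):

```python
def run_with_truncation(x, layers, threshold=0.05):
    """Run layers in order, exiting early once a layer's update to the
    hidden state falls below a threshold -- a rough proxy for 'deeper
    layers no longer change the answer'."""
    used = 0
    for layer in layers:
        y = layer(x)
        used += 1
        delta = max(abs(a - b) for a, b in zip(x, y))
        x = y
        if delta < threshold:      # remaining layers are skipped
            break
    return x, used

# Hypothetical layers whose updates shrink with depth.
layers = [lambda v, s=s: [e + s for e in v] for s in (0.5, 0.2, 0.01, 0.01)]
out, used = run_with_truncation([0.0, 0.0], layers)
```

Here the fourth layer is never executed, which is where the latency savings come from.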
-
Multi-View Video Diffusion Policy: A 3D Spatio-Temporal-Aware Video Action Model
MV-VDP jointly predicts multi-view RGB and heatmap videos via diffusion, yielding data-efficient and robust robotic manipulation policies.
-
AttenA+: Rectifying Action Inequality in Robotic Foundation Models
AttenA+ applies velocity-driven action attention to reweight training objectives toward kinematically critical low-velocity segments, yielding small benchmark gains on Libero and RoboTwin without added parameters.
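Reweighting a training objective toward low-velocity segments can be sketched directly from the summary; the weighting function and numbers below are illustrative, not AttenA+'s exact formulation:

```python
def velocity_weighted_loss(pred, target, alpha=1.0):
    """Per-step squared error reweighted so low-velocity segments of the
    target trajectory (|a_t - a_{t-1}| small) count more in training."""
    total = 0.0
    for t in range(len(target)):
        vel = abs(target[t] - target[t - 1]) if t > 0 else 0.0
        weight = 1.0 + alpha / (1.0 + vel)   # low velocity -> larger weight
        total += weight * (pred[t] - target[t]) ** 2
    return total

# The same per-step error costs more on a slow segment than a fast one.
slow = velocity_weighted_loss([0.1, 0.1], [0.0, 0.0])   # zero-velocity target
fast = velocity_weighted_loss([0.1, 1.1], [0.0, 1.0])   # high-velocity step
```

Because only the weights change, this adds no parameters, matching the summary's claim.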
-
X-Imitator: Spatial-Aware Imitation Learning via Bidirectional Action-Pose Interaction
X-Imitator is a bidirectional action-pose interaction framework for spatial-aware imitation learning that outperforms vanilla policies and explicit pose guidance on 24 simulated and 3 real-world robotic tasks.
-
TAIL-Safe: Task-Agnostic Safety Monitoring for Imitation Learning Policies
TAIL-Safe learns a Lipschitz Q-function from visibility, recognizability, and graspability criteria in a Gaussian Splatting twin to define an empirical safe set for IL policies and recovers unsafe actions via Nagumo-inspired gradient ascent.
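Recovering an unsafe action by ascending a learned safety value can be sketched with finite-difference gradients; the quadratic `q` and all constants here are stand-ins, not TAIL-Safe's learned Q-function:

```python
def recover_action(action, q, threshold=0.0, step=0.1, iters=50, eps=1e-4):
    """If q(action) is below the safety threshold, nudge the action along
    a finite-difference gradient of q until it re-enters the safe set."""
    a = list(action)
    for _ in range(iters):
        if q(a) >= threshold:      # already safe: stop ascending
            break
        grad = []
        for i in range(len(a)):
            a_hi = a[:]
            a_hi[i] += eps
            grad.append((q(a_hi) - q(a)) / eps)
        a = [ai + step * gi for ai, gi in zip(a, grad)]
    return a

# Hypothetical safety value: safe when the action stays near the origin.
q = lambda a: 1.0 - sum(x * x for x in a)
recovered = recover_action([1.5, 0.0], q, threshold=0.0)
```

A few ascent steps pull the action back inside the empirical safe set, which is the recovery behavior the summary describes.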
-
StableIDM: Stabilizing Inverse Dynamics Model against Manipulator Truncation via Spatio-Temporal Refinement
StableIDM stabilizes inverse dynamics models under manipulator truncation by combining robot-centric masking, directional spatial feature aggregation, and temporal dynamics refinement, yielding 12.1% higher strict action accuracy on AgiBot and 9.7-17.6% gains in real-robot tasks.
-
R3D: Revisiting 3D Policy Learning
A transformer 3D encoder plus diffusion decoder architecture, with 3D-specific augmentations, outperforms prior 3D policy methods on manipulation benchmarks by improving training stability.
-
FastGrasp: Learning-based Whole-body Control method for Fast Dexterous Grasping with Mobile Manipulators
FastGrasp combines two-stage RL with a CVAE that generates diverse grasp candidates from point clouds, and uses tactile sensing for impact adjustment, achieving robust, fast whole-body grasping in simulation and real-world settings.
-
Learning 3D Representations for Spatial Intelligence from Unposed Multi-View Images
UniSplat learns consistent 3D geometry, appearance, and semantics from unposed images using dual masking, progressive Gaussian splatting, and recalibration to align predictions across tasks.
-
Robot Learning from Human Videos: A Survey
The survey organizes human-video-based robot learning into task-, observation-, and action-oriented transfer pathways, reviews associated datasets, and outlines challenges for scalable embodied AI.
-
Redefining End-of-Life: Intelligent Automation for Electronics Remanufacturing Systems
A literature review of intelligent automation approaches using robotics, AI, and control for disassembly, inspection, sorting, and reprocessing of end-of-life electronics.