hub Mixed citations

3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations

Yanjie Ze, Gu Zhang, Kangning Zhang, Chenyuan Hu, Muhan Wang, Huazhe Xu · 2024 · cs.RO · arXiv 2403.03954

Mixed citation behavior. Most common role is background (64%).

92 Pith papers citing it

Background 64% of classified citations

open full Pith review browse 92 citing papers arXiv PDF

abstract

Imitation learning provides an efficient way to teach robots dexterous skills; however, learning complex skills robustly and generalizablely usually consumes large amounts of human demonstrations. To tackle this challenging problem, we present 3D Diffusion Policy (DP3), a novel visual imitation learning approach that incorporates the power of 3D visual representations into diffusion policies, a class of conditional action generative models. The core design of DP3 is the utilization of a compact 3D visual representation, extracted from sparse point clouds with an efficient point encoder. In our experiments involving 72 simulation tasks, DP3 successfully handles most tasks with just 10 demonstrations and surpasses baselines with a 24.2% relative improvement. In 4 real robot tasks, DP3 demonstrates precise control with a high success rate of 85%, given only 40 demonstrations of each task, and shows excellent generalization abilities in diverse aspects, including space, viewpoint, appearance, and instance. Interestingly, in real robot experiments, DP3 rarely violates safety requirements, in contrast to baseline methods which frequently do, necessitating human intervention. Our extensive evaluation highlights the critical importance of 3D representations in real-world robot learning. Videos, code, and data are available on https://3d-diffusion-policy.github.io .

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 10 baseline 2 method 1 other 1

citation-polarity summary

background 9 baseline 2 unclear 2 use method 1

representative citing papers

Test-time Adversarial Takeover: A Real-time Hijacking Interface against Robotic Diffusion Policies

cs.RO · 2026-06-09 · unverdicted · novelty 8.0

TAKO demonstrates real-time adversarial takeover of robotic diffusion policies via reusable universal patches on visual inputs, achieving 100% success in steering attacker-chosen trajectories across multiple tasks, encoders, and diffusion methods.

DexCompose: Reusing Dexterous Policies for Multi-Task Manipulation with a Single Hand

cs.RO · 2026-06-26 · unverdicted · novelty 7.0

DexCompose achieves 77.4% average success on 16 composite dexterous tasks by using role-aware residual composition with explicit finger ownership to combine pretrained policies without destructive interference.

ForesightSafety-VLA: A Unified Diagnostic Safety Benchmark for Vision-Language-Action Models

cs.RO · 2026-06-25 · unverdicted · novelty 7.0 · 2 refs

ForesightSafety-VLA creates a diagnostic benchmark for VLA safety with taxonomy across physical, language, and visual risks, showing perception and structure variations cause more safety degradation than language changes in tested models.

One Body, Two Minds: Variable Autonomy Approach for a Co-embodied Robotic Hand

cs.RO · 2026-06-24 · unverdicted · novelty 7.0

A co-embodied wearable robotic hand uses variable autonomy with a visuomotor diffusion policy for grasping and head gestures for actuation, yielding 23.3% faster task times and 93.6% success in a 44-person study.

Learning to See While Learning to Act: Diffusion Models for Active Perception in Robot Imitation

cs.RO · 2026-06-22 · unverdicted · novelty 7.0

See2Act couples action denoising with viewpoint refinement in a diffusion-based imitation learning policy trained on keyframe-anchored camera poses, recovering informative views under occlusion and improving RLBench performance by up to 34% with zero-shot sim-to-real transfer.

Frequency-Aware Flow Matching for Continuous and Consistent Robotic Action Generation

cs.RO · 2026-06-18 · unverdicted · novelty 7.0

FAFM performs flow matching in the frequency domain using DCT on action sequences to produce continuous temporally consistent robotic actions with a Sobolev-style smoothness regularizer.

EvoScene-VLA: Evolving Scene Beliefs Inside the Action Decoder for Chunked Robot Control

cs.RO · 2026-05-21 · conditional · novelty 7.0

EvoScene-VLA maintains an action-updated scene prior across control chunks in VLA policies, raising success rates on RoboTwin tasks from 87.2% to 89.1% fixed and 86.1% to 88.5% randomized while outperforming baselines on a real robot.

Mobile UMI: Cross-View Diffusion Policy with Decoupled Kinematics for Mobile Manipulation

cs.RO · 2026-05-20 · conditional · novelty 7.0

A hardware-free dual-camera capture framework with ChArUco spatial unification and receding-horizon state alignment enables decoupled SE(3) manipulation and SE(2) base trajectories for diffusion policies, yielding 83.8% average success on four long-horizon household tasks.

DISC: Decoupling Instruction from State-Conditioned Control via Policy Generation

cs.RO · 2026-05-20 · unverdicted · novelty 7.0

A hypernetwork generates complete task-specific visuomotor policy parameters from instructions alone to structurally eliminate observation leakage in language-conditioned robotic control.

Resolving Long-Tail Ambiguity in Unsupervised 3D Point Cloud Segmentation with Language Priors

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

LangTail uses entity-level semantic priors from language models aligned via contrastive learning in a hierarchical clustering setup to resolve long-tail ambiguity, yielding +13.5, +12.9, and +8.9 mIoU gains on ScanNet-v2, S3DIS, and nuScenes.

RoboFlow4D: A Lightweight Flow World Model Toward Real-Time Flow-Guided Robotic Manipulation

cs.RO · 2026-05-17 · unverdicted · novelty 7.0

RoboFlow4D is an end-to-end lightweight flow world model that predicts multi-frame 3D flows from visual observations and textual instructions to provide explicit planning for real-time robotic manipulation.

SkiP: When to Skip and When to Refine for Efficient Robot Manipulation

cs.RO · 2026-05-15 · unverdicted · novelty 7.0

SkiP introduces action relabeling and Motion Spectrum Keying to skip redundant steps in robot trajectories, cutting executed steps by 15-40% while maintaining success rates across 72 simulated and 3 real tasks.

DSSP: Diffusion State Space Policy with Full-History Encoding

cs.RO · 2026-05-14 · conditional · novelty 7.0

DSSP is a history-conditioned diffusion state space policy that uses SSMs to encode full observation streams with an auxiliary dynamics objective and hierarchical fusion, achieving SOTA results with reduced model size in robot manipulation.

Test-time Sparsity for Extreme Fast Action Diffusion

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

Test-time sparsity with a parallel pipeline and omnidirectional feature reuse accelerates action diffusion by 5x to 47.5 Hz while cutting FLOPs 92% with no performance loss.

VistaBot: View-Robust Robot Manipulation via Spatiotemporal-Aware View Synthesis

cs.RO · 2026-04-23 · unverdicted · novelty 7.0

VistaBot integrates 4D geometry estimation and spatiotemporal view synthesis into action policies to improve cross-view generalization by 2.6-2.8x on a new VGS metric in simulation and real tasks.

BiCoord: A Bimanual Manipulation Benchmark towards Long-Horizon Spatial-Temporal Coordination

cs.RO · 2026-04-07 · conditional · novelty 7.0

BiCoord is a new benchmark for long-horizon tightly coordinated bimanual manipulation that includes quantitative metrics and shows existing policies like DP, RDT, Pi0 and OpenVLA-OFT struggle on such tasks.

Diffusion Policy with Bayesian Expert Selection for Active Multi-Target Tracking

cs.RO · 2026-04-03 · unverdicted · novelty 7.0

A Bayesian expert selection framework with variational Bayesian last layers and lower confidence bounds improves diffusion policies for active multi-target tracking.

Generative Control as Optimization: Time Unconditional Flow Matching for Adaptive and Robust Robotic Control

cs.RO · 2026-03-18 · conditional · novelty 7.0

GeCO replaces time-dependent flow matching with time-unconditional optimization, enabling adaptive inference and intrinsic OOD detection for robotic imitation learning.

Information Filtering via Variational Regularization for Robot Manipulation

cs.RO · 2026-01-29 · unverdicted · novelty 7.0

Variational Regularization imposes an adaptive information bottleneck on noisy intermediate features in DP3-UNet and DP3-DiT policies, consistently raising task success rates on RoboTwin2.0, Adroit, and MetaWorld while achieving new state-of-the-art results.

TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance

cs.RO · 2026-01-28 · unverdicted · novelty 7.0

TouchGuide improves contact-rich robot manipulation by steering diffusion or flow-matching visuomotor policies with tactile feasibility scores from a contrastively trained Contact Physical Model.

Steering Your Diffusion Policy with Latent Space Reinforcement Learning

cs.RO · 2025-06-18 · unverdicted · novelty 7.0

DSRL steers pretrained diffusion policies for robotics by applying RL to their latent noise inputs, achieving sample-efficient real-world adaptation with only black-box access.

Chronos: A Physics-Informed Full-History Framework for Non-Markovian Long-Horizon Manipulation

cs.RO · 2026-06-29 · unverdicted · novelty 6.0

Chronos elevates full observation history to the policy's latent state via selective SSM tokens and a Schrödinger-inspired acceleration bridge, achieving large gains on memory-dependent robot tasks with fewer parameters.

Stage-Aware and Roughness-Constrained Diffusion Policy for Multi-Stage Robotic Polishing

cs.RO · 2026-06-24 · unverdicted · novelty 6.0

SRDP infers process stages from multimodal observations to condition diffusion action generation and applies roughness-oriented constraints on process parameters for multi-stage robotic polishing.

ARP: Enhancing Quantized Skill Abstractions via Visual Alignment and Iterative Refinement for Robotic Manipulation

cs.RO · 2026-06-21 · unverdicted · novelty 6.0

ARP enhances quantized skill abstractions in imitation learning by coupling visual grounding via contrastive alignment with execution refinement via IRH, reporting SOTA results on LIBERO, Meta-World, and real-robot tasks.

citing papers explorer

Showing 2 of 2 citing papers after filters.

PALM: Progress-Aware Policy Learning via Affordance Reasoning for Long-Horizon Robotic Manipulation cs.RO · 2026-01-11 · unverdicted · none · ref 137 · internal anchor
PALM improves long-horizon robotic manipulation success by distilling affordance representations for object interaction and predicting within-subtask progress in a VLA model.
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model cs.CV · 2025-03-13 · unverdicted · none · ref 22 · internal anchor
HybridVLA unifies diffusion and autoregression in a single VLA model via collaborative training and ensemble to raise robot manipulation success rates by 14% in simulation and 19% in real-world tasks.

3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer