A new speculative inference system speeds up diffusion VLAs to 19.1 ms average latency (3.04x faster) on LIBERO by replacing most full 58 ms inferences with 7.8 ms draft rounds while preserving task performance.
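The reported numbers are internally consistent under a simple two-mode latency model. A minimal sketch (the fraction of draft rounds `p` is derived from the reported figures, not stated in the summary):

```python
# Back-of-envelope check of the reported speedup, assuming average
# latency is a mix of draft rounds and full inferences:
#   avg = p * draft + (1 - p) * full
FULL_MS = 58.0    # full diffusion-VLA inference (reported)
DRAFT_MS = 7.8    # speculative draft round (reported)
AVG_MS = 19.1     # average latency (reported)

# Solve the mixture equation for p, the implied fraction of draft rounds.
p = (FULL_MS - AVG_MS) / (FULL_MS - DRAFT_MS)
speedup = FULL_MS / AVG_MS

print(f"implied draft-round fraction: {p:.1%}")  # ~77.5%
print(f"speedup: {speedup:.2f}x")                # ~3.04x, matching the summary
```

Under these assumptions, roughly three of every four control steps would be served by the cheap draft path, which is what yields the 3.04x average speedup.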
Real-time execution of action chunking flow policies
17 Pith papers cite this work.
2026: 17 representative citing papers
RIO introduces a lightweight open-source framework that abstracts real-time robot I/O to support easy switching between embodiments and platforms for collecting data and deploying VLAs.
Discrete diffusion policies support native asynchronous execution via unmasking for real-time chunking, delivering higher success rates and 0.7x inference cost versus flow-matching RTC on dynamic robotics benchmarks and real pick tasks.
VLA models exhibit a compute-bound VLM phase followed by a memory-bound action phase on edge hardware; DP-Cache and V-AEFusion reduce redundancy and enable pipeline parallelism for up to 6x speedup on NPUs with marginal task degradation.
π₀.₇ is a steerable generalist robotic model that uses rich multimodal prompts including language, subgoal images, and performance metadata to achieve out-of-the-box generalization across tasks and robot bodies.
GuidedVLA improves VLA success rates by manually supervising separate attention heads in the action decoder with auxiliary signals for task-relevant factors.
Pace-and-Path Correction is a closed-form inference-time operator that decomposes a quadratic cost minimization into orthogonal pace compression and path offset channels to correct dynamics-blindness in chunked-action VLA models.
ACH lets RL policies dynamically pick action chunk lengths by jointly estimating Q-values for all candidate lengths via a single Transformer pass.
MotuBrain jointly models video and action via a three-stream Mixture-of-Transformers UniDiffuser to reach 95.8-96.1% success on RoboTwin 2.0 benchmarks, top EWMScore, and fast 11 Hz inference while adapting to new robots with 50-100 trajectories.
X-WAM unifies robotic action execution and 4D world synthesis by adapting video diffusion priors with a lightweight depth branch and asynchronous noise sampling, achieving 79-91% success on robot benchmarks.
AsyncShield restores VLA geometric intent degraded by latency via kinematic pose mapping and uses PPO-Lagrangian to balance tracking with LiDAR safety constraints in a plug-and-play module.
Tube Diffusion Policy learns observation-conditioned feedback flows around nominal action chunks to enable fast reactive control in visual-tactile contact-rich manipulation.
A latent diffusion model with consistency distillation generates real-time instrumental accompaniment from live context audio, integrated with MAX/MSP for feasible human-AI co-performance.
MV-VDP jointly predicts multi-view RGB and heatmap videos via diffusion to achieve data-efficient, robust robotic manipulation policies.
Controlled benchmarks show per-step residual correction (A2C2) to be most effective for VLA asynchronous inference up to d=8 delays on Kinetix, with over 90% solve rate, outperforming inpainting and conditioning, while training-based simulation is most robust.
LingBot-VA combines video world modeling with policy learning via Mixture-of-Transformers, closed-loop rollouts, and asynchronous inference to improve robot manipulation in simulation and real settings.
Embodied AI requires treating privacy as a lifecycle architectural constraint rather than a stage-local feature, addressed via the proposed SPINE framework with a multi-criterion privacy classification matrix.