SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Models

2025 · DOI 10.15607/rss.2025.xxi.011

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Direct Action-Head Injection of A Grounded 3D Point Unlocks Spatial and Task Generalization

cs.RO · 2026-06-26 · unverdicted · novelty 6.0

Direct 3D point grounding injected into the action head via a two-layer MLP and adaptive layer norm boosts VLA success rates by 32-46 points on spatial and task perturbations in LIBERO-PRO.

ACE-Ego-0: Unifying Egocentric Human and Robotic Data for VLA Pretraining

cs.RO · 2026-06-15 · unverdicted · novelty 6.0

ACE-Ego-0 is a VLA pretraining framework that turns egocentric human videos into robot-format pseudo-actions via a video-to-action pipeline and trains jointly with robot data under a reliability-aware objective.

$\mu$VLA: On Recurrent Memory for Partially Observable Manipulation in VLA Models

cs.LG · 2026-06-10 · unverdicted · novelty 6.0

Adding recurrent memory tokens to VLA models raises success rates on partially observable manipulation tasks from 0.42 to 0.84 on training tasks and from 0.07 to 0.23 on held-out tasks, while preserving performance under full observability.

TBD-VLA: Temporal Block Diffusion Vision Language Action Model

cs.CV · 2026-06-05 · unverdicted · novelty 5.0

TBD-VLA partitions action sequences into temporal blocks, performs masked discrete diffusion within each block, and generates autoregressively across blocks, unifying parallel decoding with temporal coherence in discrete VLA models.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Direct Action-Head Injection of A Grounded 3D Point Unlocks Spatial and Task Generalization

cs.RO · 2026-06-26 · unverdicted · none · ref 28

Direct 3D point grounding injected into the action head via a two-layer MLP and adaptive layer norm boosts VLA success rates by 32-46 points on spatial and task perturbations in LIBERO-PRO.

ACE-Ego-0: Unifying Egocentric Human and Robotic Data for VLA Pretraining

cs.RO · 2026-06-15 · unverdicted · none · ref 23

ACE-Ego-0 is a VLA pretraining framework that turns egocentric human videos into robot-format pseudo-actions via a video-to-action pipeline and trains jointly with robot data under a reliability-aware objective.
