arXiv preprint arXiv:2601.11404 (2026)

· 2026 · arXiv 2601.11404

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

read on arXiv browse 8 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Revisiting Embodied Chain-of-Thought for Generalizable Robot Manipulation

cs.RO · 2026-06-02 · unverdicted · novelty 6.0

ERVLA trains on a 978k-trajectory embodied CoT corpus using reasoning as supervision with dropout, then predicts actions without CoT at test time, reaching 86.9% on LIBERO-Plus and 53.2% on VLABench.

Continuous Reasoning for Vision-Language-Action

cs.RO · 2026-05-29 · unverdicted · novelty 6.0

Continuous Reasoning for VLA introduces a shared Gaussian latent for continuous thoughts, trained with self-verification to improve action prediction on LIBERO-PRO and real robots.

GazeVLA: Learning Human Intention for Robotic Manipulation

cs.RO · 2026-04-24 · unverdicted · novelty 6.0

GazeVLA pretrains on large human egocentric datasets to capture gaze-based intention, then finetunes on limited robot data with chain-of-thought reasoning to achieve better robotic manipulation performance than baselines.

Coarse-to-Control: Action-Token Planning for Vision-Language-Action Models

cs.RO · 2026-06-05 · unverdicted · novelty 5.0

Coarse-to-Control adds planning via coarse action tokens in the same vocabulary as control actions, improving VLA performance on long-horizon manipulation tasks.

Dreaming when Necessary: Advancing World Action Models with Adaptive Multi-Modal Reasoning

cs.RO · 2026-06-05 · unverdicted · novelty 5.0

AdaWAM introduces an adaptive router that triggers textual or visual reasoning as needed in world action models, claiming better efficiency and performance than prior embodied policies on simulated and real tasks.

QuoVLA: Quotient Space for Vision-Language-Action Models

cs.CV · 2026-05-24 · unverdicted · novelty 5.0

QuoVLA introduces a quotient-space framework that compresses VLM latents into action-sufficient representations via quantization and dual-branch design for better VLA generalization.

IntentVLA: Short-Horizon Intent Modeling for Aliased Robot Manipulation

cs.RO · 2026-05-14 · unverdicted · novelty 5.0

IntentVLA conditions VLA chunk generation on a compact intent code from recent observations and introduces AliasBench to evaluate stability under short-horizon observation aliasing, reporting gains on multiple robot benchmarks.

Evo-Depth: A Lightweight Depth-Enhanced Vision-Language-Action Model

cs.CV · 2026-05-14 · unverdicted · novelty 4.0

Evo-Depth is a compact VLA model using a lightweight implicit depth encoder from RGB views plus progressive alignment to boost manipulation performance without added hardware.

citing papers explorer

Showing 2 of 2 citing papers after filters.

QuoVLA: Quotient Space for Vision-Language-Action Models cs.CV · 2026-05-24 · unverdicted · none · ref 36
QuoVLA introduces a quotient-space framework that compresses VLM latents into action-sufficient representations via quantization and dual-branch design for better VLA generalization.
Evo-Depth: A Lightweight Depth-Enhanced Vision-Language-Action Model cs.CV · 2026-05-14 · unverdicted · none · ref 60
Evo-Depth is a compact VLA model using a lightweight implicit depth encoder from RGB views plus progressive alignment to boost manipulation performance without added hardware.

arXiv preprint arXiv:2601.11404 (2026)

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer