CronusVLA: Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling, October 2025

Li, H · 2025 · arXiv 2506.19816

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

read on arXiv browse 10 citing papers

citation-role summary

background 2 baseline 1

citation-polarity summary

background 2 baseline 1

representative citing papers

DSSP: Diffusion State Space Policy with Full-History Encoding

cs.RO · 2026-05-14 · conditional · novelty 7.0

DSSP is a history-conditioned diffusion state space policy that uses SSMs to encode full observation streams with an auxiliary dynamics objective and hierarchical fusion, achieving SOTA results with reduced model size in robot manipulation.

CF-VLA: Efficient Coarse-to-Fine Action Generation for Vision-Language-Action Policies

cs.CV · 2026-04-27 · unverdicted · novelty 7.0

CF-VLA uses a coarse initialization over endpoint velocity followed by single-step refinement to achieve strong performance with low inference steps on CALVIN, LIBERO, and real-robot tasks.

${\pi}_{0.7}$: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities

cs.LG · 2026-04-16 · unverdicted · novelty 7.0

π₀.₇ is a steerable generalist robotic model that uses rich multimodal prompts including language, subgoal images, and performance metadata to achieve out-of-the-box generalization across tasks and robot bodies.

VLA-Hijack: A Transferable Patch Attack against Vision-Language-Action Models via Visual Proprioception Hijacking

cs.CV · 2026-05-27 · unverdicted · novelty 6.0

VLA-Hijack is a new adversarial patch attack on Vision-Language-Action models that suppresses real arm features and injects the patch as surrogate embodiment to achieve high cross-architecture transferability.

VLANeXt: Recipes for Building Strong VLA Models

cs.CV · 2026-02-20 · conditional · novelty 6.0

VLANeXt distills 12 design insights from a unified VLA study into a model that outperforms prior methods on LIBERO benchmarks while releasing code for further exploration.

MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation

cs.RO · 2025-08-26 · conditional · novelty 6.0

MemoryVLA introduces a perceptual-cognitive memory bank and working-memory retrieval mechanism into VLA models, raising success rates on long-horizon robotic tasks by up to 26 points over prior baselines.

S$^2$-VLA: State-Space Guided Vision-Language-Action Models for Long-Horizon Manipulation

cs.RO · 2026-06-26 · unverdicted · novelty 5.0

S²-VLA uses a state-space model to maintain a belief state that produces dynamic gating weights for fusing visual, language, and action features, claiming better long-horizon manipulation than 7B models with only 2B parameters.

Fast-dVLA: Accelerating Discrete Diffusion VLA to Real-Time Performance

cs.RO · 2026-03-26 · unverdicted · novelty 5.0

Parameter differences from two training runs on a small task set are treated as auxiliary capability vectors that are merged into a pretrained VLA model, yielding auxiliary-task gains at the cost of ordinary supervised finetuning plus a simple regularization term.

Causal World Modeling for Robot Control

cs.CV · 2026-01-29 · unverdicted · novelty 5.0

LingBot-VA combines video world modeling with policy learning via Mixture-of-Transformers, closed-loop rollouts, and asynchronous inference to improve robot manipulation in simulation and real settings.

LIBERO-PRO: Towards Robust and Fair Evaluation of Vision-Language-Action Models Beyond Memorization

cs.CV · 2025-10-04

citing papers explorer

Showing 10 of 10 citing papers.

DSSP: Diffusion State Space Policy with Full-History Encoding cs.RO · 2026-05-14 · conditional · none · ref 34
DSSP is a history-conditioned diffusion state space policy that uses SSMs to encode full observation streams with an auxiliary dynamics objective and hierarchical fusion, achieving SOTA results with reduced model size in robot manipulation.
CF-VLA: Efficient Coarse-to-Fine Action Generation for Vision-Language-Action Policies cs.CV · 2026-04-27 · unverdicted · none · ref 24
CF-VLA uses a coarse initialization over endpoint velocity followed by single-step refinement to achieve strong performance with low inference steps on CALVIN, LIBERO, and real-robot tasks.
${\pi}_{0.7}$: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities cs.LG · 2026-04-16 · unverdicted · none · ref 34
π₀.₇ is a steerable generalist robotic model that uses rich multimodal prompts including language, subgoal images, and performance metadata to achieve out-of-the-box generalization across tasks and robot bodies.
VLA-Hijack: A Transferable Patch Attack against Vision-Language-Action Models via Visual Proprioception Hijacking cs.CV · 2026-05-27 · unverdicted · none · ref 12
VLA-Hijack is a new adversarial patch attack on Vision-Language-Action models that suppresses real arm features and injects the patch as surrogate embodiment to achieve high cross-architecture transferability.
VLANeXt: Recipes for Building Strong VLA Models cs.CV · 2026-02-20 · conditional · none · ref 22
VLANeXt distills 12 design insights from a unified VLA study into a model that outperforms prior methods on LIBERO benchmarks while releasing code for further exploration.
MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation cs.RO · 2025-08-26 · conditional · none · ref 15
MemoryVLA introduces a perceptual-cognitive memory bank and working-memory retrieval mechanism into VLA models, raising success rates on long-horizon robotic tasks by up to 26 points over prior baselines.
S$^2$-VLA: State-Space Guided Vision-Language-Action Models for Long-Horizon Manipulation cs.RO · 2026-06-26 · unverdicted · none · ref 44
S²-VLA uses a state-space model to maintain a belief state that produces dynamic gating weights for fusing visual, language, and action features, claiming better long-horizon manipulation than 7B models with only 2B parameters.
Fast-dVLA: Accelerating Discrete Diffusion VLA to Real-Time Performance cs.RO · 2026-03-26 · unverdicted · none · ref 12
Parameter differences from two training runs on a small task set are treated as auxiliary capability vectors that are merged into a pretrained VLA model, yielding auxiliary-task gains at the cost of ordinary supervised finetuning plus a simple regularization term.
Causal World Modeling for Robot Control cs.CV · 2026-01-29 · unverdicted · none · ref 37
LingBot-VA combines video world modeling with policy learning via Mixture-of-Transformers, closed-loop rollouts, and asynchronous inference to improve robot manipulation in simulation and real settings.
LIBERO-PRO: Towards Robust and Fair Evaluation of Vision-Language-Action Models Beyond Memorization cs.CV · 2025-10-04 · unreviewed · ref 12

CronusVLA: Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling, October 2025

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer