hub

arXiv preprint arXiv:2503.20384 (2025)

· 2025 · arXiv 2503.20384

14 Pith papers cite this work. Polarity classification is still indexing.

14 Pith papers citing it

read on arXiv browse 14 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Revisiting Parameter Redundancy in Vision-Language-Action Models: Insights from VLM-to-VLA Adaptation

cs.RO · 2026-06-30 · unverdicted · novelty 7.0

VLA models from VLM adaptation can be pruned 12-30% via multi-module joint scheme based on divergence signals while keeping ~90% performance on LIBERO without post-pruning recovery, unlike standard criteria that collapse.

HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness

cs.RO · 2026-03-18 · unverdicted · novelty 7.0

HeiSD delivers up to 2.45x faster inference for embodied VLA models by hybridizing speculative decoding with kinematic boundary detection and error-mitigation tricks while preserving task success rates.

KERV: Kinematic-Rectified Speculative Decoding for Embodied VLA Models

cs.RO · 2026-03-02 · unverdicted · novelty 7.0

KERV integrates kinematic Kalman Filter predictions with speculative decoding in VLA models to achieve 27-37% faster inference while maintaining nearly the same task success rates.

QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models

cs.LG · 2026-02-23 · unverdicted · novelty 7.0

QuantVLA is the first post-training quantization framework for VLA models that quantizes the diffusion transformer action head and reports higher task success rates than full-precision baselines with roughly 70% memory savings on the quantized components.

ConsisVLA-4D: Advancing Spatiotemporal Consistency in Efficient 3D-Perception and 4D-Reasoning for Robotic Manipulation

cs.RO · 2026-05-06 · unverdicted · novelty 6.0

ConsisVLA-4D adds cross-view semantic alignment, cross-object geometric fusion, and cross-scene dynamic reasoning to VLA models, delivering 21.6% and 41.5% gains plus 2.3x and 2.4x speedups on LIBERO and real-world tasks.

A1: A Fully Transparent Open-Source, Adaptive and Efficient Truncated Vision-Language-Action Model

cs.RO · 2026-04-07 · unverdicted · novelty 6.0

A1 is a transparent VLA framework achieving state-of-the-art robot manipulation success with up to 72% lower latency via adaptive layer truncation and inter-layer flow matching.

OxyGen: Unified KV Cache Management for VLA Inference under Multi-Task Parallelism

cs.RO · 2026-03-15 · unverdicted · novelty 6.0

OxyGen unifies KV cache management in MoT VLAs to enable cross-task KV sharing and cross-frame continuous batching, delivering up to 3.7x speedup with 200+ tokens/s language and 70 Hz action on on-device platforms.

Continually Evolving Skill Knowledge in Vision Language Action Model

cs.RO · 2025-11-22 · unverdicted · novelty 6.0

Stellar VLA achieves continual learning in VLA models by maintaining a growing knowledge space and routing tasks to specialized experts conditioned on semantic relations, delivering strong LIBERO benchmark results with only 1% data replay and successful real-world transfer on dual-arm hardware.

ActDistill: General Action-Guided Self-Derived Distillation for Efficient Vision-Language-Action Models

cs.CV · 2025-11-22 · conditional · novelty 6.0

ActDistill transfers action knowledge from heavy VLA teacher models to lightweight students via graph-encapsulated hierarchies and action-guided dynamic routing, delivering over 50% computation reduction and 1.67x speedup with comparable or better performance on embodied tasks.

World Pilot: Steering Vision-Language-Action Models with World-Action Priors

cs.RO · 2026-06-10 · unverdicted · novelty 5.0

World Pilot augments VLA policies with world-action priors through latent and action steering pathways, reporting 84.7% success on LIBERO-Plus zero-shot OOD and top real-robot results across four tasks.

SAFE-Pruner: Semantic Attention-Guided Future-Aware Token Pruning for Efficient Vision-Language-Action Manipulation

cs.CV · 2026-05-28 · unverdicted · novelty 5.0

SAFE-Pruner forecasts deep-layer token saliency in VLA models via semantic attention consistency and adaptive subtask detection to achieve up to 1.89x speedup with under 1.7% success rate loss.

ElegantVLA: Learning When to Think for Efficient Vision-Language-Action Models

cs.RO · 2026-05-28 · unverdicted · novelty 5.0

ElegantVLA accelerates VLA models up to 3.77x by dynamically scheduling compute across vision, language, and action components without retraining the base model.

RoboECC: Multi-Factor-Aware Edge-Cloud Collaborative Deployment for VLA Models

cs.DC · 2026-03-21 · unverdicted · novelty 5.0

RoboECC delivers up to 3.28x speedup for VLA model inference via co-aware segmentation and network-aware adjustment with 2.55-2.62% overhead.

Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey

cs.RO · 2025-08-18 · unverdicted · novelty 5.0

This survey organizes large VLM-based VLA models for robotic manipulation into monolithic and hierarchical paradigms, reviews their integrations and datasets, and outlines future directions.

citing papers explorer

Showing 14 of 14 citing papers.

Revisiting Parameter Redundancy in Vision-Language-Action Models: Insights from VLM-to-VLA Adaptation cs.RO · 2026-06-30 · unverdicted · none · ref 38
VLA models from VLM adaptation can be pruned 12-30% via multi-module joint scheme based on divergence signals while keeping ~90% performance on LIBERO without post-pruning recovery, unlike standard criteria that collapse.
HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness cs.RO · 2026-03-18 · unverdicted · none · ref 43
HeiSD delivers up to 2.45x faster inference for embodied VLA models by hybridizing speculative decoding with kinematic boundary detection and error-mitigation tricks while preserving task success rates.
KERV: Kinematic-Rectified Speculative Decoding for Embodied VLA Models cs.RO · 2026-03-02 · unverdicted · none · ref 37
KERV integrates kinematic Kalman Filter predictions with speculative decoding in VLA models to achieve 27-37% faster inference while maintaining nearly the same task success rates.
QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models cs.LG · 2026-02-23 · unverdicted · none · ref 48
QuantVLA is the first post-training quantization framework for VLA models that quantizes the diffusion transformer action head and reports higher task success rates than full-precision baselines with roughly 70% memory savings on the quantized components.
ConsisVLA-4D: Advancing Spatiotemporal Consistency in Efficient 3D-Perception and 4D-Reasoning for Robotic Manipulation cs.RO · 2026-05-06 · unverdicted · none · ref 85
ConsisVLA-4D adds cross-view semantic alignment, cross-object geometric fusion, and cross-scene dynamic reasoning to VLA models, delivering 21.6% and 41.5% gains plus 2.3x and 2.4x speedups on LIBERO and real-world tasks.
A1: A Fully Transparent Open-Source, Adaptive and Efficient Truncated Vision-Language-Action Model cs.RO · 2026-04-07 · unverdicted · none · ref 55
A1 is a transparent VLA framework achieving state-of-the-art robot manipulation success with up to 72% lower latency via adaptive layer truncation and inter-layer flow matching.
OxyGen: Unified KV Cache Management for VLA Inference under Multi-Task Parallelism cs.RO · 2026-03-15 · unverdicted · none · ref 50
OxyGen unifies KV cache management in MoT VLAs to enable cross-task KV sharing and cross-frame continuous batching, delivering up to 3.7x speedup with 200+ tokens/s language and 70 Hz action on on-device platforms.
Continually Evolving Skill Knowledge in Vision Language Action Model cs.RO · 2025-11-22 · unverdicted · none · ref 48
Stellar VLA achieves continual learning in VLA models by maintaining a growing knowledge space and routing tasks to specialized experts conditioned on semantic relations, delivering strong LIBERO benchmark results with only 1% data replay and successful real-world transfer on dual-arm hardware.
ActDistill: General Action-Guided Self-Derived Distillation for Efficient Vision-Language-Action Models cs.CV · 2025-11-22 · conditional · none · ref 31
ActDistill transfers action knowledge from heavy VLA teacher models to lightweight students via graph-encapsulated hierarchies and action-guided dynamic routing, delivering over 50% computation reduction and 1.67x speedup with comparable or better performance on embodied tasks.
World Pilot: Steering Vision-Language-Action Models with World-Action Priors cs.RO · 2026-06-10 · unverdicted · none · ref 14
World Pilot augments VLA policies with world-action priors through latent and action steering pathways, reporting 84.7% success on LIBERO-Plus zero-shot OOD and top real-robot results across four tasks.
SAFE-Pruner: Semantic Attention-Guided Future-Aware Token Pruning for Efficient Vision-Language-Action Manipulation cs.CV · 2026-05-28 · unverdicted · none · ref 48
SAFE-Pruner forecasts deep-layer token saliency in VLA models via semantic attention consistency and adaptive subtask detection to achieve up to 1.89x speedup with under 1.7% success rate loss.
ElegantVLA: Learning When to Think for Efficient Vision-Language-Action Models cs.RO · 2026-05-28 · unverdicted · none · ref 36
ElegantVLA accelerates VLA models up to 3.77x by dynamically scheduling compute across vision, language, and action components without retraining the base model.
RoboECC: Multi-Factor-Aware Edge-Cloud Collaborative Deployment for VLA Models cs.DC · 2026-03-21 · unverdicted · none · ref 10
RoboECC delivers up to 3.28x speedup for VLA model inference via co-aware segmentation and network-aware adjustment with 2.55-2.62% overhead.
Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey cs.RO · 2025-08-18 · unverdicted · none · ref 114
This survey organizes large VLM-based VLA models for robotic manipulation into monolithic and hierarchical paradigms, reviews their integrations and datasets, and outlines future directions.

arXiv preprint arXiv:2503.20384 (2025)

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer