Saivla-0: Cerebrum–pons–cerebellum tripartite architecture for compute-aware vision-language-action. arXiv preprint arXiv:2603.08124, 2026

Xiang Shi, Wenlong Huang, Menglin Zou, Xinhai Sun · 2026 · arXiv 2603.08124

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

ABot-M0.5: Unified Mobility-and-Manipulation World Action Model

cs.CV · 2026-07-01 · unverdicted · novelty 6.0

ABot-M0.5 proposes a unified mobility-and-manipulation world action model using three alignment strategies that achieves state-of-the-art performance on mobile and fine-grained manipulation benchmarks.

RDGen: Demonstration Generation for High-Quality Robot Learning via Reinforcement Learning

cs.RO · 2026-05-29 · unverdicted · novelty 4.0

RDGen uses sim-to-real RL policies to generate smoother robot demonstrations that improve downstream VLA performance over human-collected data on pick-and-place tasks.

Evo-Depth: A Lightweight Depth-Enhanced Vision-Language-Action Model

cs.CV · 2026-05-14 · unverdicted · novelty 4.0

Evo-Depth is a compact VLA model using a lightweight implicit depth encoder from RGB views plus progressive alignment to boost manipulation performance without added hardware.

citing papers explorer

Showing 3 of 3 citing papers.

ABot-M0.5: Unified Mobility-and-Manipulation World Action Model · cs.CV · 2026-07-01 · unverdicted · none · ref 58
RDGen: Demonstration Generation for High-Quality Robot Learning via Reinforcement Learning · cs.RO · 2026-05-29 · unverdicted · none · ref 20
Evo-Depth: A Lightweight Depth-Enhanced Vision-Language-Action Model · cs.CV · 2026-05-14 · unverdicted · none · ref 37
