Hume: Introducing system-2 thinking in visual-language-action model

· 2025 · arXiv 2505.21432

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

representative citing papers

From Imagined Futures to Executable Actions: Mixture of Latent Actions for Robot Manipulation

cs.RO · 2026-05-12 · unverdicted · novelty 7.0

MoLA infers a mixture of latent actions from generated future videos via modality-aware inverse dynamics models to improve robot manipulation policies.

Hydra-DP3: Frequency-Aware Right-Sizing of 3D Diffusion Policies for Visuomotor Control

cs.RO · 2026-05-02 · unverdicted · novelty 6.0

Hydra-DP3 achieves SOTA visuomotor performance with under 1% of prior 3D diffusion policy parameters by using frequency analysis to justify a lightweight decoder and two-step DDIM inference.

VLA-ATTC: Adaptive Test-Time Compute for VLA Models with Relative Action Critic Model

cs.RO · 2026-05-02 · unverdicted · novelty 6.0

VLA-ATTC equips VLA models with adaptive test-time compute via an uncertainty clutch and relative action critic, cutting failure rates by over 50% on LIBERO-LONG.

Sentinel-VLA: A Metacognitive VLA Model with Active Status Monitoring for Dynamic Reasoning and Error Recovery

cs.RO · 2026-05-02 · unverdicted · novelty 6.0

Sentinel-VLA introduces a metacognitive VLA model with a sentinel module for real-time status monitoring, dynamic reasoning, and error recovery, plus a self-evolving continual learning method, raising real-world task success by over 30% versus prior SOTA.

Adaptive Action Chunking at Inference-time for Vision-Language-Action Models

cs.RO · 2026-04-05 · unverdicted · novelty 6.0

Adaptive Action Chunking uses action entropy to dynamically adjust chunk sizes in VLA models, improving performance on simulated and real robotic manipulation tasks.

UAV-Track VLA: Embodied Aerial Tracking via Vision-Language-Action Models

cs.CV · 2026-04-02 · conditional · novelty 6.0

UAV-Track VLA modifies the π0.5 VLA architecture with temporal compression and dual-branch decoding to reach 61.76% success and 269.65 average frames in long-distance pedestrian tracking on a new 890K-frame UAV dataset, while cutting inference latency by 33.4%.

Deep Image Clustering Based on Curriculum Learning and Density Information

cs.CV · 2026-03-31 · unverdicted · novelty 6.0

IDCL adds density-based curriculum learning and density-core guidance to deep image clustering, claiming superior robustness, faster convergence, and flexibility on benchmark datasets.

InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy

cs.RO · 2025-10-15 · unverdicted · novelty 6.0

InternVLA-M1 uses spatially guided pre-training on 2.3M examples followed by action post-training to deliver up to 17% gains on robot manipulation benchmarks and 20.6% on unseen objects.

citing papers explorer

Showing 7 of 7 citing papers after filters.

From Imagined Futures to Executable Actions: Mixture of Latent Actions for Robot Manipulation cs.RO · 2026-05-12 · unverdicted · none · ref 43
MoLA infers a mixture of latent actions from generated future videos via modality-aware inverse dynamics models to improve robot manipulation policies.
Hydra-DP3: Frequency-Aware Right-Sizing of 3D Diffusion Policies for Visuomotor Control cs.RO · 2026-05-02 · unverdicted · none · ref 29
Hydra-DP3 achieves SOTA visuomotor performance with under 1% of prior 3D diffusion policy parameters by using frequency analysis to justify a lightweight decoder and two-step DDIM inference.
VLA-ATTC: Adaptive Test-Time Compute for VLA Models with Relative Action Critic Model cs.RO · 2026-05-02 · unverdicted · none · ref 20
VLA-ATTC equips VLA models with adaptive test-time compute via an uncertainty clutch and relative action critic, cutting failure rates by over 50% on LIBERO-LONG.
Sentinel-VLA: A Metacognitive VLA Model with Active Status Monitoring for Dynamic Reasoning and Error Recovery cs.RO · 2026-05-02 · unverdicted · none · ref 13
Sentinel-VLA introduces a metacognitive VLA model with a sentinel module for real-time status monitoring, dynamic reasoning, and error recovery, plus a self-evolving continual learning method, raising real-world task success by over 30% versus prior SOTA.
Adaptive Action Chunking at Inference-time for Vision-Language-Action Models cs.RO · 2026-04-05 · unverdicted · none · ref 36
Adaptive Action Chunking uses action entropy to dynamically adjust chunk sizes in VLA models, improving performance on simulated and real robotic manipulation tasks.
UAV-Track VLA: Embodied Aerial Tracking via Vision-Language-Action Models cs.CV · 2026-04-02 · conditional · none · ref 31
UAV-Track VLA modifies the π0.5 VLA architecture with temporal compression and dual-branch decoding to reach 61.76% success and 269.65 average frames in long-distance pedestrian tracking on a new 890K-frame UAV dataset, while cutting inference latency by 33.4%.
Deep Image Clustering Based on Curriculum Learning and Density Information cs.CV · 2026-03-31 · unverdicted · none · ref 74
IDCL adds density-based curriculum learning and density-core guidance to deep image clustering, claiming superior robustness, faster convergence, and flexibility on benchmark datasets.

Hume: Introducing system-2 thinking in visual-language-action model

fields

years

verdicts

representative citing papers

citing papers explorer