{"total":15,"items":[{"citing_arxiv_id":"2607.01804","ref_index":11,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"VLA-Corrector: Lightweight Detect-and-Correct Inference for Adaptive Action Horizon","primary_cat":"cs.RO","submitted_at":"2026-07-02T07:18:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"VLA-Corrector adds a detect-and-correct inference layer using a latent vision monitor and online gradient guidance to enable adaptive action horizons in chunked VLA policies.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2607.01051","ref_index":15,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"AutoSpeed: Annotation-Free Stage-Adaptive Motion Speed Learning for Robot Manipulation","primary_cat":"cs.RO","submitted_at":"2026-07-01T15:13:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AutoSpeed optimizes visuomotor policies over candidate trajectories at varying speeds using a composite cost of prediction error versus horizon length, with DCT-based modulation, yielding shorter execution times and higher success rates while producing speeds that align with task stages.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.12497","ref_index":26,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"$\\mu$VLA: On Recurrent Memory for Partially Observable Manipulation in VLA Models","primary_cat":"cs.LG","submitted_at":"2026-06-10T13:26:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Adding recurrent memory tokens to VLA models raises success rates on partially observable manipulation tasks from 0.42 to 0.84 on training and 0.07 to 0.23 on 
held-out tasks while preserving performance under full observability.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.12475","ref_index":24,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Learning to Assist: Collaborative VLAs for Implicit Human-Robot Collaboration","primary_cat":"cs.RO","submitted_at":"2026-06-10T05:42:49+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"VLA models with inference-time steering mitigate action leakage in implicit human-robot collaboration, supporting longer horizons and yielding faster, more reliable assembly than shorter-horizon baselines in a 16-person study.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.11408","ref_index":9,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Dynamic Execution Horizon Prediction for Chunk-based Robot Policies","primary_cat":"cs.RO","submitted_at":"2026-06-09T19:58:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DEHP adds an online-RL horizon predictor to frozen chunk policies, yielding higher success on precise and long-horizon robot manipulation by adapting chunk length to task stage.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.06491","ref_index":36,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies","primary_cat":"cs.RO","submitted_at":"2026-06-04T17:59:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TempoVLA learns a single VLA policy with controllable execution speed via variable-speed trajectory augmentation 
and explicit speed conditioning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.04233","ref_index":58,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"What Are We Actually Benchmarking in Robot Manipulation?","primary_cat":"cs.RO","submitted_at":"2026-06-02T21:33:28+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LIBERO and CALVIN fail multiple proposed diagnostics for shortcut solvability, statistical significance, overfitting, and data dependence, while a tiny 0.09B probe reaches near-SOTA on LIBERO.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.03847","ref_index":15,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Denoising Tells When to Replan: Denoising-Variance Adaptive Chunking for Flow-Based Robot Policies","primary_cat":"cs.RO","submitted_at":"2026-06-02T16:26:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DVAC uses denoising variance as an intrinsic signal to adaptively chunk actions in flow-based robot policies, improving success rates and cutting replans on LIBERO, RoboTwin, CALVIN, and real-world tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00537","ref_index":11,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"PACE: Phase-Aware Chunk Execution for Robot Policies with Action Chunking","primary_cat":"cs.RO","submitted_at":"2026-05-30T05:11:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PACE dynamically selects execution horizons for action chunks in robot policies by detecting low-speed transition points in predicted speed 
profiles, raising success rates from 57.8% to 64.2% on 50 simulation tasks and from 50.7% to 70.4% in real-robot tests.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11567","ref_index":10,"ref_count":3,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Dynamic Execution Commitment of Vision-Language-Action Models","primary_cat":"cs.CV","submitted_at":"2026-05-12T05:52:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A3 reframes dynamic action chunk commitment in VLA models as self-speculative prefix verification, accepting the longest continuous sequence of actions that satisfies consensus-ordered conditional invariance and prefix-closed sequential consistency.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"selected a priori per task [9]. This design implicitly assumes that the model's predictive reliability is uniform across the temporal rollout and invariant to environmental stochasticity. However, in open-ended embodied settings, observations are non-stationary and execution errors inevitably compound, rendering a fixed execution horizon substantially brittle [10, 11]. 
As illustrated in Figure 1, it reveals a fundamental trade-off between inference efficiency and task success rate under varying execution horizons: a smaller execution horizon requires more frequent model queries, imposing heavy inference overhead on real-time robot control, whereas a larger horizon reduces forward calls but commits to longer open-loop action sequences, resulting in severe success rate degradation as"},{"citing_arxiv_id":"2605.09860","ref_index":24,"ref_count":3,"confidence":0.9,"is_internal_anchor":true,"paper_title":"When to Re-Commit: Temporal Abstraction Discovery for Long-Horizon Vision-Language Reasoning","primary_cat":"cs.AI","submitted_at":"2026-05-11T01:43:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Learns state-conditioned commitment depth in a 7B vision-language policy that jointly predicts actions and replan intervals, outperforming fixed-depth baselines and larger models on Sliding Puzzle and Sokoban while providing a theoretical dominance result.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"James Tanner, Quan Vuong, Homer Walke, Anna Walling, Haohuan Wang, Lili Yu, and Ury Zhilinsky. π0.5: a vision-language-action model with open-world generalization, 2025. URL https://arxiv.org/abs/2504.16054. [23] Dong Jing, Gang Wang, Jiaqi Liu, Weiliang Tang, Zelong Sun, Yunchao Yao, Zhenyu Wei, Yunhui Liu, Zhiwu Lu, and Mingyu Ding. Mixture of horizons in action chunking, 2025. URL https://arxiv.org/abs/2511.19433. [24] Lydia Kavraki and J-C Latombe. Randomized preprocessing of configuration for fast path planning. In Proceedings of the 1994 IEEE International Conference on Robotics and Automation, pages 2138-2145. IEEE, 1994. [25] Moo Jin Kim, Chelsea Finn, and Percy Liang. 
Fine-tuning vision-language-action models: Optimizing speed and success. arXiv preprint arXiv:2502."},{"citing_arxiv_id":"2605.06222","ref_index":8,"ref_count":2,"confidence":0.9,"is_internal_anchor":true,"paper_title":"When to Trust Imagination: Adaptive Action Execution for World Action Models","primary_cat":"cs.RO","submitted_at":"2026-05-07T13:18:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A verifier called Future Forward Dynamics Causal Attention enables adaptive action execution in World Action Models, reducing model inferences by 69% and improving success rates in robotic tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"long action chunk can cause failure. Therefore, the key challenge is not merely choosing a better chunk size, but deciding when the WAM's imagined future should still be trusted during physical execution. Existing adaptive execution methods for diffusion policies or VLA models mainly adjust action chunk length based on action uncertainty, entropy, or policy-side confidence [9, 25, 15, 26, 27]. However, these methods do not exploit the defining property of WAMs: the model predicts not only what action to take, but also what future visual observations should occur if the action rollout remains valid. This creates a new form of self-verification. 
During execution, the robot can compare the real observation with the WAM-predicted observation at the corresponding timestep and jointly reason over them with"},{"citing_arxiv_id":"2604.24086","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"AsyncShield: A Plug-and-Play Edge Adapter for Asynchronous Cloud-based VLA Navigation","primary_cat":"cs.RO","submitted_at":"2026-04-27T06:20:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AsyncShield restores VLA geometric intent from latency via kinematic pose mapping and uses PPO-Lagrangian to balance tracking with LiDAR safety constraints in a plug-and-play module.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"To address the asynchronous latency of mobile-based large models, existing strategies predominantly focus on \"temporal fitting and prediction.\" For instance, VLASH [32] relies on forward-state prediction to estimate the robot's future pose; DuoCore-FS [33] attempts to construct a latent representation buffer to bridge the dual-track system, whereas AsyncVLA [34] concentrates on edge-side asynchronous adaptation manipulation via cloud-edge collaboration. 
Simultaneously, within low-level local perception modules, traditional frameworks (e.g., NavFormer [35], V-STRONG [36]) and various reinforcement learning schemes (e.g., Hierarchical RL Nav [37], Enhanced PPO [38], Decentralized RL [39], APD for SRL [40]) exhibit excel-"},{"citing_arxiv_id":"2604.02965","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Open-Loop Planning, Closed-Loop Verification: Speculative Verification for VLA","primary_cat":"cs.RO","submitted_at":"2026-04-03T10:55:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SV-VLA uses infrequent heavy VLA planning of action chunks plus a lightweight closed-loop verifier to achieve both efficiency and robustness in dynamic robot control.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Vision-Language-Action (VLA) models have recently emerged as a promising paradigm in embodied AI, by directly predicting action tokens from multimodal observations and language instructions [12, 36]. By combining large-scale vision-language pretraining with robot demonstration data, recent VLA models such as RT-2 [39], OpenVLA [15], and 𝜋0 [1, 9, 10] have shown strong zero-shot transfer ability and improved semantic grounding [34]. These properties make them promising for complex, multi-step manipulation tasks and open-world settings, which are difficult to handle with conventional task-specific policies trained on limited robot data [30, 38]. 
However, a key challenge in deploying VLA models"},{"citing_arxiv_id":"2601.21998","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Causal World Modeling for Robot Control","primary_cat":"cs.CV","submitted_at":"2026-01-29T17:07:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LingBot-VA combines video world modeling with policy learning via Mixture-of-Transformers, closed-loop rollouts, and asynchronous inference to improve robot manipulation in simulation and real settings.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Language Models (VLMs) as foundational backbones [6, 7, 11, 29, 34, 39, 87, 93], which provide superior cross-modal understanding and more generalizable action distributions compared to task-specific imitation policies like ACT [91] or Diffusion Policy [17]. Efforts have been further devoted to improving the deployability through lightweight backbones [49, 62, 67], efficient tokenization [57], real-time inference [8, 10, 70], or fine-tuning schemes [30, 32, 38]. However, despite their prowess in semantic reasoning, a fundamental limitation persists: the pre-training objectives and data distributions of standard VLMs largely overlook the fine-grained system dynamics and low-level trajectories essential for precision manipulation. While supervised fine-tuning on expensively collected large-scale robot datasets allows these"}],"limit":50,"offset":0}