BOKBO is the first conformal abstention method for K-sample VLA policies that supplies finite-sample distribution-free guarantees on executed violation rates, with global and Mondrian per-task variants.
hub Canonical reference
Robomonkey: Scaling test-time sampling and verifi- cation for vision-language-action models
Canonical reference. 100% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 12roles
background 5polarities
background 5representative citing papers
VeGAS improves MLLM-based embodied agents by sampling action ensembles and using a verifier trained on LLM-synthesized failure cases, yielding up to 36% relative gains on hard multi-object long-horizon tasks in Habitat and ALFRED.
UPS framework uses conformal prediction to calibrate VLM verifiers for choosing between high-confidence action execution, natural language task queries, or policy interventions, then applies residual learning from interventions to continually improve the base policy with minimal feedback.
A retrieve-then-steer method stores successful robot actions in memory and uses them to steer a frozen VLA's flow-matching sampler for better test-time reliability without parameter updates.
VLA-ATTC equips VLA models with adaptive test-time compute via an uncertainty clutch and relative action critic, cutting failure rates by over 50% on LIBERO-LONG.
FASTER models multi-candidate denoising as an MDP and trains a value function to filter actions early, delivering the performance of full sampling at lower cost in diffusion RL policies.
PDF improves VLA success rates on LIBERO and Atari by applying test-time perturbation learning with delayed feedback to correct trajectory overfitting and overconfidence.
A1 is a transparent VLA framework achieving state-of-the-art robot manipulation success with up to 72% lower latency via adaptive layer truncation and inter-layer flow matching.
Steerable VLAs trained on rich synthetic commands at subtask, motion, and pixel levels enable VLMs to steer robot behavior more effectively, outperforming prior hierarchical baselines on real-world manipulation and generalization tasks.
This is the first survey on vision-language-action models, providing a taxonomy across three lines, plus summaries of datasets, simulators, benchmarks, challenges, and future directions in embodied AI.
FORCE is a 3-stage RL fine-tuning method for VLA models that stabilizes Q-function via on-policy warm-up and filters high-value actions for updates, claiming 79% success rate gains and 32.5% faster training without human intervention.
Embodied reward models systematically over-reward unsafe, suboptimal, and shortcut robot behaviors due to training on successful data only, and modest inclusion of bad behavior data improves alignment with human preferences.
citing papers explorer
-
BOKBO (Best of K Bad Options): Calibrated Abstention for VLA Policies
BOKBO is the first conformal abstention method for K-sample VLA policies that supplies finite-sample distribution-free guarantees on executed violation rates, with global and Mondrian per-task variants.
-
Think Twice, Act Once: Verifier-Guided Action Selection For Embodied Agents
VeGAS improves MLLM-based embodied agents by sampling action ensembles and using a verifier trained on LLM-synthesized failure cases, yielding up to 36% relative gains on hard multi-object long-horizon tasks in Habitat and ALFRED.
-
When to Act, Ask, or Learn: Uncertainty-Aware Policy Steering
UPS framework uses conformal prediction to calibrate VLM verifiers for choosing between high-confidence action execution, natural language task queries, or policy interventions, then applies residual learning from interventions to continually improve the base policy with minimal feedback.
-
Retrieve-then-Steer: Online Success Memory for Test-Time Adaptation of Generative VLAs
A retrieve-then-steer method stores successful robot actions in memory and uses them to steer a frozen VLA's flow-matching sampler for better test-time reliability without parameter updates.
-
VLA-ATTC: Adaptive Test-Time Compute for VLA Models with Relative Action Critic Model
VLA-ATTC equips VLA models with adaptive test-time compute via an uncertainty clutch and relative action critic, cutting failure rates by over 50% on LIBERO-LONG.
-
FASTER: Value-Guided Sampling for Fast RL
FASTER models multi-candidate denoising as an MDP and trains a value function to filter actions early, delivering the performance of full sampling at lower cost in diffusion RL policies.
-
Test-Time Perturbation Learning with Delayed Feedback for Vision-Language-Action Models
PDF improves VLA success rates on LIBERO and Atari by applying test-time perturbation learning with delayed feedback to correct trajectory overfitting and overconfidence.
-
A1: A Fully Transparent Open-Source, Adaptive and Efficient Truncated Vision-Language-Action Model
A1 is a transparent VLA framework achieving state-of-the-art robot manipulation success with up to 72% lower latency via adaptive layer truncation and inter-layer flow matching.
-
Steerable Vision-Language-Action Policies for Embodied Reasoning and Hierarchical Control
Steerable VLAs trained on rich synthetic commands at subtask, motion, and pixel levels enable VLMs to steer robot behavior more effectively, outperforming prior hierarchical baselines on real-world manipulation and generalization tasks.
-
A Survey on Vision-Language-Action Models for Embodied AI
This is the first survey on vision-language-action models, providing a taxonomy across three lines, plus summaries of datasets, simulators, benchmarks, challenges, and future directions in embodied AI.
-
FORCE: Efficient VLA Reinforcement Fine-Tuning via Value-Calibrated Warm-up and Self-Distillation
FORCE is a 3-stage RL fine-tuning method for VLA models that stabilizes Q-function via on-policy warm-up and filters high-value actions for updates, claiming 79% success rate gains and 32.5% faster training without human intervention.
-
Position: Good Embodied Reward Models Need Bad Behavior Data
Embodied reward models systematically over-reward unsafe, suboptimal, and shortcut robot behaviors due to training on successful data only, and modest inclusion of bad behavior data improves alignment with human preferences.