Recognition: 2 theorem links · Lean Theorem
Failing Forward: Adaptive Failure-Informed Learning for Vision-Language-Action Models
Pith reviewed 2026-05-13 06:33 UTC · model grok-4.3
The pith
Training VLA models on failure trajectories generated online, and using those failures as negative guidance at inference, raises robotic manipulation success rates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AFIL generates failure rollouts online using a pretrained VLA, then jointly trains dual action generators for successful and failed behaviors sharing a vision-language backbone. During inference, the failure generator provides adaptive negative guidance scaled by the distance between success and failure distributions at each step, steering actions toward reliable success modes.
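The guided update can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it follows the rule this page quotes from the paper, eps_FI = eps_succ - lambda * (eps_fail - eps_succ), with lambda proportional to a cosine distance. The assumptions are that the distance is taken between the two generators' predictions at the current step, and that the proportionality constant `scale` and the cap `lam_max` exist at all; both names are hypothetical.

```python
import math


def cosine_distance(a, b, tiny=1e-8):
    """Return 1 - cosine similarity of two equal-length vectors (clamped at 0)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return max(0.0, 1.0 - dot / (norm_a * norm_b + tiny))


def failure_informed_eps(eps_succ, eps_fail, scale=1.0, lam_max=2.0):
    """Push the success prediction away from the failure prediction.

    `scale` and `lam_max` are hypothetical knobs, not values from the paper.
    """
    # Guidance strength grows with how far apart the two predictions point.
    lam = min(scale * cosine_distance(eps_succ, eps_fail), lam_max)
    # eps_FI = eps_succ - lam * (eps_fail - eps_succ)
    guided = [s - lam * (f - s) for s, f in zip(eps_succ, eps_fail)]
    return guided, lam


# Toy vectors standing in for the two generators' outputs at one step.
guided, lam = failure_informed_eps([1.0, 0.0], [0.0, 1.0])
print(guided, lam)
```

When the two predictions agree, the distance is near zero and the update reduces to the plain success prediction; when they diverge, the correction term grows, which is the adaptive behaviour described above.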
What carries the argument
Dual Action Generators (DAGs) trained on success and failure trajectories, with adaptive guidance strength based on per-step distribution distance between the two.
Load-bearing premise
The failure trajectories produced online by the pretrained VLA are informative enough to provide useful negative guidance without introducing harmful biases.
What would settle it
Running the same robotic tasks with and without AFIL and finding no consistent improvement in success rates or robustness.
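One way to make "no consistent improvement" concrete is a two-proportion z-test on episode success counts with and without AFIL. The sketch below is an assumption about how such a check could be run, not a procedure from the paper; the function name is hypothetical and the counts in the usage line are placeholders, not reported results.

```python
import math


def two_proportion_z(succ_a, n_a, succ_b, n_b):
    """Two-sided z-test for a difference between two success rates."""
    p_a, p_b = succ_a / n_a, succ_b / n_b
    pooled = (succ_a + succ_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        raise ValueError("all episodes succeeded or all failed; test undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Placeholder counts for illustration only (with AFIL vs. without).
z, p = two_proportion_z(succ_a=80, n_a=100, succ_b=60, n_b=100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A large p-value across tasks and seeds would support the "no improvement" outcome; repeating the test per task guards against one easy task dominating the aggregate.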
Original abstract
Vision-language-action (VLA) models provide a promising paradigm for scalable robotic manipulation, yet their reliance on success-only behavioral cloning leaves them brittle; lacking corrective training signals, minor execution errors rapidly compound into unrecoverable, out-of-distribution failures. To address this limitation, we propose Adaptive Failure-Informed Learning (AFIL), an end-to-end framework that leverages failure trajectories as adaptive negative guidance for diffusion- and flow-based VLA policies. AFIL uses a pretrained VLA to generate failure rollouts online, avoiding the need for handcrafted failure-mode design or human-in-the-loop recovery. It then jointly trains Dual Action Generators (DAGs) for successful and failed behaviors while sharing a common vision-language backbone, enabling efficient failure-aware policy learning with limited parameter overhead. During sampling, the failure generator adaptively steers action generation away from failure-prone regions and toward more reliable success modes, with guidance strength determined by the per-diffusion-step distance between success and failure distributions. Experiments across in-domain and out-of-domain robotic manipulation tasks, covering both short- and long-horizon settings, show that AFIL consistently improves task success rates and robustness over existing VLA baselines, demonstrating its effectiveness, efficiency, and generality.
Editorial analysis
A structured set of objections, weighed in public.
Axiom & Free-Parameter Ledger
free parameters (1)
- guidance strength schedule
axioms (1)
- domain assumption: Failure trajectories generated by the current policy are representative of the failure modes that matter at deployment.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation, washburn_uniqueness_aczel (unclear): AFIL uses a pretrained VLA to generate failure rollouts online... jointly trains Dual Action Generators (DAGs) for successful and failed behaviors... guidance strength determined by the per-diffusion-step distance between success and failure distributions
- IndisputableMonolith/Foundation/AbsoluteFloorClosure, absolute_floor_iff_bare_distinguishability (unclear): Adaptive failure-informed sampling... ϵ∗_FI = ϵ_succ − λ̂_η (ϵ_fail − ϵ_succ) with λ̂_η ∝ cosine distance