Recover, Discover, Plan: Learning Skills and Concepts from Robot Failures

Alexander G. Gray; Bowen Li; Jonathan Francis; Mayank Mishra; Nishanth Kumar; Ruwan Wickramarachchi; Sebastian Scherer; Stone Tao; Tom Silver; Y. Isabel Liu

arxiv: 2606.18328 · v1 · pith:BBTU4FWPnew · submitted 2026-06-16 · 💻 cs.RO

Pith reviewed 2026-06-27 00:11 UTC · model grok-4.3

classification 💻 cs.RO

keywords robot learning · failure recovery · relational predicates · abstract planning · reinforcement learning · sim-to-real transfer · state abstraction

The pith

ReSYNC turns robot failure recoveries into relational predicates that support abstract planning for unseen long-horizon tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method that lets a robot first use reinforcement learning to recover from failures encountered in training tasks. It then analyzes those recoveries to discover and refine relational predicates that abstract the state space. These predicates are added to a planning model so the robot can avoid similar failures in new, longer problems without retraining separate policies for each failure mode. The approach alternates between skill learning and concept learning in an incremental loop. This process converts specific recovery behaviors into general failure-avoidance strategies that transfer from simulation to real robots.

Core claim

ReSYNC performs an incremental dual-learning process in which reinforcement learning acquires recovery skills from observed failures and a subsequent concept-learning stage discovers new relational predicates that explain and generalize those recoveries. The discovered predicates are incorporated into an abstract planning model that the robot uses at test time to solve previously unseen long-horizon tasks. Across four simulated domains the method is reported to outperform strong baselines by over 50 percent; the abstract does not name the metric behind that margin. The same learned abstractions also support sim-to-real transfer on non-prehensile manipulation tasks.

What carries the argument

ReSYNC's incremental dual-learning loop that alternates RL-based recovery skill acquisition with predicate discovery and refinement to expand an abstraction library for abstract planning.
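The control flow of such a loop can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: `AbstractionLibrary`, `learn_recovery_skill`, `discover_predicate`, the `Explains(...)` naming, and the failure labels are all hypothetical, and both learning phases are replaced by stubs that return strings.

```python
from dataclasses import dataclass, field


@dataclass
class AbstractionLibrary:
    """Hypothetical container for what the robot has learned so far."""
    skills: dict = field(default_factory=dict)
    predicates: set = field(default_factory=set)


def learn_recovery_skill(failure):
    # Stub for the skill-learning phase; a real system would train an RL policy.
    return f"recover_from_{failure}"


def discover_predicate(failure):
    # Stub for the concept-learning phase; a real system would learn a classifier.
    return f"Explains({failure})"


def dual_learning_loop(curriculum):
    """Alternate skill learning and concept learning over curriculum stages.

    Each stage is a list of failure labels observed while attempting its tasks.
    """
    library = AbstractionLibrary()
    for stage_failures in curriculum:
        # Deduplicate within the stage and skip failures already covered.
        new_failures = [
            f for f in dict.fromkeys(stage_failures) if f not in library.skills
        ]
        # Phase 1: learn a recovery skill for each new failure.
        for failure in new_failures:
            library.skills[failure] = learn_recovery_skill(failure)
        # Phase 2: add a predicate that explains each new recovery.
        for failure in new_failures:
            library.predicates.add(discover_predicate(failure))
    return library


curriculum = [["drawer_blocked"], ["drawer_blocked", "lid_closed"]]
library = dual_learning_loop(curriculum)
print(sorted(library.predicates))
```

The library persists across stages, so the second stage only pays for the failure it has not seen before. That is the sense in which the expansion is incremental.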

If this is right

The robot solves long-horizon problems that were never seen in training by using the expanded predicate library for abstract planning.
No separate recovery policy needs to be trained for each distinct failure mode.
The same learned predicates enable generalization to unseen scenarios in both simulation and real-robot non-prehensile manipulation.
Continual expansion of the abstraction library improves performance over time without restarting from scratch.
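A toy example shows how one added predicate can turn a local recovery into failure avoidance at planning time. The sketch below assumes a STRIPS-style model searched breadth-first, which is a stand-in chosen for brevity rather than the planner used in the paper. The atoms `Clear(handle)` and `Open(drawer)` and both operator names are hypothetical.

```python
from collections import deque
from typing import NamedTuple


class Operator(NamedTuple):
    name: str
    preconditions: frozenset
    add_effects: frozenset
    delete_effects: frozenset


def plan(initial, goal, operators):
    """Breadth-first search over abstract states; returns operator names or None."""
    frontier = deque([(initial, [])])
    visited = {initial}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for op in operators:
            if op.preconditions <= state:
                successor = (state - op.delete_effects) | op.add_effects
                if successor not in visited:
                    visited.add(successor)
                    frontier.append((successor, steps + [op.name]))
    return None


empty = frozenset()
goal = frozenset({"Open(drawer)"})

# Before concept learning: the model has no notion of a blocked handle.
naive_ops = [Operator("OpenDrawer", empty, goal, empty)]

# After concept learning: the new predicate gates OpenDrawer, and the learned
# recovery skill becomes an operator that makes the predicate true.
clear = frozenset({"Clear(handle)"})
refined_ops = [
    Operator("OpenDrawer", clear, goal, empty),
    Operator("PushObstacleAway", empty, clear, empty),
]

naive_plan = plan(empty, goal, naive_ops)
refined_plan = plan(empty, goal, refined_ops)
print(naive_plan)    # ['OpenDrawer']
print(refined_plan)  # ['PushObstacleAway', 'OpenDrawer']
```

The naive model proposes opening the drawer directly, which would fail on a blocked handle and trigger recovery at run time. The refined model schedules the clearing action first, so the failure never occurs.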

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Manual engineering of state abstractions for planning could be reduced if the discovery process scales to richer sensor data.
The same loop might be applied to domains where failures are expensive, by first learning recoveries in simulation and then transferring the resulting predicates.
Incorrect predicates could propagate planning errors unless an explicit verification step is added later.
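One possible form for the verification step mentioned in the last point is a held-out accuracy gate. The sketch below is an assumption, not something the paper describes: the `blocks_handle` rule, the 5 cm cutoff, the 0.9 threshold, and the labeled states are all invented for illustration.

```python
def predicate_accuracy(classifier, labeled_states):
    """Fraction of held-out states where the classifier matches the label."""
    correct = sum(classifier(state) == label for state, label in labeled_states)
    return correct / len(labeled_states)


def admit_predicate(classifier, labeled_states, threshold=0.9):
    """Admit a candidate predicate only if held-out accuracy meets the threshold."""
    return predicate_accuracy(classifier, labeled_states) >= threshold


def blocks_handle(state):
    # Hypothetical candidate: an object blocks the handle if it lies within 5 cm.
    return abs(state["object_x"] - state["handle_x"]) < 0.05


# Hypothetical held-out states with ground-truth labels.
held_out = [
    ({"object_x": 0.02, "handle_x": 0.0}, True),
    ({"object_x": 0.30, "handle_x": 0.0}, False),
    ({"object_x": 0.04, "handle_x": 0.0}, True),
    ({"object_x": 0.06, "handle_x": 0.0}, True),  # the candidate gets this one wrong
]

print(predicate_accuracy(blocks_handle, held_out))  # 0.75
print(admit_predicate(blocks_handle, held_out))     # False
```

A gate like this only guards against errors on states similar to the held-out set. It says nothing about predicates that were never proposed.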

Load-bearing premise

The discovered relational predicates will be sufficiently accurate and complete to support reliable planning on tasks outside the specific failures seen during training.

What would settle it

A new long-horizon test task whose solution requires a predicate that was never discovered or refined during the training failures, causing the abstract planner to produce an incorrect or incomplete plan.

Figures

Figures reproduced from arXiv: 2606.18328 by Alexander G. Gray, Bowen Li, Jonathan Francis, Mayank Mishra, Nishanth Kumar, Ruwan Wickramarachchi, Sebastian Scherer, Stone Tao, Tom Silver, Y. Isabel Liu.

**Figure 2.** Overview of the running example (Cluttered Drawer). We show the initial knowledge available to the robot, each stage of the curriculum, and example generalization tests. Training Curriculum: A user-specified curriculum presents related tasks in stages to elicit failures. Let T_i^train be the i-th stage, with N_i = 10 tasks in our experiments. We assume the robot could detect failure states (e.g., simulato…

**Figure 3.** The ReSYNC framework. Top: ReSYNC alternates between recovery skill learning and concept discovery in each stage. Bottom left: Skill learning uses successful replanning as the recovery reward. Bottom right: Dreaming generates trajectories for predicate and operator learning. … object before opening the drawer in a new context. Thus, beyond skills, the robot must also learn concepts that enable reasoning over…

**Figure 4.** Visualization of the three simulated domains (excluding Cluttered Drawer) and the two …

**Figure 5.** Hardware setup and ReSYNC pipeline in the real-world experiments. During training …

**Figure 6.** Learning efficiency comparison. Thanks to relational abstractions and planning, ReSYNC reduces training episodes by ∼22×. Finetuning Across Scenarios: As demonstrated in …

**Figure 7.** Example VLM-OS prompt for the Blocked Stacking domain. The prompt includes the …

**Figure 8.** Planning efficiency comparison. ReSYNC learns compact relational concepts that minimize the planning objective and yield single-plan execution, while DSG-M [5] relies solely on terminal predicates, resulting in inefficient planning. ReSYNC achieves substantially higher planning efficiency than prior work such as DSG-M [5]. As shown in Figure 8, DSG-M learns predicates that primarily characterize termi…

**Figure 9.** Representative failures in our real-robot experiments.

Original abstract

Intelligent robots should not only recover from failures, but also acquire the abstract knowledge needed to avoid them in the future. While reinforcement learning (RL) can learn reactive recovery behaviors, training a separate policy for every distinct failure mode is highly inefficient. We introduce Recovery-Driven Synthesis of Relational Concepts (ReSYNC), the first approach that progressively discovers and refines state abstractions (relational predicates) from failure-recovery experience to support abstract planning. Unlike purely reactive methods, ReSYNC jointly learns skills and concepts through an incremental dual-learning process. In the skill-learning phase, the robot uses RL to learn to recover from failures seen in training tasks. In the concept-learning phase, the robot discovers new relational predicates and refines its abstract planning model to explain and generalize the learned recovery behaviors. This interaction enables ReSYNC to convert local recoveries seen during training into global failure avoidance at test time. Across four simulated domains, we show that ReSYNC's ability to continually expand and refine its abstraction library allows it to solve long-horizon, previously unseen problems, outperforming strong baselines by over 50%. Additionally, we demonstrate sim-to-real transfer of ReSYNC, where it performs real-world non-prehensile manipulation skills and generalizes to unseen scenarios through abstract planning. Overall, ReSYNC represents a significant step toward robots that autonomously acquire abstractions for scalable, failure-aware planning in the physical world.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

ReSYNC runs a dual loop of RL recovery followed by predicate discovery from failures, but the abstract gives almost no mechanism details so the generalization claim is hard to assess.

The paper's main move is to treat failure recoveries as raw material for building relational predicates that then support abstract planning. Instead of training one recovery policy per failure, it alternates skill learning via RL with a concept-learning step that adds or refines predicates to explain the recoveries and generalize them.

What the work actually shows is an incremental process that expands an abstraction library across training tasks and then uses the resulting model to solve longer unseen problems in four simulated domains, with a reported margin of over 50% against baselines and a sim-to-real transfer on non-prehensile manipulation. That loop is the concrete novelty; prior work either stays reactive or assumes the predicates are already supplied.

The framing is straightforward and the sim-to-real result is a useful data point if the real-world trials are clean. The central empirical claim rests on the idea that local recoveries can be turned into global planning improvements without hand-crafted abstractions.

The soft spot is the missing account of how predicates are actually discovered, scored for accuracy, or corrected when they are incomplete. The abstract does not describe the discovery algorithm, any validation step, or what happens when a predicate fails to cover new failures. Without those pieces it is difficult to know whether the 50% gain comes from the predicates or from other factors in the RL setup. The weakest assumption is that the learned predicates will remain sufficient and sound outside the exact failure distribution seen in training.

This is aimed at robotics groups working on integrated learning and planning. A reader already thinking about abstraction discovery from experience would find the high-level architecture worth examining, even if the quantitative support needs the full methods section.

I would send it for peer review. The problem is real, the dual-loop structure is coherent on its face, and the sim-to-real attempt is worth checking, but the referee will need to see the predicate-learning details and ablations before the generalization story can be taken as settled.

Referee Report

0 major / 2 minor

Summary. The paper introduces ReSYNC, an incremental dual-learning framework in which RL is used to acquire recovery skills from observed failures and a concept-learning phase discovers and refines relational predicates that support abstract planning. The interaction between the two phases is claimed to convert local recoveries into global failure-avoidance policies that solve long-horizon, previously unseen tasks. Empirical results are reported across four simulated domains (outperforming strong baselines by more than 50 %) together with a sim-to-real demonstration of non-prehensile manipulation and generalization to unseen scenarios.

Significance. If the empirical claims are substantiated, the work would constitute a meaningful contribution to integrated learning and planning in robotics by showing how failure-recovery experience can be turned into reusable relational abstractions without hand-crafted predicates. The reported performance margin and the sim-to-real transfer are concrete strengths that would be of interest to the robotics community.

minor comments (2)

[Abstract] The abstract states that ReSYNC 'outperforms strong baselines by over 50 %' but does not name the baselines, the performance metric, or the statistical test used; the results section of the full manuscript should supply these details.
[Abstract] The description of the concept-learning phase does not indicate how predicate accuracy or sufficiency is verified before the predicates are added to the planning model; if this validation procedure is described later in the paper it should be cross-referenced from the abstract.
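To make the first comment concrete, the sketch below shows the kind of reporting that would resolve it: the absolute and relative margins stated separately, plus a significance test. The counts are hypothetical and do not come from the paper, and the pooled two-proportion z-test is one reasonable choice among several.

```python
import math


def two_proportion_z(successes_a, trials_a, successes_b, trials_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = successes_a / trials_a
    p_b = successes_b / trials_b
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    z = (p_a - p_b) / std_err
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: method solves 45 of 50 tasks, baseline solves 20 of 50.
method_rate = 45 / 50
baseline_rate = 20 / 50

absolute_gain = method_rate - baseline_rate                    # percentage points
relative_gain = (method_rate - baseline_rate) / baseline_rate  # fraction of baseline

z, p_value = two_proportion_z(45, 50, 20, 50)
print(f"absolute gain: {absolute_gain:.2f}, relative gain: {relative_gain:.2f}")
print(f"z = {z:.2f}, p = {p_value:.2e}")
```

With these invented counts, "over 50%" would be true under both readings, but the two margins differ by more than a factor of two, which is why the metric needs to be named.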

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their careful summary of ReSYNC and for noting the potential contribution of turning failure-recovery experience into reusable relational abstractions. The recommendation is listed as uncertain, but the report contains no enumerated major comments. We therefore provide no point-by-point responses below. If the referee has additional specific concerns, we are happy to address them in a revision.

Circularity Check

0 steps flagged

No significant circularity

The paper presents an empirical method (ReSYNC) combining RL-based skill recovery with predicate discovery for abstract planning. No equations, derivations, or fitted parameters are described in the provided text that reduce a claimed prediction or result to its own inputs by construction. The performance claims rest on experimental outcomes across domains rather than any self-referential theoretical step. No self-citation load-bearing arguments or uniqueness theorems are invoked in the abstract or description. The derivation chain is self-contained as an algorithmic process whose validity is tested externally via simulation and real-world transfer.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No technical details are supplied in the abstract, so no free parameters, axioms, or invented entities can be identified.

pith-pipeline@v0.9.1-grok · 5807 in / 1222 out tokens · 36265 ms · 2026-06-27T00:11:52.731326+00:00 · methodology

Reference graph

Works this paper leans on

59 extracted references · 23 canonical work pages · 11 internal anchors

[1] K. Javed and R. S. Sutton. The Big World Hypothesis and Its Ramifications for Artificial Intelligence. In Finding the Frame: An RLC Workshop for Examining Conceptual Frameworks, 2024.
[2] S. Vats, M. Likhachev, and O. Kroemer. Efficient Recovery Learning using Model Predictive Meta-Reasoning. In Proceedings of the International Conference on Robotics and Automation (ICRA), pages 7258–7264, 2023.
[3] S. Vats, D. K. Jha, M. Likhachev, O. Kroemer, and D. Romeres. Recoverychaining: Learning Local Recovery Policies for Robust Manipulation. arXiv preprint arXiv:2410.13979, 2024.
[4] A. Bagaria, J. K. Senthil, and G. Konidaris. Skill Discovery for Exploration and Planning using Deep Skill Graphs. In Proceedings of International Conference on Machine Learning (ICML), pages 521–531. PMLR, 2021.
[5] A. Bagaria, A. D. M. Koch, R. Rodriguez-Sanchez, S. Lobel, and G. Konidaris. Intrinsically Motivated Discovery of Temporally Abstract Graph-based Models of the World. In Proceedings of Reinforcement Learning Conference, 2025.
[6] A. Li, N. Kumar, T. Lozano-Pérez, and L. P. Kaelbling. Learning to Bridge the Gap: Efficient Novelty Recovery with Planning and Reinforcement Learning. In NeurIPS 2024 Workshop on Open-World Agents, 2024.
[7] T. Silver, R. Chitnis, N. Kumar, W. McClinton, T. Lozano-Pérez, L. Kaelbling, and J. B. Tenenbaum. Predicate Invention for Bilevel Planning. In Proceedings of The AAAI Conference on Artificial Intelligence (AAAI), volume 37, pages 12120–12129, 2023.
[8] N. Shah, J. Nagpal, P. Verma, and S. Srivastava. From Reals to Logic and Back: Inventing Symbolic Vocabularies, Actions and Models for Planning from Raw Data. arXiv preprint arXiv:2402.11871, 2024. URL https://arxiv.org/pdf/2402.11871v2
[9] B. Li, T. Silver, S. Scherer, and A. Gray. Bilevel Learning for Bilevel Planning. In Proceedings of the Robotics: Science And Systems (RSS), 2025.
[10] T. Silver, A. Athalye, J. B. Tenenbaum, T. Lozano-Pérez, and L. P. Kaelbling. Learning Neuro-Symbolic Skills for Bilevel Planning. In Proceedings of the Conference on Robot Learning (CoRL), 2022.
[11] Y. I. Liu, B. Li, B. Eysenbach, and T. Silver. SLAP: Shortcut Learning for Abstract Planning. arXiv preprint arXiv:2511.01107, 2025.
[12] Y. S. Shao, Y. Zheng, S. Sun, P. Chaudhari, V. Kumar, and N. Figueroa. Symskill: Symbol and Skill Co-Invention for Data-Efficient and Real-Time Long-Horizon Manipulation. arXiv preprint arXiv:2510.01661, 2025.
[13] D. Abel, D. Arumugam, L. Lehnert, and M. Littman. State abstractions for lifelong reinforcement learning. In International Conference on Machine Learning. PMLR, 2018.
[14] B. Eysenbach, A. Gupta, J. Ibarz, and S. Levine. Diversity is all you need: Learning skills without a reward function. In International Conference on Learning Representations, 2019.
[15] R. K. Nayyar and S. Srivastava. Autonomous Option Invention for Continual Hierarchical Reinforcement Learning and Planning. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), volume 39, pages 19642–19650, 2025.
[16] A. Shukla, S. Tao, and H. Su. Maniskill-hab: A benchmark for low-level manipulation in home rearrangement tasks. In Proceedings of the International Conference on Learning Representations (ICLR), 2025.
[17] P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, et al. Relational Inductive Biases, Deep Learning, and Graph Networks. arXiv preprint arXiv:1806.01261, 2018.
[18] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is All You Need. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), volume 30, 2017.
[19] OpenAI. GPT-5.4 Thinking System Card, Mar. 2026. URL https://openai.com/index/gpt-5-4-thinking-system-card/
[20] L. P. Kaelbling and T. Lozano-Pérez. Hierarchical Task and Motion Planning in the Now. In Proceedings of the International Conference on Robotics and Automation (ICRA), pages 1470–1477. IEEE, 2011.
[21] C. R. Garrett, T. Lozano-Pérez, and L. P. Kaelbling. Pddlstream: Integrating Symbolic Planners and Blackbox Samplers via Optimistic Adaptive Planning. In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), volume 30, pages 440–448, 2020.
[22] R. Chitnis, T. Silver, J. B. Tenenbaum, T. Lozano-Pérez, and L. P. Kaelbling. Learning Neuro-Symbolic Relational Transition Models for Bilevel Planning. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4166–4173, 2022.
[23] N. Kumar, T. Silver, W. McClinton, L. Zhao, S. Proulx, T. Lozano-Pérez, L. P. Kaelbling, and J. Barry. Practice Makes Perfect: Planning To Learn Skill Parameter Policies. In Proceedings of the Robotics: Science And Systems (RSS), 2024.
[24] G. Konidaris, L. P. Kaelbling, and T. Lozano-Pérez. From skills to symbols: Learning symbolic representations for abstract high-level planning. Journal of Artificial Intelligence Research (JAIR), 2018.
[25] Q. Wang, B. Li, Z. Luo, Y. Xu, A. Gray, T. Silver, S. Scherer, K. Sycara, and Y. Xie. Unifying Deep Predicate Invention with Pre-trained Foundation Models. arXiv preprint arXiv:2512.17992, 2025.
[26] R. S. Sutton, D. Precup, and S. Singh. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial intelligence, 1999.
[27] P.-L. Bacon, J. Harb, and D. Precup. The option-critic architecture. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), volume 31, 2017.
[28] S. Zhang and S. Whiteson. DAC: The Double Actor-Critic Architecture for Learning Options. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), volume 32, 2019.
[29] S. Moon, J. Yeom, B. Park, and H. O. Song. Discovering Hierarchical Achievements in Reinforcement Learning via Contrastive Learning. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), volume 36, pages 63674–63686, 2023.
[30] B. Eysenbach, T. Zhang, S. Levine, and R. R. Salakhutdinov. Contrastive Learning as Goal-Conditioned Reinforcement Learning. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), volume 35, pages 35603–35620, 2022.
[31] B. Eysenbach, R. Salakhutdinov, and S. Levine. Search on the Replay Buffer: Bridging Planning and Reinforcement Learning. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), volume 32, 2019.
[32] N. Savinov, A. Dosovitskiy, and V. Koltun. Semi-Parametric Topological Memory for Navigation. arXiv preprint arXiv:1803.00653, 2018.
[33] F. Yang, D. Lyu, B. Liu, and S. Gustafson. PEORL: Integrating Symbolic Planning and Hierarchical Reinforcement Learning for Robust Decision-Making. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 4860–4866, 2018.
[34] Y. Jiang, F. Yang, S. Zhang, and P. Stone. Integrating Task-Motion Planning with Reinforcement Learning for Robust Decision Making in Mobile Robots. arXiv preprint arXiv:1811.08955, 2018.
[35] V. Sarathy, D. Kasenberg, S. Goel, J. Sinapov, and M. Scheutz. Spotter: Extending Symbolic Planning Operators through Targeted Reinforcement Learning. arXiv preprint arXiv:2012.13037, 2020.
[36] S. Goel, Y. Shukla, V. Sarathy, M. Scheutz, and J. Sinapov. Rapid-learn: A Framework for Learning to Recover for Handling Novelties in Open-world Environments. In 2022 IEEE International Conference on Development and Learning (ICDL), pages 15–22. IEEE, 2022.
[37] C. Aeronautiques, A. Howe, C. Knoblock, I. D. McDermott, A. Ram, M. Veloso, D. Weld, D. W. Sri, A. Barrett, D. Christianson, et al. Pddl - the planning domain definition language. Technical Report, Tech. Rep., 1998.
[38] M. Helmert. The Fast Downward Planning System. Journal of Artificial Intelligence Research, 26:191–246, 2006.
[39] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347, 2017.
[40] Y. Huang, B. Li, V. Saxena, Y. Liang, U. A. Mishra, L. Ji, L. Zha, J. Wu, N. Kumar, S. Scherer, et al. KinDER: A Physical Reasoning Benchmark for Robot Learning and Planning. arXiv preprint arXiv:2604.25788, 2026.
[41] P. Yin, T. Westenbroek, Z. Zhang, J. Tran, I. Dagnino, E. Shilamkar, N. Mbiziwo-Tiapo, S. Bagaria, X. Liu, G. Mullins, A. Kolobov, and A. Gupta. Emergent Dexterity via Diverse Resets and Large-Scale Reinforcement Learning. In The Fourteenth International Conference on Learning Representations, 2026.
[42] F. Bjelonic, F. Tischhauser, and M. Hutter. Towards Bridging the Gap: Systematic Sim-to-Real Transfer for Diverse Legged Robots. arXiv preprint arXiv:2509.06342, 2025.
[43] I. Akkaya, M. Andrychowicz, M. Chociej, M. Litwin, B. McGrew, A. Petron, A. Paino, M. Plappert, G. Powell, R. Ribas, et al. Solving Rubik's Cube with a Robot Hand. arXiv preprint arXiv:1910.07113, 2019.
[44] C. Chi, S. Feng, Y. Du, Z. Xu, E. Cousineau, B. Burchfiel, and S. Song. Diffusion Policy: Visuomotor Policy Learning via Action Diffusion. In Robotics: Science and Systems (RSS), 2023.
[45] N. Carion, L. Gustafson, Y.-T. Hu, S. Debnath, R. Hu, D. Suris, C. Ryali, K. V. Alwala, H. Khedr, A. Huang, et al. SAM 3: Segment Anything with Concepts. arXiv preprint arXiv:2511.16719, 2025.
[46] P. Liu, L. P. Kaelbling, J. B. Tenenbaum, and J. Mao. Lifelong Experience Abstraction and Planning. In ICML 2025 Workshop on Programmatic Representations for Agent Learning, 2025.
[47] M. Fu, J. Yu, K. El-Refai, E. Kou, H. Xue, H. Huang, W. Xiao, G. Wang, F.-F. Li, G. Shi, et al. CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation. arXiv preprint arXiv:2603.22435, 2026.
[48] R. Zabounidis, Y. Wu, S. Stepputtis, W. Kim, Y. Li, T. Mitchell, and K. Sycara. SCALAR: Learning and Composing Skills through LLM Guided Symbolic Planning and Deep RL Grounding. arXiv preprint arXiv:2603.09036, 2026.
[49] J. Luo, T. Ding, K. H. R. Chan, H. Min, C. Callison-Burch, and R. Vidal. Concept Lancet: Image Editing with Compositional Representation Transplant. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 28502–28512, 2025.
[50] J. Luo, J. Yang, T. Neiman, L. Fan, B. Yin, S. Tran, M. Shah, and R. Vidal. Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs. arXiv preprint arXiv:2604.08846, 2026.
[51] A. Athalye, N. Kumar, T. Silver, Y. Liang, T. Lozano-Pérez, and L. P. Kaelbling. Predicate Invention from Pixels via Pretrained Vision-Language Models. arXiv preprint arXiv:2501.00296, 2024.
[52] N. Kumar, W. Shen, F. Ramos, D. Fox, T. Lozano-Pérez, L. P. Kaelbling, and C. R. Garrett. Open-World Task and Motion Planning via Vision-Language Model Generated Constraints. IEEE Robotics and Automation Letters (RA-L), 11:3366–3373, 2026. URL https://arxiv.org/abs/2411.08253
[53] J. Duan, W. Pumacay, N. Kumar, Y. R. Wang, S. Tian, W. Yuan, R. Krishna, D. Fox, A. Mandlekar, and Y. Guo. AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation. In Proceedings of the International Conference on Learning Representations (ICLR), 2025. URL https://arxiv.org/abs/2410.00371
[54] S. Wang, M. Han, Z. Jiao, Z. Zhang, Y. N. Wu, S.-C. Zhu, and H. Liu. LLM^3: Large language model-based task and motion planning with motion failure reasoning. arXiv preprint arXiv:2403.11552, 2024. URL https://arxiv.org/pdf/2403.11552
[55] C. Xu, T. K. Nguyen, E. Dixon, C. Rodriguez, P. Miller, R. Lee, P. Shah, R. Ambrus, H. Nishimura, and M. Itkina. Can we detect failures without failure data? Uncertainty-aware runtime failure detection for imitation learning policies. In Proceedings of the Robotics: Science And Systems (RSS), 2025.
[56] Z. Yang, B. Hedegaard, A. Jaafar, Y. Wei, S. Thompson, S. S. Raman, H. Fu, S. Tellex, G. Konidaris, D. Paulius, et al. SkillWrapper: Generative Predicate Invention for Skill Abstraction. arXiv preprint arXiv:2511.18203, 2025.
[57] Z. Yang, J. Mao, Y. Du, J. Wu, J. B. Tenenbaum, T. Lozano-Pérez, and L. P. Kaelbling. Compositional Diffusion-Based Continuous Constraint Solvers. In Conference on Robot Learning (CoRL), 2023. URL https://arxiv.org/pdf/2309.00966.pdf
Table 4: Important notations used in this work. Symbol and description: x ∈ X, continuous world state; state space. x_o, Feature ...
[58] Cornered Insertion-S. A UR7e arm with a Robotiq 2F-140 gripper must insert one block onto another to form a tower. One block is initially placed near a box corner, causing the nominal insertion behavior to fail. Train and test tasks differ in goal specification. • State and actions: States are SE(3) object poses, estimated from wrist-camera scans using SAM3...
[59] Cornered Insertion-L. This domain uses the same hardware setup but adds a lid and a drawer, requiring the robot to place the completed tower into the drawer. Although it shares the recovery skill with Cornered Insertion-S, its task structure leads to different discovered concepts and planning operators. • State and actions: Same as Cornered Insertion-S. • Obj...

[1] [1]

Javed and R

K. Javed and R. S. Sutton. The Big World Hypothesis and Its Ramifications for Artificial Intelligence. InFinding the Frame: An RLC Workshop for Examining Conceptual Frameworks, 2024

2024

[2] [2]

S. Vats, M. Likhachev, and O. Kroemer. Efficient Recovery Learning using Model Predictive Meta-Reasoning. InProceedings of the International Conference on Robotics and Automation (ICRA), pages 7258–7264, 2023

2023

[3] [3]

S. Vats, D. K. Jha, M. Likhachev, O. Kroemer, and D. Romeres. Recoverychaining: Learning Local Recovery Policies for Robust Manipulation.arXiv preprint arXiv:2410.13979, 2024

work page arXiv 2024

[4] [4]

Bagaria, J

A. Bagaria, J. K. Senthil, and G. Konidaris. Skill Discovery for Exploration and Planning using Deep Skill Graphs. InProceedings of International Conference on Machine Learning (ICML), pages 521–531. PMLR, 2021

2021

[5] [5]

Bagaria, A

A. Bagaria, A. D. M. Koch, R. Rodriguez-Sanchez, S. Lobel, and G. Konidaris. Intrinsically Motivated Discovery of Temporally Abstract Graph-based Models of the World. InProceedings of Reinforcement Learning Conference, 2025

2025

[6] [6]

A. Li, N. Kumar, T. Lozano-P´erez, and L. P. Kaelbling. Learning to Bridge the Gap: Efficient Novelty Recovery with Planning and Reinforcement Learning. InNeurIPS 2024 Workshop on Open-World Agents, 2024

2024

[7] [7]

Silver, R

T. Silver, R. Chitnis, N. Kumar, W. McClinton, T. Lozano-P ´erez, L. Kaelbling, and J. B. Tenenbaum. Predicate Invention for Bilevel Planning. InProceedings of The AAAI Conference on Artificial Intelligence (AAAI), volume 37, pages 12120–12129, 2023

2023

[8] [8]

N. Shah, J. Nagpal, P. Verma, and S. Srivastava. From Reals to Logic and Back: Inventing Symbolic V ocabularies, Actions and Models for Planning from Raw Data.arXiv preprint arXiv:2402.11871, 2024. URLhttps://arxiv.org/pdf/2402.11871v2

work page arXiv 2024

[9] [9]

B. Li, T. Silver, S. Scherer, and A. Gray. Bilevel Learning for Bilevel Planning. InProceedings of the Robotics: Science And Systems (RSS), 2025

2025

[10] [10]

Silver, A

T. Silver, A. Athalye, J. B. Tenenbaum, T. Lozano-P´erez, and L. P. Kaelbling. Learning Neuro- Symbolic Skills for Bilevel Planning. InProceedings of the Conference on Robot Learning (CoRL), 2022

2022

[11] [11]

Y . I. Liu, B. Li, B. Eysenbach, and T. Silver. SLAP: Shortcut Learning for Abstract Planning. arXiv preprint arXiv:2511.01107, 2025

work page arXiv 2025

[12] [12]

Y . S. Shao, Y . Zheng, S. Sun, P. Chaudhari, V . Kumar, and N. Figueroa. Symskill: Symbol and Skill Co-Invention for Data-Efficient and Real-Time Long-Horizon Manipulation.arXiv preprint arXiv:2510.01661, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[13] [13]

D. Abel, D. Arumugam, L. Lehnert, and M. Littman. State abstractions for lifelong reinforce- ment learning. InInternational Conference on Machine Learning. PMLR, 2018

[14] B. Eysenbach, A. Gupta, J. Ibarz, and S. Levine. Diversity is all you need: Learning skills without a reward function. In International Conference on Learning Representations, 2019.

[15] R. K. Nayyar and S. Srivastava. Autonomous Option Invention for Continual Hierarchical Reinforcement Learning and Planning. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), volume 39, pages 19642–19650, 2025.

[16] A. Shukla, S. Tao, and H. Su. Maniskill-hab: A benchmark for low-level manipulation in home rearrangement tasks. In Proceedings of the International Conference on Learning Representations (ICLR), 2025.

[17] P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, et al. Relational Inductive Biases, Deep Learning, and Graph Networks. arXiv preprint arXiv:1806.01261, 2018.

[18] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is All You Need. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), volume 30, 2017.

[19] OpenAI. GPT-5.4 Thinking System Card, Mar. 2026. URL https://openai.com/index/gpt-5-4-thinking-system-card/

[20] L. P. Kaelbling and T. Lozano-Pérez. Hierarchical Task and Motion Planning in the Now. In Proceedings of the International Conference on Robotics and Automation (ICRA), pages 1470–1477. IEEE, 2011.

[21] C. R. Garrett, T. Lozano-Pérez, and L. P. Kaelbling. Pddlstream: Integrating Symbolic Planners and Blackbox Samplers via Optimistic Adaptive Planning. In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), volume 30, pages 440–448, 2020.

[22] R. Chitnis, T. Silver, J. B. Tenenbaum, T. Lozano-Pérez, and L. P. Kaelbling. Learning Neuro-Symbolic Relational Transition Models for Bilevel Planning. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4166–4173, 2022.

[23] N. Kumar, T. Silver, W. McClinton, L. Zhao, S. Proulx, T. Lozano-Pérez, L. P. Kaelbling, and J. Barry. Practice Makes Perfect: Planning To Learn Skill Parameter Policies. In Proceedings of the Robotics: Science And Systems (RSS), 2024.

[24] G. Konidaris, L. P. Kaelbling, and T. Lozano-Pérez. From skills to symbols: Learning symbolic representations for abstract high-level planning. Journal of Artificial Intelligence Research (JAIR), 2018.

[25] Q. Wang, B. Li, Z. Luo, Y. Xu, A. Gray, T. Silver, S. Scherer, K. Sycara, and Y. Xie. Unifying Deep Predicate Invention with Pre-trained Foundation Models. arXiv preprint arXiv:2512.17992, 2025.

[26] R. S. Sutton, D. Precup, and S. Singh. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence, 1999.

[27] P.-L. Bacon, J. Harb, and D. Precup. The option-critic architecture. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), volume 31, 2017.

[28] S. Zhang and S. Whiteson. DAC: The Double Actor-Critic Architecture for Learning Options. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), volume 32, 2019.

[29] S. Moon, J. Yeom, B. Park, and H. O. Song. Discovering Hierarchical Achievements in Reinforcement Learning via Contrastive Learning. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), volume 36, pages 63674–63686, 2023.

[30] B. Eysenbach, T. Zhang, S. Levine, and R. R. Salakhutdinov. Contrastive Learning as Goal-Conditioned Reinforcement Learning. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), volume 35, pages 35603–35620, 2022.

[31] B. Eysenbach, R. Salakhutdinov, and S. Levine. Search on the Replay Buffer: Bridging Planning and Reinforcement Learning. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), volume 32, 2019.

[32] N. Savinov, A. Dosovitskiy, and V. Koltun. Semi-Parametric Topological Memory for Navigation. arXiv preprint arXiv:1803.00653, 2018.

[33] F. Yang, D. Lyu, B. Liu, and S. Gustafson. PEORL: Integrating Symbolic Planning and Hierarchical Reinforcement Learning for Robust Decision-Making. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 4860–4866, 2018.

[34] Y. Jiang, F. Yang, S. Zhang, and P. Stone. Integrating Task-Motion Planning with Reinforcement Learning for Robust Decision Making in Mobile Robots. arXiv preprint arXiv:1811.08955, 2018.

[35] V. Sarathy, D. Kasenberg, S. Goel, J. Sinapov, and M. Scheutz. Spotter: Extending Symbolic Planning Operators through Targeted Reinforcement Learning. arXiv preprint arXiv:2012.13037, 2020.

[36] S. Goel, Y. Shukla, V. Sarathy, M. Scheutz, and J. Sinapov. Rapid-learn: A Framework for Learning to Recover for Handling Novelties in Open-world Environments. In 2022 IEEE International Conference on Development and Learning (ICDL), pages 15–22. IEEE, 2022.

[37] C. Aeronautiques, A. Howe, C. Knoblock, I. D. McDermott, A. Ram, M. Veloso, D. Weld, D. W. Sri, A. Barrett, D. Christianson, et al. Pddl - the planning domain definition language. Technical Report, Tech. Rep., 1998.

[38] M. Helmert. The Fast Downward Planning System. Journal of Artificial Intelligence Research, 26:191–246, 2006.

[39] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347, 2017.

[40] Y. Huang, B. Li, V. Saxena, Y. Liang, U. A. Mishra, L. Ji, L. Zha, J. Wu, N. Kumar, S. Scherer, et al. KinDER: A Physical Reasoning Benchmark for Robot Learning and Planning. arXiv preprint arXiv:2604.25788, 2026.

[41] P. Yin, T. Westenbroek, Z. Zhang, J. Tran, I. Dagnino, E. Shilamkar, N. Mbiziwo-Tiapo, S. Bagaria, X. Liu, G. Mullins, A. Kolobov, and A. Gupta. Emergent Dexterity via Diverse Resets and Large-Scale Reinforcement Learning. In The Fourteenth International Conference on Learning Representations, 2026.

[42] F. Bjelonic, F. Tischhauser, and M. Hutter. Towards Bridging the Gap: Systematic Sim-to-Real Transfer for Diverse Legged Robots. arXiv preprint arXiv:2509.06342, 2025.

[43] I. Akkaya, M. Andrychowicz, M. Chociej, M. Litwin, B. McGrew, A. Petron, A. Paino, M. Plappert, G. Powell, R. Ribas, et al. Solving Rubik’s Cube with a Robot Hand. arXiv preprint arXiv:1910.07113, 2019.

[44] C. Chi, S. Feng, Y. Du, Z. Xu, E. Cousineau, B. Burchfiel, and S. Song. Diffusion Policy: Visuomotor Policy Learning via Action Diffusion. In Robotics: Science and Systems (RSS), 2023.

[45] N. Carion, L. Gustafson, Y.-T. Hu, S. Debnath, R. Hu, D. Suris, C. Ryali, K. V. Alwala, H. Khedr, A. Huang, et al. SAM 3: Segment Anything with Concepts. arXiv preprint arXiv:2511.16719, 2025.

[46] P. Liu, L. P. Kaelbling, J. B. Tenenbaum, and J. Mao. Lifelong Experience Abstraction and Planning. In ICML 2025 Workshop on Programmatic Representations for Agent Learning, 2025.

[47] M. Fu, J. Yu, K. El-Refai, E. Kou, H. Xue, H. Huang, W. Xiao, G. Wang, F.-F. Li, G. Shi, et al. CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation. arXiv preprint arXiv:2603.22435, 2026.

[48] R. Zabounidis, Y. Wu, S. Stepputtis, W. Kim, Y. Li, T. Mitchell, and K. Sycara. SCALAR: Learning and Composing Skills through LLM Guided Symbolic Planning and Deep RL Grounding. arXiv preprint arXiv:2603.09036, 2026.

[49] J. Luo, T. Ding, K. H. R. Chan, H. Min, C. Callison-Burch, and R. Vidal. Concept Lancet: Image Editing with Compositional Representation Transplant. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 28502–28512, 2025.

[50] J. Luo, J. Yang, T. Neiman, L. Fan, B. Yin, S. Tran, M. Shah, and R. Vidal. Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs. arXiv preprint arXiv:2604.08846, 2026.

[51] A. Athalye, N. Kumar, T. Silver, Y. Liang, T. Lozano-Pérez, and L. P. Kaelbling. Predicate Invention from Pixels via Pretrained Vision-Language Models. arXiv preprint arXiv:2501.00296, 2024.

[52] N. Kumar, W. Shen, F. Ramos, D. Fox, T. Lozano-Pérez, L. P. Kaelbling, and C. R. Garrett. Open-World Task and Motion Planning via Vision-Language Model Generated Constraints. IEEE Robotics and Automation Letters (RA-L), 11:3366–3373, 2026. URL https://arxiv.org/abs/2411.08253

[53] J. Duan, W. Pumacay, N. Kumar, Y. R. Wang, S. Tian, W. Yuan, R. Krishna, D. Fox, A. Mandlekar, and Y. Guo. AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation. In Proceedings of the International Conference on Learning Representations (ICLR), 2025. URL https://arxiv.org/abs/2410.00371

[54] S. Wang, M. Han, Z. Jiao, Z. Zhang, Y. N. Wu, S.-C. Zhu, and H. Liu. LLM3: Large language model-based task and motion planning with motion failure reasoning. arXiv preprint arXiv:2403.11552, 2024. URL https://arxiv.org/pdf/2403.11552

[55] C. Xu, T. K. Nguyen, E. Dixon, C. Rodriguez, P. Miller, R. Lee, P. Shah, R. Ambrus, H. Nishimura, and M. Itkina. Can we detect failures without failure data? Uncertainty-aware runtime failure detection for imitation learning policies. In Proceedings of the Robotics: Science And Systems (RSS), 2025.

[56] Z. Yang, B. Hedegaard, A. Jaafar, Y. Wei, S. Thompson, S. S. Raman, H. Fu, S. Tellex, G. Konidaris, D. Paulius, et al. SkillWrapper: Generative Predicate Invention for Skill Abstraction. arXiv preprint arXiv:2511.18203, 2025.

[57] Z. Yang, J. Mao, Y. Du, J. Wu, J. B. Tenenbaum, T. Lozano-Pérez, and L. P. Kaelbling. Compositional Diffusion-Based Continuous Constraint Solvers. In Conference on Robot Learning (CoRL), 2023. URL https://arxiv.org/pdf/2309.00966.pdf

Table 4: Important notations used in this work.
Symbol    Description
x ∈ X    Continuous world state; state space
xo    Feature ...


Cornered Insertion-S. A UR7e arm with a Robotiq 2F-140 gripper must insert one block onto another to form a tower. One block is initially placed near a box corner, causing the nominal insertion behavior to fail. Train and test tasks differ in goal specification.
• State and actions: States are SE(3) object poses, estimated from wrist-camera scans using SAM3...


Cornered Insertion-L. This domain uses the same hardware setup but adds a lid and a drawer, requiring the robot to place the completed tower into the drawer. Although it shares the recovery skill with Cornered Insertion-S, its task structure leads to different discovered concepts and planning operators.
• State and actions: Same as Cornered Insertion-S.
• Obj...