SkillWrapper: Generative Predicate Invention for Task-level Robot Planning

Ahmed Jaafar; Benned Hedegaard; David Paulius; George Konidaris; Haotian Fu; Naman Shah; Shreyas S. Raman; Skye Thompson; Stefanie Tellex; Yichen Wei; Ziyi Yang

arxiv: 2511.18203 · v7 · pith:PI4W7UVInew · submitted 2025-11-22 · 💻 cs.RO

Ziyi Yang, Benned Hedegaard, Ahmed Jaafar, Yichen Wei, Skye Thompson, Shreyas S. Raman, Haotian Fu, Stefanie Tellex, George Konidaris, David Paulius, Naman Shah

Pith reviewed 2026-05-17 05:39 UTC · model grok-4.3

classification 💻 cs.RO

keywords generative predicate invention · skill abstraction · symbolic operators · robot task planning · foundation models · RGB observations · black-box skills · long-horizon tasks

The pith

A formal theory of generative predicate invention produces symbolic operators for provably sound and complete robot task planning from RGB images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a formal theory of generative predicate invention that turns foundation-model outputs into symbolic operators supporting sound and complete planning over black-box skills. This matters because it lets agents reason at a high level while executing low-level actions without needing access to internal skill states or hand-designed abstractions. SkillWrapper puts the theory into practice by directing foundation models to collect interaction data and learn human-interpretable representations solely from RGB observations. If the approach holds, robots can solve previously unseen long-horizon tasks by composing learned operators into plans that remain valid when executed in the real world.

Core claim

The authors present a formal theory of generative predicate invention for skill abstraction, resulting in symbolic operators that can be used for provably sound and complete planning. SkillWrapper implements the theory by using foundation models to actively collect robot data and learn human-interpretable, plannable representations of black-box skills from RGB image observations alone, with empirical validation in simulation and on physical robots for long-horizon tasks.

What carries the argument

The formal theory of generative predicate invention, which defines the conditions under which generated predicates yield symbolic operators that preserve soundness and completeness for domain-independent planning.

If this is right

The resulting symbolic operators integrate directly with standard domain-independent planners for high-level task reasoning.
Representations learned in simulation or from collected data enable solving long-horizon tasks that were not encountered during training.
Planning proceeds using only RGB images even when the underlying skills remain black boxes with no exposed state.
The same learned abstractions support both simulated training and direct real-robot deployment without additional engineering.
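
The first point above can be illustrated with a minimal Python sketch of STRIPS-style operators (preconditions plus add and delete effects) handed to a generic search procedure. Everything named here is an assumption for illustration: the `Operator` class, the `plan` function, and the atoms `OnTable(bowl)`, `PlateTopEmpty(plate)`, and `On(bowl, plate)` are hypothetical, and breadth-first search merely stands in for an off-the-shelf domain-independent planner. It is not the paper's learned model or planner.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """Hypothetical STRIPS-style operator over ground atoms (strings)."""
    name: str
    pre: frozenset
    add: frozenset
    delete: frozenset

    def applicable(self, state):
        # An operator applies when all of its preconditions hold.
        return self.pre <= state

    def apply(self, state):
        # Remove the delete effects, then add the add effects.
        return (state - self.delete) | self.add


def plan(init, goal, operators):
    """Breadth-first search over abstract states.

    Returns a list of operator names, or None if no plan exists.
    """
    start, goal = frozenset(init), frozenset(goal)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for op in operators:
            if op.applicable(state):
                nxt = op.apply(state)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [op.name]))
    return None


# Illustrative operators only; the atoms are assumptions, not learned output.
OPS = [
    Operator("Pick(bowl)",
             frozenset({"GripperEmpty()", "OnTable(bowl)"}),
             frozenset({"Holding(bowl)"}),
             frozenset({"GripperEmpty()", "OnTable(bowl)"})),
    Operator("Stack(bowl, plate)",
             frozenset({"Holding(bowl)", "PlateTopEmpty(plate)"}),
             frozenset({"On(bowl, plate)", "GripperEmpty()"}),
             frozenset({"Holding(bowl)", "PlateTopEmpty(plate)"})),
]

result = plan({"GripperEmpty()", "OnTable(bowl)", "PlateTopEmpty(plate)"},
              {"On(bowl, plate)"}, OPS)
print(result)  # ['Pick(bowl)', 'Stack(bowl, plate)']
```

Because the operators are plain sets of atoms, the same data could in principle be serialized to a planner input format such as PDDL; the search routine here is only a placeholder for that step.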

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the formal properties transfer reliably, the method could reduce reliance on manually engineered predicates across many robot domains.
Active data collection guided by the theory might be adapted to handle partial observability or sensor noise in more complex settings.
The predicate invention process could be tested for compatibility with other high-level planners or combined with learned low-level controllers.

Load-bearing premise

The predicates generated by the foundation model must satisfy the formal completeness and soundness conditions required by the theory, and these properties must transfer when the black-box skills run on real robots from image inputs.

What would settle it

A concrete counterexample in which a plan produced by the learned operators cannot reach the goal despite each individual skill executing correctly on the robot would falsify the claim that the operators are sound and complete.
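
One way to look for such a counterexample is to replay an executed plan against the abstract model and report the first step where the observed abstract state departs from the operator's prediction. The sketch below is a hedged illustration, not the paper's evaluation procedure: `first_violation` is a hypothetical helper, operators are given as `(pre, add, delete)` triples of ground atoms, and it assumes every atom is observed after each step (for instance, by a classifier applied to the RGB image).

```python
def first_violation(init, steps, observed):
    """Return the index of the first step that contradicts the model, or None.

    init:     set of ground atoms true before execution.
    steps:    list of (pre, add, delete) triples, each a set of atoms.
    observed: list of abstract states (sets) recorded after each step.
    Assumes full observability of all atoms (an illustrative simplification).
    """
    state = set(init)
    for i, ((pre, add, delete), seen) in enumerate(zip(steps, observed)):
        if not pre <= state:
            return i  # the model itself says this step is not applicable
        predicted = (state - delete) | add
        if predicted != set(seen):
            return i  # the skill ran, but the outcome was not predicted
        state = predicted
    return None


# Hypothetical operators and observations for illustration only.
pick = ({"GripperEmpty()"}, {"Holding(bowl)"}, {"GripperEmpty()"})
stack = ({"Holding(bowl)"}, {"On(bowl, plate)", "GripperEmpty()"},
         {"Holding(bowl)"})

ok = first_violation({"GripperEmpty()"}, [pick, stack],
                     [{"Holding(bowl)"}, {"On(bowl, plate)", "GripperEmpty()"}])
bad = first_violation({"GripperEmpty()"}, [pick, stack],
                      [{"Holding(bowl)"}, {"Holding(bowl)"}])
print(ok, bad)  # None 1
```

A non-`None` result on a run where every skill executed nominally would be the kind of evidence this section describes; a `None` result on a handful of runs does not by itself establish soundness.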

Figures

Figures reproduced from arXiv: 2511.18203 by Ahmed Jaafar, Benned Hedegaard, David Paulius, George Konidaris, Haotian Fu, Naman Shah, Shreyas S. Raman, Skye Thompson, Stefanie Tellex, Yichen Wei, Ziyi Yang.

**Figure 1.** Overview of SkillWrapper. For an agent equipped with black-box skills, SkillWrapper learns skill representations that are compatible with off-the-shelf planners. These representations are comprised of predicates invented by the foundation model. Given a novel planning problem described using the initial state and goal state as RGB images, a foundation model produces the corresponding abstract states by a…

**Figure 2.** Example of Predicate Invention. The initial states of two transitions are both said to satisfy the preconditions of certain operators learned from the same skill, while transition 1 is successful, but transition 2 is not. In this case, the first condition (precondition) is triggered, and the foundation model is prompted with both transitions to invent a new predicate. Empirical predicate selection. Althoug…

**Figure 3.** Robotouille environment. We first conduct experiments in Robotouille (Gonzalez-Pumariega et al., 2025), which is a simulated grid world kitchen domain with an agent that has five high-level skills: Pick, Place, Cut, Cook, and Stack. In the environment, there are several objects: a patty, lettuce, a top bun, and a bottom bun; there is also a cutting board and a stove for cutting the lettuce and cooking the …

**Figure 4.** Initial and Goal States for Real Robot Experiments.

**Figure 5.** Sequence of Bimanual Robot Skill Execution with Predicate Value Changes

**Figure 6.** Bimanual Kuka Scenario Results over 5 iterations with invented predicate and learned

**Figure 7.** Example task in Robotouille. (a) Initial state (b) Goal state

**Figure 8.** Example task in Franka. (a) Initial state (b) Goal state

**Figure 9.** Example task in Bimanual Kuka.

**Figure 17.** Predicate Invention Case #1 in Franka. Target predicate: GripperEmpty(). Existing predicates: ∅. Transitions: (a) ✓ Stack(Bowl, Plate); (b) ✗ Stack(Bowl, Plate). GPT-5: ✓ plate top empty(?plate), ✓ plate is clean(?plate), ✓ plate is clean(?plate). Qwen3: ✗ stacked on(?pickupable, ?plate), ✗ on center of(?pickupable, ?plate), ✗ is fully supported(?pickupable, ?plate).

**Figure 18.** Predicate Invention Case #2 in Franka. Target predicate: PlateIsDirty(?plate). Existing predicates: GripperEmpty(), Holding(?pickupable). Transitions: (a) ✓ Scoop(Knife, Jar); (b) ✗ Scoop(Knife, Jar). GPT-5: ✓ Open(?openable), ✓ Open(?openable), ✓ Open(?openable). Qwen3: ✗ UtensilInOpenable(?utensil, ?openable), ✗ UtensilInOpening(?utensil, ?openable), ✗ UtensilInOpenable(?utensil, ?openable).

**Figure 19.** Predicate Invention Case #1 in Bi-Kuka. Target predicate: LidOff(?openable). Existing predicates: InLeftGripper(?openable), InRightGripper(?utensil). Transitions: (a) ✓ Open(Jar); (b) ✗ Open(Jar). GPT-5: ✓ RightHandEmpty(), ✓ RightHandEmpty(), ✗ LidAttached(?openable). Qwen3: ✗ FullyEnclosedByLeftGripper(?openable), ✗ FullyEnclosedByLeftGripper(?openable), ✗ FullyEnclosedByLeftGripper(?openable).

**Figure 20.** Predicate Invention Case #2 in Bi-Kuka. Target predicate: RightGripperEmpty(). Existing predicates: InLeftGripper(?openable), LidOff(?openable).
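
The trigger described in the Figure 2 caption (two transitions of the same skill that the current predicates cannot tell apart, one succeeding and one failing) can be sketched as a simple grouping check. This is a simplified reading under stated assumptions: `conflicting_pairs` is a hypothetical helper, transitions are `(skill, abstract_state, success)` tuples, and "both satisfy the preconditions" is approximated as "have identical abstract initial states under the current predicates". The step that prompts a foundation model with the flagged pair is not shown.

```python
from collections import defaultdict


def conflicting_pairs(transitions):
    """Find (success_index, failure_index) pairs that look identical abstractly.

    transitions: iterable of (skill, abstract_state, success), where
    abstract_state is a set of ground atoms under the current predicates.
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for idx, (skill, state, success) in enumerate(transitions):
        groups[(skill, frozenset(state))][bool(success)].append(idx)

    pairs = []
    for outcomes in groups.values():
        # Same skill, same abstract state, different outcome: the current
        # vocabulary is missing a distinguishing predicate.
        for ok in outcomes[True]:
            for bad in outcomes[False]:
                pairs.append((ok, bad))
    return pairs


# Hypothetical transition log for illustration only.
log = [
    ("Stack", {"Holding(bowl)"}, True),
    ("Stack", {"Holding(bowl)"}, False),
    ("Pick", {"GripperEmpty()"}, True),
]
print(conflicting_pairs(log))  # [(0, 1)]
```

Each returned pair is a candidate for predicate invention: some new predicate would need to hold in the initial state of one transition and not the other.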

Original abstract

Generalizing from individual skill executions to long-horizon tasks is a core challenge in building autonomous robots. A promising direction is learning high-level, symbolic representations of low-level robot skills, enabling abstract reasoning independent of the low-level state space. Recent advances in foundation models have made it possible to generate symbolic predicates that operate on raw sensory inputs (a process we call generative predicate invention) to facilitate downstream representation learning. However, prior work learns these abstractions using heuristic or ad-hoc procedures, ignoring the question of which formal properties they ought to satisfy, and how to guarantee these properties. We address these questions by presenting a formal theory of generative predicate invention for task-level planning, and proposing SkillWrapper, a method that learns symbolic models for provably sound and complete planning. Our approach leverages foundation models to actively collect robot data and learn human-interpretable, plannable representations, using only RGB image observations. Our extensive empirical evaluation in simulation and on real robots shows that SkillWrapper learns abstract representations that enable robots to compose black-box skills to solve unseen, long-horizon tasks in the real world.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives a formal theory for generative predicate invention plus SkillWrapper to turn foundation-model outputs into plannable operators, but the soundness guarantees rest on predicates meeting conditions that the learning step does not enforce.

The main takeaway is that this work supplies a formal theory specifying the properties predicates must have for sound and complete planning over skill abstractions, then builds SkillWrapper to generate those predicates from RGB images via foundation models and active robot data collection. That combination is the concrete step forward. Prior predicate-learning methods often stayed heuristic; here the authors try to state the exact conditions needed for planning guarantees and tie the learning procedure to them. The empirical side shows the approach handling unseen long-horizon tasks in simulation and on real robots with black-box skills, which is the practical payoff they emphasize. The active collection loop is a sensible way to gather the right data without hand-engineering predicates. Those pieces are worth crediting.

The soft spot is exactly where the stress-test note flags it. The theory is conditional on predicates that correctly classify states, preserve transition semantics, and cover the relevant space. Foundation models are approximate and stochastic, and nothing in the described method adds verification or correction steps to ensure the outputs meet those conditions. Limited trajectories, whether simulated or collected, cannot certify behavior across the full state space or under real-robot shifts. If any predicate violates the assumptions, the soundness and completeness claims no longer hold. The abstract presents results at a high level without error bars, explicit baselines, or exclusion criteria, so it is difficult to gauge how robust the support actually is.

This paper is aimed at robotics researchers working on hybrid symbolic-learning pipelines for long-horizon tasks. Readers already thinking about predicate abstraction or neuro-symbolic planning will find the framework and the real-robot experiments useful to build on or critique. It shows clear engagement with the planning literature and a reproducible direction, so it deserves a serious referee to examine the formal derivations and the experimental controls in detail. I would send it to peer review.

Referee Report

2 major / 2 minor

Summary. The paper introduces a formal theory of generative predicate invention for skill abstraction, which produces symbolic operators suitable for provably sound and complete planning. SkillWrapper is proposed as a practical method that employs foundation models to actively gather robot data from RGB observations and learn interpretable, plannable representations of black-box skills. Extensive experiments in simulation and on physical robots demonstrate the approach's ability to solve previously unseen long-horizon tasks.

Significance. Should the generated predicates reliably satisfy the formal conditions and the learned representations transfer effectively to real-world execution, this contribution would be significant. It bridges data-driven foundation models with symbolic AI planning, offering a pathway to guaranteed performance in complex robotic tasks without requiring full state observability or hand-crafted abstractions.

major comments (2)

[§3] The formal theory claims to yield provably sound and complete planning from predicates that meet specific conditions (e.g., accurate state classification and preservation of transition semantics). However, the generative process in SkillWrapper, which relies on foundation models trained on limited trajectories, provides no enforcement or verification mechanism to ensure these conditions are met, particularly regarding completeness over the full state space or under real-robot distribution shifts.
[§5] The empirical evaluation summarizes results at a high level without error bars, detailed baselines, or explicit exclusion criteria for successful task executions. This limits the ability to verify whether the performance gains support the central claim of enabling reliable planning for unseen tasks with black-box skills.

minor comments (2)

[Abstract] The abstract mentions 'extensive empirical evaluation' but provides no quantitative details; consider adding key metrics or success rates to better convey the strength of the results.
[Notation] Some notation for the invented predicates and operators could be clarified earlier in the paper to aid readers unfamiliar with the formal framework.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, with revisions indicated where appropriate to improve clarity and rigor.

Referee: [§3] The formal theory claims to yield provably sound and complete planning from predicates that meet specific conditions (e.g., accurate state classification and preservation of transition semantics). However, the generative process in SkillWrapper, which relies on foundation models trained on limited trajectories, provides no enforcement or verification mechanism to ensure these conditions are met, particularly regarding completeness over the full state space or under real-robot distribution shifts.

Authors: We appreciate the referee's emphasis on the distinction between the formal theory and its practical realization. Section 3 presents sufficient conditions on predicates that guarantee sound and complete planning when those conditions hold; the theory itself is agnostic to the method of predicate generation. SkillWrapper is a practical, data-driven procedure that uses foundation models to propose predicates from limited RGB trajectories. We do not claim a formal enforcement or verification procedure, as exhaustive verification of completeness over the full (potentially continuous) state space is intractable and would be further complicated by distribution shifts on real robots. Instead, we rely on empirical validation across simulation and physical experiments showing successful planning on unseen long-horizon tasks. In the revised manuscript we will add a new subsection in §3 that explicitly discusses the gap between the theoretical conditions and the learned predicates, including potential failure modes under distribution shift and the role of empirical evidence in supporting the claims.
revision: partial
Referee: [§5] The empirical evaluation summarizes results at a high level without error bars, detailed baselines, or explicit exclusion criteria for successful task executions. This limits the ability to verify whether the performance gains support the central claim of enabling reliable planning for unseen tasks with black-box skills.

Authors: We agree that the current empirical presentation would benefit from greater detail and transparency. In the revised version we will augment all tables and figures with error bars (standard deviation across repeated trials), expand the description of baselines and ablations with explicit implementation details, and add a dedicated paragraph specifying the success criteria and any exclusion rules used for task executions. These additions will make the performance gains more verifiable and directly support the central claim.
revision: yes
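
For reference, the error bars promised in this response (standard deviation across repeated trials) amount to a small computation. The sketch below uses Python's standard `statistics` module; the `summarize` helper is hypothetical, and the input values are placeholders chosen for illustration, not results from the paper.

```python
from statistics import mean, stdev


def summarize(success_rates):
    """Return (mean, sample standard deviation) of per-trial success rates."""
    if len(success_rates) < 2:
        raise ValueError("need at least two trials for a standard deviation")
    return mean(success_rates), stdev(success_rates)


# Placeholder values for illustration only; NOT results from the paper.
m, s = summarize([0.5, 1.0, 0.75, 0.75])
print(f"{m:.2f} ± {s:.2f}")  # 0.75 ± 0.20
```

`stdev` computes the sample standard deviation (dividing by n − 1), which is the usual choice when a small number of repeated trials stands in for a larger population of runs.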

Circularity Check

0 steps flagged

No significant circularity; formal theory and method are independent

The paper introduces a formal theory of generative predicate invention that yields symbolic operators for provably sound and complete planning, conditional on predicates satisfying stated properties such as accurate state classification and transition preservation. SkillWrapper then uses foundation models and active data collection from RGB observations to produce those predicates. No equations, self-referential definitions, or reductions appear that make the planning guarantees equivalent to fitted parameters or prior self-citations by construction. The derivation relies on external foundation models and robot data, keeping the central claims self-contained rather than circular. This matches the default expectation for papers without load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Only the abstract is available, so the ledger reflects high-level claims rather than explicit equations or sections; the formal theory is presumed to introduce assumptions about predicate properties that are not detailed here.

axioms (1)

domain assumption: Generated predicates satisfy the formal properties needed for sound and complete planning.
Invoked as the basis for the provable guarantees stated in the abstract.

invented entities (1)

Generative predicates invented by foundation models (no independent evidence)
purpose: To produce human-interpretable symbolic abstractions of black-box skills from RGB observations
New postulated mechanism that converts sensory data into plannable operators; no independent falsifiable handle is described in the abstract.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Learning Bilevel Policies over Symbolic World Models for Long-Horizon Planning
cs.AI · 2026-05 · unverdicted · novelty 6.0

BISON learns bilevel policies over symbolic world models to generalize long-horizon robotic planning beyond VLA and end-to-end baselines while remaining efficient even at 10,000-object scale.