pith. sign in

arxiv: 2605.18674 · v1 · pith:EROO3PS3new · submitted 2026-05-18 · 💻 cs.AI

Efficient Lookahead Encoding and Abstracted Width for Learning General Policies in Classical Planning

Pith reviewed 2026-05-20 10:12 UTC · model grok-4.3

classification 💻 cs.AI
keywords generalized planningiterated widthrelational GNNclassical planningpolicy learninglookahead searchnovelty searchIPC benchmark
0
0 comments X

The pith

Holistic encoding of IW search trees and type-based atom abstraction enable scalable learning of general policies that outperform LAMA.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Generalized planning aims to learn policies that transfer across instances of a planning domain instead of solving each problem separately. Previous Iterated Width policies improved generalization by using lookahead search but suffered from evaluating each transition separately and from poor scaling when thousands of objects are present. This work replaces individual evaluations with a holistic encoding of the full search tree, expressed only through relational differences from the current state, so that a Relational GNN can score every possible next step in one pass. It also introduces Abstracted IW(1), which checks novelty on atoms where all but one argument is replaced by its type, shifting the computational cost from the number of atoms to the number of objects. Together these changes produce policies that set a new performance standard on the large IPC 2023 instances and on diverse domains.

Core claim

The paper presents two main advances for learning general policies using Iterated Width lookahead in classical planning. First, it introduces a holistic encoding that represents the entire IW(1) search tree by relational differences to the current state, allowing a single forward pass of a Relational GNN to evaluate all possible transitions. Second, it defines Abstracted IW(1), where novelty is checked on abstracted atoms that retain only one argument and replace the rest with types; an atom is novel if any of its abstractions is novel. This combination addresses computational unscalability and inefficiency with thousands of objects, leading to policies that generalize better and outperform

What carries the argument

Holistic encoding of the IW(1) search tree via relational differences to the current state, scored in one forward pass by a Relational GNN, together with Abstracted IW(1) novelty checks that abstract atoms by replacing all but one argument with their type.

If this is right

  • General policies can now be learned for domains with thousands of objects as in the IPC 2023 benchmark.
  • All successor transitions can be scored simultaneously rather than one at a time.
  • The resulting policies achieve higher coverage than previous GNN approaches and the classical planner LAMA.
  • The method works in domains that require features outside the C2 logic fragment.
  • Scaling shifts from depending on the number of atoms to depending on the number of objects.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar holistic encodings could be applied to other lookahead-based search methods to reduce per-transition costs.
  • The type-based abstraction might be generalized further by choosing which arguments to retain based on domain structure.
  • These policies could serve as warm starts for classical planners on very large instances.

Load-bearing premise

The type-based relational abstraction of atoms during novelty checks continues to identify the subgoal structure required for learning effective policies.

What would settle it

A planning domain where policies learned with Abstracted IW(1) solve fewer problems than policies learned with standard IW(1) on identical training data, even after accounting for the efficiency improvement.

Figures

Figures reproduced from arXiv: 2605.18674 by Hector Geffner, Martin Funkquist, Michael Aichm\"uller, Simon St{\aa}hlberg.

Figure 1
Figure 1. Figure 1: Log-log plot of the branching factor for each IPC benchmark domain as a function of the [PITH_FULL_IMAGE:figures/full_fig_p017_1.png] view at source ↗
read the original abstract

Generalized planning aims to learn policies that generalize across collections of instances within a classical planning domain. Recent Graph Neural Network (GNN) approaches have learned nearly perfect policies for several domains. This work improves on the recently published idea of Iterated Width (IW) policies. Therein, the policy broadens its successor scope through an IW-lookahead search that can "jump" over multiple transitions, simplifying the problem structure. Yet, each transition is evaluated individually, leading to unscalable compute costs and expressivity limitations. Furthermore, although IW(1) is attractive because it scales linearly with the number of atoms, it becomes inefficient once thousands of objects are considered, as in the International Planning Competition (IPC) 2023 benchmark. We address both limitations. First, we introduce a vastly more efficient holistic encoding of the entire search tree. It jointly represents IW(1)-reachable states only by their relational differences to the current state, enabling Relational GNNs (R-GNNs) to score all transitions in a single forward pass. Second, we define Abstracted IW(1) to improve scaling through relational abstraction during novelty checks. Rather than testing fully instantiated atoms, it abstracts each atom by replacing all but one argument with its type. The original atom is novel if any of its abstracted forms is novel. This structural compression shifts novelty search scaling from atoms to objects, while preserving meaningful subgoal structure. We evaluate our contributions on the hyperscaling IPC 2023 benchmark and across diverse domains, including domains requiring features beyond the $C_2$ logic fragment. Our policies achieve new state-of-the-art performance, significantly surpassing prior work, including the classical planner LAMA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes two improvements to Iterated Width (IW) policies for generalized planning: a holistic encoding of the IW(1) search tree that represents reachable states via relational differences to the current state (enabling single-pass R-GNN scoring of all transitions), and Abstracted IW(1), which abstracts each atom by replacing all but one argument with its type and declares the atom novel if any projection is novel. This shifts novelty scaling from atoms to objects. The authors evaluate on the IPC 2023 hyperscaling benchmark and diverse domains (including those beyond C2 logic) and claim new state-of-the-art policy performance that significantly surpasses prior work, including the classical planner LAMA.

Significance. If the empirical gains hold under rigorous controls, the efficient lookahead encoding would be a useful technical advance for integrating width-based search with relational neural networks, while Abstracted IW(1) could extend IW policies to larger object counts. The work directly targets scalability bottlenecks in current GNN-based generalized planners and provides a concrete path toward policies that remain effective on IPC-scale instances.

major comments (1)
  1. Abstract (and the description of Abstracted IW(1)): the claim that the abstraction 'preserves meaningful subgoal structure' lacks either a formal argument showing that the per-object projection maintains the subgoal distinctions required by IW policies or an ablation isolating the abstraction's effect on coverage and policy quality. Because the abstraction equates distinct ground atoms that differ only in typed arguments (e.g., on(p1,l1) and on(p2,l2) both project to on(X,location)), it risks either over-accepting states or missing true progress in domains that require specific multi-object conjunctions; this directly affects whether the reported SOTA gains on IPC 2023 instances can be attributed to the new components rather than to relaxed novelty detection.
minor comments (2)
  1. The abstract asserts new state-of-the-art performance but supplies no details on experimental controls, statistical significance testing, exact benchmark instance selection, or hyperparameter tuning procedures; these must be added to the experimental section with tables reporting coverage, plan quality, and runtime statistics against LAMA and prior IW baselines.
  2. Notation for the relational difference encoding and the precise definition of 'novel if any of its abstracted forms is novel' should be formalized with a short pseudocode or equation block to avoid ambiguity when readers re-implement the method.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comment on Abstracted IW(1) below and outline revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [—] Abstract (and the description of Abstracted IW(1)): the claim that the abstraction 'preserves meaningful subgoal structure' lacks either a formal argument showing that the per-object projection maintains the subgoal distinctions required by IW policies or an ablation isolating the abstraction's effect on coverage and policy quality. Because the abstraction equates distinct ground atoms that differ only in typed arguments (e.g., on(p1,l1) and on(p2,l2) both project to on(X,location)), it risks either over-accepting states or missing true progress in domains that require specific multi-object conjunctions; this directly affects whether the reported SOTA gains on IPC 2023 instances can be attributed to the new components rather than to relaxed novelty detection.

    Authors: We agree that the manuscript would benefit from a clearer justification and empirical isolation of Abstracted IW(1). While we do not provide a formal proof that the per-object type projection preserves every subgoal distinction required by IW, the abstraction is motivated by the fact that IW(1) novelty detection fundamentally tracks the first achievement of atoms; projecting all but one argument to its type preserves the relational structure per object while shifting scaling from atoms to objects. This maintains the key property that a state is novel if it introduces a previously unseen typed relation for some object. We acknowledge the theoretical risk that distinct ground atoms may be equated, potentially leading to relaxed novelty in domains with intricate multi-object conjunctions. However, our evaluation spans the IPC 2023 hyperscaling benchmark and domains beyond C2 logic, where the approach yields consistent coverage and policy improvements over prior IW policies and LAMA. To isolate the abstraction's contribution and address attribution of the SOTA results, we will add an ablation study in the revised manuscript comparing policies with standard IW(1) versus Abstracted IW(1) on coverage, plan quality, and runtime. We will also expand the discussion to note the abstraction's limitations and suggest when it may be less suitable. revision: partial

Circularity Check

0 steps flagged

No significant circularity; new encodings and abstraction are independent contributions

full rationale

The paper's derivation introduces two explicit technical novelties—an efficient holistic encoding of IW(1)-reachable states via relational differences to the current state, and the definition of Abstracted IW(1) that replaces all but one argument of each atom with its type for novelty detection—followed by empirical evaluation on the IPC 2023 hyperscaling benchmark. These steps are defined directly in the manuscript and do not reduce by construction to fitted parameters, prior outputs, or self-citations; the SOTA performance claim rests on external experimental results rather than any self-referential loop. Reliance on the general IW concept is acknowledged as background but does not carry the central claims.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on the assumption that relational GNNs can learn effective scoring from difference-based encodings and that type abstraction preserves useful novelty information; no free parameters or invented entities are described in the abstract.

axioms (2)
  • domain assumption Relational differences to the current state suffice for R-GNNs to score all IW(1)-reachable transitions in one forward pass.
    Invoked to justify the holistic encoding.
  • domain assumption Abstracting atoms by type while keeping one argument preserves meaningful subgoal structure for novelty.
    Central to the scaling claim of Abstracted IW(1).

pith-pipeline@v0.9.0 · 5851 in / 1319 out tokens · 31675 ms · 2026-05-20T10:12:42.372055+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages

  1. [1]

    Sketch decompositions for classical planning via deep reinforcement learning

    Michael Aichmüller and Hector Geffner. Sketch decompositions for classical planning via deep reinforcement learning. InProceedings of the 34th International Joint Conference on Artificial Intelligence (IJCAI 2025), pages 8438–8446,

  2. [2]

    Learning efficiency meets symmetry breaking

    Yingbin Bai, Sylvie Thiébaux, and Felipe Trevizan. Learning efficiency meets symmetry breaking. Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS 2025), 35(1):154–159,

  3. [3]

    Transfer of deep reactive policies for mdp planning

    Aniket Nick Bajpai, Sankalp Garg, et al. Transfer of deep reactive policies for mdp planning. In Proceedings of the 32nd Annual Conference on Neural Information Processing Systems (NeurIPS 2018), pages 10965–10975,

  4. [4]

    The logical expressiveness of graph neural networks

    Pablo Barceló, Egor Kostylev, Mikaël Monet, Jorge Pérez, Juan Reutter, and Juan-Pablo Silva. The logical expressiveness of graph neural networks. InProceedings of the 8th International Conference on Learning Representations (ICLR 2020),

  5. [5]

    O’Donnell, and Dzmitry Bahdanau

    Leon Bergen, Timothy J. O’Donnell, and Dzmitry Bahdanau. Systematic generalization with edge transformers. InProceedings of the 35th Annual Conference on Neural Information Processing Systems (NeurIPS 2021), pages 1390–1402,

  6. [6]

    Symbolic dynamic programming for first-order MDPs

    Craig Boutilier, Raymond Reiter, and Bob Price. Symbolic dynamic programming for first-order MDPs. InProceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI 2001), pages 690–700,

  7. [7]

    Learning to rank using gradient descent

    Chris Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Greg Hullender. Learning to rank using gradient descent. InProceedings of the 22nd International Conference on Machine Learning (ICML 2005), pages 89—-96,

  8. [8]

    Chen, Felipe W

    Dillon Z. Chen, Felipe W. Trevizan, and Sylvie Thiébaux. Return to tradition: Learning reliable heuristics with classical machine learning. InProceedings of the 34th International Conference on Automated Planning and Scheduling (ICAPS 2024), pages 68–76,

  9. [9]

    Symmetries and expressive requirements for learning general policies

    Dominik Drexler, Simon Ståhlberg, Blai Bonet, and Hector Geffner. Symmetries and expressive requirements for learning general policies. InProceedings of the 21st International Conference on Principles of Knowledge Representation and Reasoning (KR 2024),

  10. [10]

    Learning general planning policies from small ex- amples without supervision

    11 Guillem Francès, Blai Bonet, and Hector Geffner. Learning general planning policies from small ex- amples without supervision. InProceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI 2021), pages 11801–11808,

  11. [11]

    Expressiveness of graph neural networks in planning domains

    Rostislav Horcík and Gustav Sír. Expressiveness of graph neural networks in planning domains. In Proceedings of the 34th International Conference on Automated Planning and Scheduling (ICAPS 2024),

  12. [12]

    State encodings for gnn-based lifted planners

    Rostislav Horcík, Gustav Sír, Vítezslav Simek, and Tomás Pevný. State encodings for gnn-based lifted planners. InProceedings of the 39th AAAI Conference on Artificial Intelligence (AAAI 2025), pages 26525–26533,

  13. [13]

    Generalized planning: Synthesizing plans that work for multiple environments

    Yuxiao Hu and Giuseppe De Giacomo. Generalized planning: Synthesizing plans that work for multiple environments. InProceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI 2011), pages 918–923,

  14. [14]

    Width and serialization of classical planning problems

    Nir Lipovetzky and Hector Geffner. Width and serialization of classical planning problems. In Proceedings of the 20th European Conference on Artificial Intelligence (ECAI 2012), pages 540–545,

  15. [15]

    Mish: A self regularized non-monotonic activation function

    Diganta Misra. Mish: A self regularized non-monotonic activation function. InProceedings of the 31st British Machine Vision Conference (BMVC 2020). BMV A Press,

  16. [16]

    Weisfeiler and leman go neural: Higher-order graph neural networks

    Christopher Morris, Martin Ritzert, Matthias Fey, William L Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. Weisfeiler and leman go neural: Higher-order graph neural networks. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), pages 4602–4609,

  17. [17]

    Towards principled graph trans- formers

    Luis Müller, Daniel Kusuma, Blai Bonet, and Christopher Morris. Towards principled graph trans- formers. InProceedings of the 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024),

  18. [18]

    The LAMA planner — Using landmark counting in heuris- tic search

    Silvia Richter and Matthias Westphal. The LAMA planner — Using landmark counting in heuris- tic search. IPC 2008 short papers, http://ipc.informatik.uni-freiburg.de/Planners,

  19. [19]

    Generalized planning with deep reinforcement learning

    Or Rivlin, Tamir Hazan, and Erez Karpas. Generalized planning with deep reinforcement learning. InICAPS 2020 Workshop on Bridging the Gap Between AI Planning and Reinforcement Learning (PRL), pages 16–24,

  20. [20]

    Learning general policies for planning through GPT models

    Nicholas Rossetti, Massimiliano Tummolo, Alfonso Emilio Gerevini, Luca Putelli, Ivan Serina, Mattia Chiari, and Matteo Olivato. Learning general policies for planning through GPT models. In Proceedings of the 34th International Conference on Automated Planning and Scheduling (ICAPS 2024), pages 500–508,

  21. [21]

    First-order representation languages for goal-conditioned rl

    Simon Ståhlberg and Hector Geffner. First-order representation languages for goal-conditioned rl. In Proceedings of the 40th AAAI Conference on Artificial Intelligence (AAAI 2026),

  22. [22]

    Learning general optimal policies with graph neural networks: Expressive power, transparency, and limits

    Simon Ståhlberg, Blai Bonet, and Hector Geffner. Learning general optimal policies with graph neural networks: Expressive power, transparency, and limits. InProceedings of the 32nd International Conference on Automated Planning and Scheduling (ICAPS 2022), pages 629–637, 2022a. Simon Ståhlberg, Blai Bonet, and Hector Geffner. Learning generalized policies...

  23. [23]

    Learning more expressive general policies for classical planning domains

    Simon Ståhlberg, Blai Bonet, and Hector Geffner. Learning more expressive general policies for classical planning domains. InProceedings of the 39th AAAI Conference on Artificial Intelligence (AAAI 2025), pages 26697–26706,

  24. [24]

    The 2023 International Planning Competition.AI Magazine, 45(2):280–296,

    Ayal Taitler, Ron Alford, Joan Espasa, Gregor Behnke, Daniel Fišer, Michael Gimelfarb, Florian Pommerening, Scott Sanner, Enrico Scala, Dominik Schreiber, Javier Segovia-Aguas, and Jendrik Seipp. The 2023 International Planning Competition.AI Magazine, 45(2):280–296,

  25. [25]

    How powerful are graph neural networks? InProceedings of the 7th International Conference on Learning Representations (ICLR 2019),

    Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? InProceedings of the 7th International Conference on Learning Representations (ICLR 2019),

  26. [26]

    Domains marked * are known to require logic beyond C2 expressiveness

    13 Domain AIW–AD IW–AD AIW–Ext IW–Ext Flat AA LAMA depots (86)86 (100) 86 (100)83 (97) 81 (94) 76 (88) 85 (99) delivery (80)80 (100) 80 (100) 80 (100) 80 (100)25 (31) 79 (99) driverlog (123) 120 (98)123 (100)116 (94)123 (100)122 (99)123 (100) goldminer (359)359 (100) 359 (100)133 (37) 69 (19) 132 (37) 222 (62) grippers (312)312 (100) 312 (100) 312 (100) 3...

  27. [27]

    B Base-Abstracted IW and Capacity-Limited Abstracted IW As mentioned in section 3, we also considered two modifications of AIW

    that (A)IW lookaheads simplify problem structure indeed and that this simplification is crucial for solving some domains. B Base-Abstracted IW and Capacity-Limited Abstracted IW As mentioned in section 3, we also considered two modifications of AIW. A fallback variant of AIW in Base-Abstracted IW, which is the backup abstraction whenever a domain exhibits...

  28. [28]

    into a single graph, from representing successors through state differences, or from combining both mechanisms

    Numbers in parentheses are percentages of row totals. into a single graph, from representing successors through state differences, or from combining both mechanisms. Tables 6 and 7 show that merging current and successor states into a single graph is not sufficient on its own. Internal encodings are consistently worse than External encodings, both in cove...

  29. [29]

    The actions involve picking up and putting down blocks, where blocks can only be picked up if they have no other blocks on top of them

    BlocksworldIn Blocksworld, the goal is to stack blocks on top of each other to form specific configurations. The actions involve picking up and putting down blocks, where blocks can only be picked up if they have no other blocks on top of them. ChildsnackIn this domain, the task is to serve snacks to children. Some children are gluten allergic and need to...