Efficient Lookahead Encoding and Abstracted Width for Learning General Policies in Classical Planning
Pith reviewed 2026-05-20 10:12 UTC · model grok-4.3
The pith
Holistic encoding of IW search trees and type-based atom abstraction enable scalable learning of general policies that outperform LAMA.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper presents two main advances for learning general policies using Iterated Width lookahead in classical planning. First, it introduces a holistic encoding that represents the entire IW(1) search tree by relational differences to the current state, allowing a single forward pass of a Relational GNN to evaluate all possible transitions. Second, it defines Abstracted IW(1), where novelty is checked on abstracted atoms that retain only one argument and replace the rest with types; an atom is novel if any of its abstractions is novel. This combination addresses computational unscalability and inefficiency with thousands of objects, leading to policies that generalize better and outperform
What carries the argument
Holistic encoding of the IW(1) search tree via relational differences to the current state, scored in one forward pass by a Relational GNN, together with Abstracted IW(1) novelty checks that abstract atoms by replacing all but one argument with their type.
If this is right
- General policies can now be learned for domains with thousands of objects as in the IPC 2023 benchmark.
- All successor transitions can be scored simultaneously rather than one at a time.
- The resulting policies achieve higher coverage than previous GNN approaches and the classical planner LAMA.
- The method works in domains that require features outside the C2 logic fragment.
- Scaling shifts from depending on the number of atoms to depending on the number of objects.
Where Pith is reading between the lines
- Similar holistic encodings could be applied to other lookahead-based search methods to reduce per-transition costs.
- The type-based abstraction might be generalized further by choosing which arguments to retain based on domain structure.
- These policies could serve as warm starts for classical planners on very large instances.
Load-bearing premise
The type-based relational abstraction of atoms during novelty checks continues to identify the subgoal structure required for learning effective policies.
What would settle it
A planning domain where policies learned with Abstracted IW(1) solve fewer problems than policies learned with standard IW(1) on identical training data, even after accounting for the efficiency improvement.
Figures
read the original abstract
Generalized planning aims to learn policies that generalize across collections of instances within a classical planning domain. Recent Graph Neural Network (GNN) approaches have learned nearly perfect policies for several domains. This work improves on the recently published idea of Iterated Width (IW) policies. Therein, the policy broadens its successor scope through an IW-lookahead search that can "jump" over multiple transitions, simplifying the problem structure. Yet, each transition is evaluated individually, leading to unscalable compute costs and expressivity limitations. Furthermore, although IW(1) is attractive because it scales linearly with the number of atoms, it becomes inefficient once thousands of objects are considered, as in the International Planning Competition (IPC) 2023 benchmark. We address both limitations. First, we introduce a vastly more efficient holistic encoding of the entire search tree. It jointly represents IW(1)-reachable states only by their relational differences to the current state, enabling Relational GNNs (R-GNNs) to score all transitions in a single forward pass. Second, we define Abstracted IW(1) to improve scaling through relational abstraction during novelty checks. Rather than testing fully instantiated atoms, it abstracts each atom by replacing all but one argument with its type. The original atom is novel if any of its abstracted forms is novel. This structural compression shifts novelty search scaling from atoms to objects, while preserving meaningful subgoal structure. We evaluate our contributions on the hyperscaling IPC 2023 benchmark and across diverse domains, including domains requiring features beyond the $C_2$ logic fragment. Our policies achieve new state-of-the-art performance, significantly surpassing prior work, including the classical planner LAMA.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes two improvements to Iterated Width (IW) policies for generalized planning: a holistic encoding of the IW(1) search tree that represents reachable states via relational differences to the current state (enabling single-pass R-GNN scoring of all transitions), and Abstracted IW(1), which abstracts each atom by replacing all but one argument with its type and declares the atom novel if any projection is novel. This shifts novelty scaling from atoms to objects. The authors evaluate on the IPC 2023 hyperscaling benchmark and diverse domains (including those beyond C2 logic) and claim new state-of-the-art policy performance that significantly surpasses prior work, including the classical planner LAMA.
Significance. If the empirical gains hold under rigorous controls, the efficient lookahead encoding would be a useful technical advance for integrating width-based search with relational neural networks, while Abstracted IW(1) could extend IW policies to larger object counts. The work directly targets scalability bottlenecks in current GNN-based generalized planners and provides a concrete path toward policies that remain effective on IPC-scale instances.
major comments (1)
- Abstract (and the description of Abstracted IW(1)): the claim that the abstraction 'preserves meaningful subgoal structure' lacks either a formal argument showing that the per-object projection maintains the subgoal distinctions required by IW policies or an ablation isolating the abstraction's effect on coverage and policy quality. Because the abstraction equates distinct ground atoms that differ only in typed arguments (e.g., on(p1,l1) and on(p2,l2) both project to on(X,location)), it risks either over-accepting states or missing true progress in domains that require specific multi-object conjunctions; this directly affects whether the reported SOTA gains on IPC 2023 instances can be attributed to the new components rather than to relaxed novelty detection.
minor comments (2)
- The abstract asserts new state-of-the-art performance but supplies no details on experimental controls, statistical significance testing, exact benchmark instance selection, or hyperparameter tuning procedures; these must be added to the experimental section with tables reporting coverage, plan quality, and runtime statistics against LAMA and prior IW baselines.
- Notation for the relational difference encoding and the precise definition of 'novel if any of its abstracted forms is novel' should be formalized with a short pseudocode or equation block to avoid ambiguity when readers re-implement the method.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the major comment on Abstracted IW(1) below and outline revisions to strengthen the manuscript.
read point-by-point responses
-
Referee: [—] Abstract (and the description of Abstracted IW(1)): the claim that the abstraction 'preserves meaningful subgoal structure' lacks either a formal argument showing that the per-object projection maintains the subgoal distinctions required by IW policies or an ablation isolating the abstraction's effect on coverage and policy quality. Because the abstraction equates distinct ground atoms that differ only in typed arguments (e.g., on(p1,l1) and on(p2,l2) both project to on(X,location)), it risks either over-accepting states or missing true progress in domains that require specific multi-object conjunctions; this directly affects whether the reported SOTA gains on IPC 2023 instances can be attributed to the new components rather than to relaxed novelty detection.
Authors: We agree that the manuscript would benefit from a clearer justification and empirical isolation of Abstracted IW(1). While we do not provide a formal proof that the per-object type projection preserves every subgoal distinction required by IW, the abstraction is motivated by the fact that IW(1) novelty detection fundamentally tracks the first achievement of atoms; projecting all but one argument to its type preserves the relational structure per object while shifting scaling from atoms to objects. This maintains the key property that a state is novel if it introduces a previously unseen typed relation for some object. We acknowledge the theoretical risk that distinct ground atoms may be equated, potentially leading to relaxed novelty in domains with intricate multi-object conjunctions. However, our evaluation spans the IPC 2023 hyperscaling benchmark and domains beyond C2 logic, where the approach yields consistent coverage and policy improvements over prior IW policies and LAMA. To isolate the abstraction's contribution and address attribution of the SOTA results, we will add an ablation study in the revised manuscript comparing policies with standard IW(1) versus Abstracted IW(1) on coverage, plan quality, and runtime. We will also expand the discussion to note the abstraction's limitations and suggest when it may be less suitable. revision: partial
Circularity Check
No significant circularity; new encodings and abstraction are independent contributions
full rationale
The paper's derivation introduces two explicit technical novelties—an efficient holistic encoding of IW(1)-reachable states via relational differences to the current state, and the definition of Abstracted IW(1) that replaces all but one argument of each atom with its type for novelty detection—followed by empirical evaluation on the IPC 2023 hyperscaling benchmark. These steps are defined directly in the manuscript and do not reduce by construction to fitted parameters, prior outputs, or self-citations; the SOTA performance claim rests on external experimental results rather than any self-referential loop. Reliance on the general IW concept is acknowledged as background but does not carry the central claims.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Relational differences to the current state suffice for R-GNNs to score all IW(1)-reachable transitions in one forward pass.
- domain assumption Abstracting atoms by type while keeping one argument preserves meaningful subgoal structure for novelty.
Reference graph
Works this paper leans on
-
[1]
Sketch decompositions for classical planning via deep reinforcement learning
Michael Aichmüller and Hector Geffner. Sketch decompositions for classical planning via deep reinforcement learning. InProceedings of the 34th International Joint Conference on Artificial Intelligence (IJCAI 2025), pages 8438–8446,
work page 2025
-
[2]
Learning efficiency meets symmetry breaking
Yingbin Bai, Sylvie Thiébaux, and Felipe Trevizan. Learning efficiency meets symmetry breaking. Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS 2025), 35(1):154–159,
work page 2025
-
[3]
Transfer of deep reactive policies for mdp planning
Aniket Nick Bajpai, Sankalp Garg, et al. Transfer of deep reactive policies for mdp planning. In Proceedings of the 32nd Annual Conference on Neural Information Processing Systems (NeurIPS 2018), pages 10965–10975,
work page 2018
-
[4]
The logical expressiveness of graph neural networks
Pablo Barceló, Egor Kostylev, Mikaël Monet, Jorge Pérez, Juan Reutter, and Juan-Pablo Silva. The logical expressiveness of graph neural networks. InProceedings of the 8th International Conference on Learning Representations (ICLR 2020),
work page 2020
-
[5]
O’Donnell, and Dzmitry Bahdanau
Leon Bergen, Timothy J. O’Donnell, and Dzmitry Bahdanau. Systematic generalization with edge transformers. InProceedings of the 35th Annual Conference on Neural Information Processing Systems (NeurIPS 2021), pages 1390–1402,
work page 2021
-
[6]
Symbolic dynamic programming for first-order MDPs
Craig Boutilier, Raymond Reiter, and Bob Price. Symbolic dynamic programming for first-order MDPs. InProceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI 2001), pages 690–700,
work page 2001
-
[7]
Learning to rank using gradient descent
Chris Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Greg Hullender. Learning to rank using gradient descent. InProceedings of the 22nd International Conference on Machine Learning (ICML 2005), pages 89—-96,
work page 2005
-
[8]
Dillon Z. Chen, Felipe W. Trevizan, and Sylvie Thiébaux. Return to tradition: Learning reliable heuristics with classical machine learning. InProceedings of the 34th International Conference on Automated Planning and Scheduling (ICAPS 2024), pages 68–76,
work page 2024
-
[9]
Symmetries and expressive requirements for learning general policies
Dominik Drexler, Simon Ståhlberg, Blai Bonet, and Hector Geffner. Symmetries and expressive requirements for learning general policies. InProceedings of the 21st International Conference on Principles of Knowledge Representation and Reasoning (KR 2024),
work page 2024
-
[10]
Learning general planning policies from small ex- amples without supervision
11 Guillem Francès, Blai Bonet, and Hector Geffner. Learning general planning policies from small ex- amples without supervision. InProceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI 2021), pages 11801–11808,
work page 2021
-
[11]
Expressiveness of graph neural networks in planning domains
Rostislav Horcík and Gustav Sír. Expressiveness of graph neural networks in planning domains. In Proceedings of the 34th International Conference on Automated Planning and Scheduling (ICAPS 2024),
work page 2024
-
[12]
State encodings for gnn-based lifted planners
Rostislav Horcík, Gustav Sír, Vítezslav Simek, and Tomás Pevný. State encodings for gnn-based lifted planners. InProceedings of the 39th AAAI Conference on Artificial Intelligence (AAAI 2025), pages 26525–26533,
work page 2025
-
[13]
Generalized planning: Synthesizing plans that work for multiple environments
Yuxiao Hu and Giuseppe De Giacomo. Generalized planning: Synthesizing plans that work for multiple environments. InProceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI 2011), pages 918–923,
work page 2011
-
[14]
Width and serialization of classical planning problems
Nir Lipovetzky and Hector Geffner. Width and serialization of classical planning problems. In Proceedings of the 20th European Conference on Artificial Intelligence (ECAI 2012), pages 540–545,
work page 2012
-
[15]
Mish: A self regularized non-monotonic activation function
Diganta Misra. Mish: A self regularized non-monotonic activation function. InProceedings of the 31st British Machine Vision Conference (BMVC 2020). BMV A Press,
work page 2020
-
[16]
Weisfeiler and leman go neural: Higher-order graph neural networks
Christopher Morris, Martin Ritzert, Matthias Fey, William L Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. Weisfeiler and leman go neural: Higher-order graph neural networks. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), pages 4602–4609,
work page 2019
-
[17]
Towards principled graph trans- formers
Luis Müller, Daniel Kusuma, Blai Bonet, and Christopher Morris. Towards principled graph trans- formers. InProceedings of the 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024),
work page 2024
-
[18]
The LAMA planner — Using landmark counting in heuris- tic search
Silvia Richter and Matthias Westphal. The LAMA planner — Using landmark counting in heuris- tic search. IPC 2008 short papers, http://ipc.informatik.uni-freiburg.de/Planners,
work page 2008
-
[19]
Generalized planning with deep reinforcement learning
Or Rivlin, Tamir Hazan, and Erez Karpas. Generalized planning with deep reinforcement learning. InICAPS 2020 Workshop on Bridging the Gap Between AI Planning and Reinforcement Learning (PRL), pages 16–24,
work page 2020
-
[20]
Learning general policies for planning through GPT models
Nicholas Rossetti, Massimiliano Tummolo, Alfonso Emilio Gerevini, Luca Putelli, Ivan Serina, Mattia Chiari, and Matteo Olivato. Learning general policies for planning through GPT models. In Proceedings of the 34th International Conference on Automated Planning and Scheduling (ICAPS 2024), pages 500–508,
work page 2024
-
[21]
First-order representation languages for goal-conditioned rl
Simon Ståhlberg and Hector Geffner. First-order representation languages for goal-conditioned rl. In Proceedings of the 40th AAAI Conference on Artificial Intelligence (AAAI 2026),
work page 2026
-
[22]
Simon Ståhlberg, Blai Bonet, and Hector Geffner. Learning general optimal policies with graph neural networks: Expressive power, transparency, and limits. InProceedings of the 32nd International Conference on Automated Planning and Scheduling (ICAPS 2022), pages 629–637, 2022a. Simon Ståhlberg, Blai Bonet, and Hector Geffner. Learning generalized policies...
work page 2022
-
[23]
Learning more expressive general policies for classical planning domains
Simon Ståhlberg, Blai Bonet, and Hector Geffner. Learning more expressive general policies for classical planning domains. InProceedings of the 39th AAAI Conference on Artificial Intelligence (AAAI 2025), pages 26697–26706,
work page 2025
-
[24]
The 2023 International Planning Competition.AI Magazine, 45(2):280–296,
Ayal Taitler, Ron Alford, Joan Espasa, Gregor Behnke, Daniel Fišer, Michael Gimelfarb, Florian Pommerening, Scott Sanner, Enrico Scala, Dominik Schreiber, Javier Segovia-Aguas, and Jendrik Seipp. The 2023 International Planning Competition.AI Magazine, 45(2):280–296,
work page 2023
-
[25]
Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? InProceedings of the 7th International Conference on Learning Representations (ICLR 2019),
work page 2019
-
[26]
Domains marked * are known to require logic beyond C2 expressiveness
13 Domain AIW–AD IW–AD AIW–Ext IW–Ext Flat AA LAMA depots (86)86 (100) 86 (100)83 (97) 81 (94) 76 (88) 85 (99) delivery (80)80 (100) 80 (100) 80 (100) 80 (100)25 (31) 79 (99) driverlog (123) 120 (98)123 (100)116 (94)123 (100)122 (99)123 (100) goldminer (359)359 (100) 359 (100)133 (37) 69 (19) 132 (37) 222 (62) grippers (312)312 (100) 312 (100) 312 (100) 3...
work page 2023
-
[27]
that (A)IW lookaheads simplify problem structure indeed and that this simplification is crucial for solving some domains. B Base-Abstracted IW and Capacity-Limited Abstracted IW As mentioned in section 3, we also considered two modifications of AIW. A fallback variant of AIW in Base-Abstracted IW, which is the backup abstraction whenever a domain exhibits...
work page 2023
-
[28]
Numbers in parentheses are percentages of row totals. into a single graph, from representing successors through state differences, or from combining both mechanisms. Tables 6 and 7 show that merging current and successor states into a single graph is not sufficient on its own. Internal encodings are consistently worse than External encodings, both in cove...
work page 2023
-
[29]
BlocksworldIn Blocksworld, the goal is to stack blocks on top of each other to form specific configurations. The actions involve picking up and putting down blocks, where blocks can only be picked up if they have no other blocks on top of them. ChildsnackIn this domain, the task is to serve snacks to children. Some children are gluten allergic and need to...
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.