Representation Learning for Classical Planning from Partially Observed Traces

Hai Wan; Hankui Hankz Zhuo; Jinxia Lin; Yanan Liu; Zhanhao Xiao

arxiv: 1907.08352 · v1 · pith:UMYV34BHnew · submitted 2019-07-19 · 💻 cs.AI

Pith reviewed 2026-05-24 19:41 UTC · model grok-4.3

classification 💻 cs.AI

keywords: classical planning · domain model learning · graph neural networks · representation learning · partially observed traces · heuristic learning · model-based planning · vectorized models

The pith

A graph neural network learns vectorized domain models from partial traces that solve more real planning problems than declarative models from ARMS.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes LP-GNN, a framework that learns planning domain models in vectorized form from partially observed traces by embedding propositions and actions into a graph processed by a neural network. This embedding is used to uncover latent relationships that become domain-specific heuristics, combining model-free learning with model-based planning. The approach avoids the need for complete declarative specifications in languages like STRIPS or PDDL, which are sensitive to inaccuracies. A sympathetic reader would care because manually writing full domain models remains a major bottleneck for applying classical planning in real-world settings. On five classical domains the learned models prove more effective at solving actual planning instances than those produced by the ARMS learner.

Core claim

The authors claim that embedding propositions and actions in a graph within the LP-GNN framework allows the exploration of latent relationships to form domain-specific heuristics; the resulting vectorized domain models integrate model-free learning and model-based planning and are much more effective on solving real planning problems than the declarative models output by ARMS.

What carries the argument

LP-GNN, a graph neural network that embeds propositions and actions in a graph to derive domain-specific heuristics from partial traces.

If this is right

Domain models no longer need to be expressed in exact declarative languages to be usable by planners.
Planning tasks become solvable from incomplete observation traces without full model specification.
Heuristics arise automatically from the graph structure rather than from hand-crafted rules.
The same learned representation can be applied across multiple planning instances in a domain.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same graph-embedding idea could be tested on planning domains with greater partial observability to measure robustness limits.
The learned vector representations might transfer to related sequential decision tasks such as automated verification or scheduling.
Combining the LP-GNN heuristics with modern heuristic-search planners not used in the original experiments could produce further gains.

Load-bearing premise

Embedding propositions and actions in a graph lets the neural network discover relationships that produce heuristics a planner can actually use.

What would settle it

Run the learned LP-GNN models and the ARMS models on the same set of planning instances from the five test domains and count the problems each solves; if LP-GNN solves no more instances than ARMS, or fewer, the effectiveness claim is falsified.
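The comparison described above can be sketched as a small tally. This is a minimal illustration, assuming each learner's outcome per test instance has already been recorded as a boolean; the function names, instance names, and outcomes below are hypothetical placeholders, not the paper's data or tooling.

```python
from typing import Dict


def count_solved(results: Dict[str, bool]) -> int:
    """Return how many instances are marked as solved."""
    return sum(1 for solved in results.values() if solved)


def claim_survives(lp_gnn: Dict[str, bool], arms: Dict[str, bool]) -> bool:
    """True only if LP-GNN solves strictly more of the shared instances."""
    shared = lp_gnn.keys() & arms.keys()  # compare on identical instances only
    lp_solved = count_solved({name: lp_gnn[name] for name in shared})
    arms_solved = count_solved({name: arms[name] for name in shared})
    return lp_solved > arms_solved


# Hypothetical outcomes for three instances (placeholder data).
lp_gnn_results = {"p01": True, "p02": True, "p03": False}
arms_results = {"p01": True, "p02": False, "p03": False}

print(count_solved(lp_gnn_results), count_solved(arms_results))
print(claim_survives(lp_gnn_results, arms_results))
```

Restricting the tally to shared instances keeps the two counts comparable when one learner was run on a larger test set than the other.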

Figures

Figures reproduced from arXiv: 1907.08352 by Hai Wan, Hankui Hankz Zhuo, Jinxia Lin, Yanan Liu, Zhanhao Xiao.

**Figure 1.** An Overview of LP-GNN. … a training set for an action selection network. The action selection network is trained to return the actions executed in the former state in every pair, which are considered as appropriate actions towards the latter state. The heuristic function is obtained via computing the distances to the appropriate actions. Then, the heuristic function learned helps to choose a suitable action t…
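The distance-based heuristic mentioned in the Figure 1 caption might look roughly like the sketch below. It assumes actions are already embedded as vectors and that the action selection network has produced one or more "appropriate action" vectors. The Euclidean metric, the function names, and the toy embeddings are all assumptions made for illustration; the caption does not specify them.

```python
import math
from typing import Dict, List, Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def heuristic(action_vec: Sequence[float],
              appropriate_vecs: List[Sequence[float]]) -> float:
    """Distance from a candidate action to the nearest appropriate action."""
    return min(euclidean(action_vec, target) for target in appropriate_vecs)


def choose_action(candidates: Dict[str, List[float]],
                  appropriate_vecs: List[Sequence[float]]) -> str:
    """Pick the candidate whose embedding has the smallest heuristic value."""
    return min(candidates,
               key=lambda name: heuristic(candidates[name], appropriate_vecs))


# Toy 2-d embeddings (hypothetical values, not learned ones).
candidates = {
    "pick-up": [1.0, 0.0],
    "stack": [0.0, 1.0],
    "put-down": [0.9, 0.8],
}
appropriate = [[0.1, 0.9]]  # stand-in for the network's output

print(choose_action(candidates, appropriate))
```

In this toy setup the candidate closest to the stand-in network output is selected; a real system would restrict `candidates` to the actions considered in the current state.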

**Figure 2.** Comparisons on instances solved with various observation percentages. LP-GNN is our approach and LP-GNN-SVM and LP-GNN-RF are our approaches with replacing action selection MLP by SVM and Random Forest. Instances solved are the testing instances which are solved under the original domain model by the plans computed according to the learned domain model.
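To make "observation percentage" concrete, the sketch below hides part of each state in a trace. It assumes states are sets of true propositions and that a fixed fraction of each state is kept uniformly at random; the actual masking procedure used in the experiments is not described on this page, so the function `observe` and its parameters are hypothetical.

```python
import random
from typing import FrozenSet, List


def observe(trace_states: List[FrozenSet[str]],
            percentage: float,
            seed: int = 0) -> List[FrozenSet[str]]:
    """Keep roughly `percentage` of the propositions in every state."""
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must lie in [0, 1]")
    rng = random.Random(seed)  # seeded so the masking is reproducible
    observed = []
    for state in trace_states:
        props = sorted(state)  # fixed order before sampling
        keep = round(len(props) * percentage)
        observed.append(frozenset(rng.sample(props, keep)))
    return observed


# Toy Blocksworld-style states (illustrative proposition names).
trace_states = [
    frozenset({"on(a,b)", "clear(a)", "ontable(b)", "handempty"}),
    frozenset({"holding(a)", "clear(b)", "ontable(b)"}),
]
partial = observe(trace_states, 0.5)
print([sorted(state) for state in partial])
```

Sweeping `percentage` over several values would produce the kind of x-axis shown in the figure, with one set of partially observed traces per setting.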

read the original abstract

Specifying a complete domain model is time-consuming, which has been a bottleneck of AI planning technique application in many real-world scenarios. Most classical domain-model learning approaches output a domain model in the form of the declarative planning language, such as STRIPS or PDDL, and solve new planning instances by invoking an existing planner. However, planning in such a representation is sensitive to the accuracy of the learned domain model which probably cannot be used to solve real planning problems. In this paper, to represent domain models in a vectorization representation way, we propose a novel framework based on graph neural network (GNN) integrating model-free learning and model-based planning, called LP-GNN. By embedding propositions and actions in a graph, the latent relationship between them is explored to form a domain-specific heuristics. We evaluate our approach on five classical planning domains, comparing with the classical domain-model learner ARMS. The experimental results show that the domain models learned by our approach are much more effective on solving real planning problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

LP-GNN shifts to GNN vector embeddings for planning domains from partial traces but supplies almost no evidence that the embeddings produce usable heuristics.

The main takeaway is that this paper replaces the usual symbolic output of domain learners with a GNN that embeds propositions and actions from partial traces and claims the resulting vectors serve as effective domain-specific heuristics inside a planner. It reports better results than ARMS across five domains. What is new is the explicit move to a vectorized representation that tries to combine model-free learning with model-based planning instead of producing a declarative model that then gets handed to an off-the-shelf planner.

The framing of the acquisition bottleneck is clear and the choice of partial traces as input is realistic. The paper also cites ARMS directly, which is the right baseline.

The soft spots are more substantial. The abstract contains no numbers, no description of how the GNN output is turned into planner input, no mention of statistical tests, and no discussion of how logical constraints such as consistent preconditions are maintained. Standard GNN aggregation does not automatically enforce the state-transition semantics needed for reliable planning, so the transfer from embedding quality to actual problem-solving performance rests on an untested assumption. Without those details it is hard to know whether the reported gains are real or artifacts of the experimental setup.

This work is aimed at people already exploring neural representations inside planning systems. A reader looking for a method they can apply or cite would need the full experimental section and code before treating the results as settled. The paper is coherent enough on its own terms to go to referees, though any review would have to focus on whether the integration actually works as claimed.
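The concern about standard aggregation can be illustrated with one generic message-passing step. The sketch below uses plain mean aggregation over neighbours; it is not LP-GNN's update rule, and the node names and feature values are made up. Nothing in the averaging distinguishes a precondition from an effect, which is the gap the letter points to.

```python
from typing import Dict, List

Vector = List[float]


def mean_aggregate(features: Dict[str, Vector],
                   neighbours: Dict[str, List[str]]) -> Dict[str, Vector]:
    """One round of mean aggregation: average a node with its neighbours."""
    updated: Dict[str, Vector] = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in neighbours.get(node, [])]
        updated[node] = [
            sum(v[i] for v in group) / len(group) for i in range(len(vec))
        ]
    return updated


# Hypothetical graph: one action node linked to two proposition nodes.
features = {
    "on(a,b)": [1.0, 0.0],
    "clear(a)": [0.0, 1.0],
    "unstack(a,b)": [0.0, 0.0],
}
neighbours = {
    "unstack(a,b)": ["on(a,b)", "clear(a)"],
    "on(a,b)": ["unstack(a,b)"],
    "clear(a)": ["unstack(a,b)"],
}

print(mean_aggregate(features, neighbours)["unstack(a,b)"])
```

Because the update is a symmetric average, swapping the roles of the two propositions leaves the action's new vector unchanged; any planning semantics would have to come from the graph's edge structure or the training objective.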

Referee Report

2 major / 1 minor

Summary. The manuscript proposes LP-GNN, a graph neural network framework for learning vectorized domain models from partially observed planning traces. Propositions and actions are embedded in a graph to explore latent relationships that form domain-specific heuristics. The approach integrates model-free learning with model-based planning and is evaluated on five classical planning domains, where it is claimed to produce domain models that are much more effective for solving real planning problems than those learned by ARMS.

Significance. If the experimental claims hold, the work could advance domain model learning by providing a vectorized representation that is potentially more robust to incomplete traces than declarative models. The GNN-based integration of representation learning and planning offers a novel direction for handling real-world scenarios where full domain models are hard to specify.

major comments (2)

[Abstract] The central claim that 'the domain models learned by our approach are much more effective on solving real planning problems' is asserted without any reported metrics, statistical tests, error analysis, or even the names of the five domains and the specific planner used, which is load-bearing for evaluating the comparison to ARMS.
[Abstract] The core mechanism, that embedding propositions and actions in a graph lets the GNN discover latent relationships forming usable domain-specific heuristics, is stated without justification that standard GNN aggregation preserves or enforces planning semantics such as consistent preconditions/effects or valid state transitions from the traces; this assumption underpins the claim that the vectorized model can be integrated into model-based planning.

minor comments (1)

[Abstract] The abstract refers to 'five classical planning domains' without naming them or indicating the performance metrics (e.g., success rate, plan quality) used in the comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for these constructive comments on the abstract. Both points identify areas where the abstract can be strengthened to better support the paper's claims. We will revise the abstract in the next version and provide more context below.

Referee: [Abstract] The central claim that 'the domain models learned by our approach are much more effective on solving real planning problems' is asserted without any reported metrics, statistical tests, error analysis, or even the names of the five domains and the specific planner used, which is load-bearing for evaluating the comparison to ARMS.

Authors: We agree that the abstract is too high-level and does not include the concrete details needed to evaluate the central claim. The body of the manuscript reports results across five domains with comparisons to ARMS using a standard planner, including success rates and plan quality metrics. In the revision we will expand the abstract to name the domains, specify the planner, and summarize the key quantitative improvements along with the evaluation protocol.

revision: yes

Referee: [Abstract] The core mechanism, that embedding propositions and actions in a graph lets the GNN discover latent relationships forming usable domain-specific heuristics, is stated without justification that standard GNN aggregation preserves or enforces planning semantics such as consistent preconditions/effects or valid state transitions from the traces; this assumption underpins the claim that the vectorized model can be integrated into model-based planning.

Authors: The abstract is constrained by length, but the manuscript justifies the mechanism in the model section by describing how the graph is constructed from the traces (with nodes and edges encoding propositions, actions, and observed transitions) and how the training objective encourages the learned embeddings to produce heuristics that respect preconditions and effects. We will add a short clarifying clause to the abstract to indicate that the GNN is trained end-to-end to produce representations compatible with model-based planning.

revision: yes
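As a rough picture of the graph construction mentioned in the response, the sketch below turns a partially observed trace into proposition nodes, action nodes, and labelled edges. The edge labels `observed-before` and `observed-after`, the `Transition` layout, and the function name are hypothetical; the manuscript's actual encoding is not reproduced on this page.

```python
from typing import FrozenSet, List, Set, Tuple

# (partially observed state before, action, partially observed state after)
Transition = Tuple[FrozenSet[str], str, FrozenSet[str]]
Edge = Tuple[str, str, str]  # (source node, label, target node)


def build_graph(trace: List[Transition]) -> Tuple[Set[str], Set[Edge]]:
    """Collect nodes and labelled edges from observed transitions."""
    nodes: Set[str] = set()
    edges: Set[Edge] = set()
    for before, action, after in trace:
        nodes.add(action)
        nodes.update(before)
        nodes.update(after)
        for prop in before:
            # Proposition seen before the action (may or may not be a precondition).
            edges.add((prop, "observed-before", action))
        for prop in after:
            # Proposition seen after the action (may or may not be an effect).
            edges.add((action, "observed-after", prop))
    return nodes, edges


# One illustrative transition with made-up proposition names.
trace = [
    (frozenset({"clear(a)", "on(a,b)"}), "unstack(a,b)", frozenset({"holding(a)"})),
]
nodes, edges = build_graph(trace)
print(len(nodes), len(edges))
```

The edges record co-occurrence in the trace, not confirmed preconditions or effects, so separating the two is left to whatever training objective is applied on top of the graph.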

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper presents LP-GNN as a GNN-based embedding of propositions and actions from traces to produce vectorized domain models for planning. No equations, self-citations, or fitted parameters are shown reducing the central claim (superior effectiveness on real problems vs ARMS) to a definition or input by construction. The approach applies standard GNN techniques without renaming known results or smuggling ansatzes; the integration of model-free learning with model-based planning remains an independent empirical claim.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests primarily on the domain assumption that GNN graph embeddings of propositions and actions will yield usable planning heuristics; no free parameters, invented physical entities, or additional axioms are described in the abstract.

axioms (1)

domain assumption: Embedding propositions and actions in a graph allows the GNN to explore latent relationships that form effective domain-specific heuristics.
Directly stated in the abstract as the mechanism enabling the framework.

invented entities (1)

LP-GNN (no independent evidence)
purpose: Framework that learns vectorized domain models integrating model-free learning and model-based planning
New method introduced by the paper; no independent evidence provided.

