Efficient Test-time Inference for Generative Planning Models with OCL Search

Federico Pecora; Jeremy L. Wyatt; Mihai Samson; Robert Gieselmann

arXiv: 2606.00618 · v2 · pith:536FPNXM · submitted 2026-05-30 · 💻 cs.AI


Robert Gieselmann, Mihai Samson, Federico Pecora, Jeremy L. Wyatt

Pith reviewed 2026-06-28 18:51 UTC · model grok-4.3

classification 💻 cs.AI

keywords generative planning, test-time inference, open-closed list search, combinatorial planning, heuristic search, neurosymbolic methods


The pith

A modified Open-Closed List search gives generative planning models an efficient test-time inference procedure by pairing fast rollouts with heuristics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that optimizing the inference process itself, rather than scaling compute or retraining, can overcome the limits of generative models trained on narrow data distributions. It does so by adapting classical Open-Closed List search to combine a generative model that quickly extends partial plans with a heuristic model that ranks which paths to pursue next. Novel controls on exploration are added to make the integration work inside the OCL structure. If the claim holds, planners would deliver higher-quality solutions faster across combinatorial domains without domain-specific tuning. Readers would care because the method shows how to leverage two learned components together instead of treating inference as a black-box scaling problem.

Core claim

The central claim is that a modified version of classical Open-Closed List search, equipped with novel exploration control mechanisms, provides an efficient inference procedure. This procedure combines a generative model performing fast rollouts from intermediate states with a heuristic model prioritizing reasoning paths. The claim is that it outperforms both neurosymbolic search baselines and classical solvers in computational efficiency and solution quality across multiple combinatorial planning domains.

What carries the argument

Modified Open-Closed List (OCL) search that integrates a generative model for fast rollouts and a heuristic model for path prioritization together with new exploration control mechanisms.
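To make the moving parts concrete, here is a minimal Python sketch of an open/closed-list search. Each expansion runs a generative rollout from the selected state and adds every visited state to the search graph, and a heuristic ranks the open list. This is not the paper's OCLGen. The following are all hypothetical stand-ins for the learned generative and heuristic models:

- the A*-style priority f = g + h;
- unit action costs;
- the stopping rule;
- the toy integer domain (`make_rollout`, `toy_heuristic`, `GOAL`).

```python
import heapq
import itertools
import math
import random
from typing import Callable, Dict, Hashable, List, Optional


def ocl_search(start: Hashable,
               is_goal: Callable[[Hashable], bool],
               rollout: Callable[[Hashable], List[Hashable]],
               heuristic: Callable[[Hashable], float],
               max_iters: int = 1000) -> Optional[List[Hashable]]:
    """Best-first open/closed-list search in which every expansion is a
    generative rollout; all states on the rollout are added to the graph."""
    tie = itertools.count()                    # tie-breaker: states are never compared
    open_list = [(heuristic(start), next(tie), start)]
    g: Dict[Hashable, int] = {start: 0}        # best known cost-to-reach
    parent: Dict[Hashable, Optional[Hashable]] = {start: None}
    closed = set()
    best = start if is_goal(start) else None   # cheapest goal state found so far

    for _ in range(max_iters):
        if not open_list:
            break
        f, _, s = heapq.heappop(open_list)
        if best is not None and f >= g[best]:
            break                              # no open node promises a cheaper plan
        if s in closed:
            continue                           # stale duplicate entry
        closed.add(s)
        prev = s
        for nxt in rollout(s):                 # fast rollout from an intermediate state
            cost = g[prev] + 1                 # unit action costs (assumption)
            if cost < g.get(nxt, math.inf):
                g[nxt], parent[nxt] = cost, prev
                closed.discard(nxt)            # reopen if a cheaper path appeared
                heapq.heappush(open_list, (cost + heuristic(nxt), next(tie), nxt))
                if is_goal(nxt) and (best is None or cost <= g[best]):
                    best = nxt
            prev = nxt

    if best is None:
        return None
    path, node = [], best
    while node is not None:                    # follow parent pointers back to start
        path.append(node)
        node = parent[node]
    return path[::-1]


# --- Hypothetical toy domain standing in for the learned models ---
# States are integers, actions are +1 and *2, and the goal is to reach GOAL.
GOAL = 10


def make_rollout(horizon: int = 5, seed: int = 0) -> Callable[[int], List[int]]:
    rng = random.Random(seed)

    def rollout(s: int) -> List[int]:
        states = []
        for _ in range(horizon):
            if s == GOAL:
                break
            if rng.random() < 0.3 or 2 * s > GOAL:
                s += 1                         # noisy or forced +1 step
            else:
                s *= 2
            states.append(s)
        return states

    return rollout


def toy_heuristic(s: int) -> int:
    # each action at most doubles the state, so this never overestimates
    return math.ceil(math.log2(GOAL / s))


path = ocl_search(1, lambda s: s == GOAL, make_rollout(), toy_heuristic)
print(path)
```

Swapping `toy_heuristic` for a constant, or `make_rollout` for a single-step expander, is one way to isolate each component's contribution while keeping the OCL skeleton fixed.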

If this is right

The approach outperforms neurosymbolic search baselines in both computational efficiency and solution quality.
It achieves higher solution quality than classical solvers across multiple combinatorial domains.
The method requires no domain-specific tuning or post-hoc adjustments.
Novel exploration controls enable stable integration of the two learned models inside the OCL framework.
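The review does not say what the paper's exploration controls are. As a generic illustration only: one well-known way to add exploration to best-first search is epsilon-greedy node selection, sketched below. The function name and the `epsilon` parameter are hypothetical and should not be read as the paper's mechanism.

```python
import heapq
import random
from typing import Any, List, Tuple

Entry = Tuple[float, int, Any]   # (priority, tie-breaker, state), as in a heapq open list


def pop_with_exploration(open_list: List[Entry], epsilon: float,
                         rng: random.Random) -> Entry:
    """Epsilon-greedy node selection for best-first search: usually pop the
    lowest-priority entry, but with probability `epsilon` pop a uniformly
    random entry so that low-ranked branches still get expanded occasionally."""
    if not open_list:
        raise IndexError("pop from empty open list")
    if rng.random() < epsilon:
        i = rng.randrange(len(open_list))
        open_list[i], open_list[-1] = open_list[-1], open_list[i]
        entry = open_list.pop()
        heapq.heapify(open_list)   # O(n) repair after removing an arbitrary element
        return entry
    return heapq.heappop(open_list)
```

With `epsilon = 0` this reduces to ordinary best-first selection. Larger values trade heuristic guidance for coverage of low-ranked branches.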

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Classical search structures can be updated with separate learned rollout and heuristic components to mitigate distribution shift at inference time.
Similar pairings of generative and evaluative models could be tested in other constrained generation tasks such as program synthesis.
The relative contribution of the generative versus heuristic component could be measured by ablating one while keeping the OCL skeleton fixed.

Load-bearing premise

The generative model must produce accurate fast rollouts from intermediate states and the heuristic model must reliably prioritize better reasoning paths without post-hoc fixes.

What would settle it

Running the method on a standard planning benchmark where the generative model produces inaccurate rollouts from partial states and finding no improvement over baselines would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.00618 by Federico Pecora, Jeremy L. Wyatt, Mihai Samson, Robert Gieselmann.

**Figure 1.** Illustration of one iteration with OCLGen. Note that we maintain a graph structure of states and transitions that is updated after every search iteration. … open list O (as in standard A∗) strictly prefers the deeper node n2, despite both offering paths to equally good solutions. The search thus systematically over-commits to deeper, arbitrary branches, leaving promising shallower alternative routes, where…

**Figure 2.** Plan length over time across all domains. OCLGen rapidly converges to shorter plans compared to baseline methods.

**Figure 3.** Completion rate over time across all domains. Panel (a), Blocksworld, plots completion rate (%) against runtime (seconds) for OCLGen (scan), OCLGen (uniform), Best-of-N, MCTS, and MCTS (partial R.).

Ablation results:

| Domain | Metric | w/o Depth selection | w/o Adaptive expansion | w/o Percentile estimate (mode) |
|---|---|---|---|---|
| Blocksworld | Optim. | 366 / 630 | 496 / 630 | 486 / 630 |
| | Comp. [%] | 100.0 | 100.0 | 100.0 |
| | Length | 48.81 (± 0.87) | 44.35 (± 0.69) | 44.68 (± 0.70) |
| Logistics | Optim. | 78 / 169 | 98 / 169 | 98 / 169 |
| | Comp. [%] | 99.9 | 100.0 | 100.0 |
| | Length | 159.23 (± 3.52) | 155.82 (± 3.47) | 156.11 (± 3.48) |
| Labyrinth | Optim. | 979 / … | … | … |
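The ablation counts extracted with Figure 3 are easier to compare as percentages. The sketch below assumes `Optim.` reports the number of instances solved with an optimal plan out of the total; that is our reading of the abbreviation. The truncated Labyrinth row is left out.

```python
# Optim. counts from the ablation table, as (optimal, total) instances per variant.
ablation = {
    "Blocksworld": {
        "w/o Depth selection": (366, 630),
        "w/o Adaptive expansion": (496, 630),
        "w/o Percentile estimate (mode)": (486, 630),
    },
    "Logistics": {
        "w/o Depth selection": (78, 169),
        "w/o Adaptive expansion": (98, 169),
        "w/o Percentile estimate (mode)": (98, 169),
    },
}


def optimal_rates(table):
    """Convert (optimal, total) counts to percentages rounded to one decimal."""
    return {
        domain: {variant: round(100 * k / n, 1) for variant, (k, n) in row.items()}
        for domain, row in table.items()
    }


rates = optimal_rates(ablation)
for domain, row in rates.items():
    print(domain, row)
```

Under that reading, removing depth selection gives the lowest rate of the three ablations in both domains:

- Blocksworld: 58.1%, versus 77.1% and 78.7% for the other two ablations.
- Logistics: 46.2%, versus 58.0% for both other ablations.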

**Figure 4.** Blocksworld - Distribution of absolute errors of learned heuristic cost model with respect to test labels (suboptimal data).

**Figure 5.** Logistics - Distribution of absolute errors of learned heuristic cost model with respect to test labels (suboptimal data).

**Figure 6.** Labyrinth - Distribution of absolute errors of learned heuristic cost model with respect to test labels (suboptimal data). Mean absolute error of the predicted heuristic cost (mode): 3.95.

**Figure 7.** Sokoban - Distribution of absolute errors of learned heuristic cost model with respect to test labels (suboptimal data). Mean absolute error of the predicted heuristic cost (mode): 7.78.

**Figure 8.** Blocksworld - Examples of cost distribution generated by the learned heuristic model.

**Figure 9.** Logistics - Examples of cost distribution generated by the learned heuristic model.

**Figure 10.** Labyrinth - Examples of cost distribution generated by the learned heuristic model.

**Figure 11.** Sokoban - Examples of cost distribution generated by the learned heuristic model.
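The figures show the heuristic model emitting a distribution over costs, and the absolute errors are reported for its mode. The ablation column `w/o Percentile estimate (mode)` suggests the full method reads off a percentile instead. Below is a minimal sketch of both readouts and of the mean absolute error, assuming a discrete distribution over integer plan costs. The function names, the example distribution, and the values of `q` are hypothetical.

```python
from typing import Sequence


def mode_cost(probs: Sequence[float]) -> int:
    """Cost bin with the highest predicted probability (first one on ties)."""
    return max(range(len(probs)), key=lambda c: probs[c])


def percentile_cost(probs: Sequence[float], q: float) -> int:
    """Smallest cost c whose cumulative probability reaches q, for 0 < q <= 1."""
    total = sum(probs)
    cumulative = 0.0
    for c, p in enumerate(probs):
        cumulative += p / total
        if cumulative >= q:
            return c
    return len(probs) - 1        # guard against floating-point shortfall at q = 1


def mean_absolute_error(preds: Sequence[float], labels: Sequence[float]) -> float:
    return sum(abs(p - y) for p, y in zip(preds, labels)) / len(labels)


# Hypothetical predicted distribution over plan costs 0..4.
probs = [0.0, 0.1, 0.4, 0.2, 0.3]
print(mode_cost(probs), percentile_cost(probs, 0.3), percentile_cost(probs, 0.75))  # 2 2 4
```

A low `q` gives an optimistic cost estimate and a high `q` a pessimistic one. Which percentile the paper uses is not stated in this review.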

Original abstract

Generative models have emerged as a powerful paradigm for AI planning, yet their performance remains constrained by the training data distribution. One approach is to improve generated solutions during inference by scaling test-time compute. A more efficient alternative is to optimize the inference process itself. In this paper, we show that a modified version of a classical Open-Closed List (OCL) search provides just such an efficient inference procedure. Our algorithm synergizes two learned components: a generative model that performs fast rollouts from intermediate states and a heuristic model that prioritizes among candidate reasoning paths. Key contributions include novel exploration control mechanisms and integration of learned models within the OCL framework. Across multiple combinatorial planning domains, our approach outperforms both neurosymbolic search baselines and classical solvers in computational efficiency and solution quality.
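Best-of-N, one of the baselines in Figure 3, is the simplest way to scale test-time compute. The minimal sketch below assumes it means sampling N complete plans independently and keeping the shortest valid one. The toy domain (`apply_plan`, `toy_sampler`) is a hypothetical stand-in for a trained generative planner and a plan validator.

```python
import random
from typing import Callable, List, Optional

Plan = List[str]


def best_of_n(sample_plan: Callable[[random.Random], Plan],
              is_valid: Callable[[Plan], bool],
              n: int, seed: int = 0) -> Optional[Plan]:
    """Draw n independent plans and keep the shortest valid one (None if none is valid)."""
    rng = random.Random(seed)
    best: Optional[Plan] = None
    for _ in range(n):
        plan = sample_plan(rng)
        if is_valid(plan) and (best is None or len(plan) < len(best)):
            best = plan
    return best


# --- Hypothetical toy domain: start at 1, actions "inc" (+1) and "dbl" (*2), goal 10 ---
def apply_plan(plan: Plan, start: int = 1) -> int:
    s = start
    for a in plan:
        s = s + 1 if a == "inc" else 2 * s
    return s


def toy_sampler(rng: random.Random) -> Plan:
    # stand-in for a generative model: uniform random actions until the state reaches 10
    plan: Plan = []
    while apply_plan(plan) < 10:
        plan.append(rng.choice(["inc", "dbl"]))
    return plan


best = best_of_n(toy_sampler, lambda p: apply_plan(p) == 10, n=200)
print(best)
```

Every sample is drawn independently, so compute grows linearly with `n` and no sample informs the next. Optimizing the inference procedure itself is the alternative the abstract proposes.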

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper adapts classical OCL search with a generative rollout model and a learned heuristic for test-time planning inference, but the abstract gives no experimental details to back the outperformance claims.


The core idea is straightforward: take open-closed list search, add a generative model that does fast rollouts from intermediate states and a heuristic that ranks paths, then add some exploration controls to make the combination work. This is positioned as a way to improve inference without more training.

What stands out is the attempt to make the search itself more efficient by blending the two learned pieces inside a classical framework rather than treating search as a black box. The abstract mentions novel exploration controls, which could be the actual technical step if they turn out to be more than minor tweaks.

The main weakness is that nothing in the provided text shows the experiments. No setup, no domains listed in detail, no error bars, no comparison tables. The central claim that it beats both neurosymbolic baselines and classical solvers therefore cannot be checked. The stress-test concern about rollout accuracy from intermediate states also lands: generative models are typically trained on complete trajectories, and the paper gives no evidence that the same model stays reliable when the search reaches states that were never seen in training. If rollout error grows or the heuristic steers into bad branches, the reported gains disappear.

This is for researchers already working on generative models for planning who want to see how classical search can be reused at test time. A reader who needs reproducible results or clear ablation studies will not get much yet.

It should go to peer review so the experiments can be examined directly; the idea is coherent enough to be worth checking even if the current write-up is thin.

Referee Report

2 major / 0 minor

Summary. The manuscript claims that a modified version of classical Open-Closed List (OCL) search serves as an efficient test-time inference procedure for generative planning models. It integrates a generative model for fast rollouts from intermediate states with a heuristic model to prioritize reasoning paths, along with novel exploration control mechanisms. The approach is reported to outperform neurosymbolic search baselines and classical solvers in computational efficiency and solution quality across multiple combinatorial planning domains.

Significance. If the empirical claims are substantiated with proper controls and validation, the work would offer a meaningful contribution to test-time optimization in generative planning. It demonstrates how classical search structures can be adapted to synergize with learned generative and heuristic components, providing an alternative to simply scaling inference compute.

major comments (2)

[Abstract] The central claim of outperformance over baselines and classical solvers is presented without any details on experimental setup, error bars, statistical controls, or number of domains/instances, which prevents verification of the reported gains in efficiency and quality.
[§3 (OCL Integration) or equivalent methods description] The manuscript provides no direct empirical validation that the generative model maintains accurate rollout fidelity when conditioned on intermediate states reached via heuristic-guided exploration rather than complete training trajectories; this assumption is load-bearing for the claimed synergy and efficiency improvements.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment below and will revise the manuscript to improve clarity and add requested validation.


Referee: [Abstract] The central claim of outperformance over baselines and classical solvers is presented without any details on experimental setup, error bars, statistical controls, or number of domains/instances, which prevents verification of the reported gains in efficiency and quality.

Authors: We agree that the abstract lacks sufficient experimental context. In the revised version, we will expand the abstract to report the number of domains and instances evaluated, note the inclusion of error bars, and indicate that statistical controls were applied in the comparisons. Full details remain in the experimental section. revision: yes
Referee: [§3 (OCL Integration) or equivalent methods description] The manuscript provides no direct empirical validation that the generative model maintains accurate rollout fidelity when conditioned on intermediate states reached via heuristic-guided exploration rather than complete training trajectories; this assumption is load-bearing for the claimed synergy and efficiency improvements.

Authors: This observation is correct; the manuscript does not include a dedicated empirical check of rollout fidelity specifically on heuristically reached intermediate states. We will add such validation in the revision (e.g., an ablation measuring rollout accuracy or divergence on states generated by the heuristic-guided process versus training trajectories) to directly support the claimed synergy. revision: yes
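One way to run the promised fidelity check is to roll the generative model out from two pools of start states. One pool is drawn from training trajectories and the other is reached by the heuristic-guided search. The check then compares the fraction of generated transitions that the domain model accepts as legal. This is a minimal sketch: the legality-based metric, the function names, and the toy rollout that fails above state 5 are all hypothetical. In a PDDL domain, a plan validator would supply `is_legal`.

```python
from typing import Callable, Hashable, Iterable, List, Tuple


def rollout_fidelity(starts: Iterable[Hashable],
                     rollout: Callable[[Hashable], List[Hashable]],
                     is_legal: Callable[[Hashable, Hashable], bool]) -> float:
    """Fraction of generated transitions that the domain model accepts as legal."""
    legal = total = 0
    for s in starts:
        prev = s
        for nxt in rollout(s):
            legal += is_legal(prev, nxt)
            total += 1
            prev = nxt
    return legal / total if total else float("nan")


def fidelity_gap(train_starts, search_starts, rollout, is_legal) -> Tuple[float, float, float]:
    """In-distribution fidelity, fidelity on search-reached states, and their difference."""
    a = rollout_fidelity(train_starts, rollout, is_legal)
    b = rollout_fidelity(search_starts, rollout, is_legal)
    return a, b, a - b


# --- Hypothetical toy check: integer states, legal moves are +1 and *2 ---
def toy_is_legal(prev: int, nxt: int) -> bool:
    return nxt in (prev + 1, 2 * prev)


def toy_rollout(s: int) -> List[int]:
    # stand-in model that is reliable on small states but emits illegal +3 jumps above 5
    out = []
    for _ in range(3):
        s = s + 1 if s <= 5 else s + 3
        out.append(s)
    return out


print(fidelity_gap([1, 2, 3], [6, 7], toy_rollout, toy_is_legal))  # (1.0, 0.0, 1.0)
```

A gap near zero would support the load-bearing premise. A large positive gap would indicate the model degrades on states it was not trained from.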

Circularity Check

0 steps flagged

No circularity: algorithmic description with no self-referential derivations or fitted predictions


The paper presents a modified OCL search algorithm that integrates a generative model for rollouts and a heuristic model. No equations, derivations, or parameter-fitting steps are described in the provided abstract or claims. The central claim is an empirical performance improvement from the search procedure itself, not a prediction that reduces to its own inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems. The derivation chain is self-contained as an engineering contribution rather than a mathematical reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no information on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5663 in / 961 out tokens · 13615 ms · 2026-06-28T18:51:22.386654+00:00 · methodology


Reference graph

Works this paper leans on

46 extracted references · 15 canonical work pages · 4 internal anchors

