
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency

Bo Liu, Yuqian Jiang, Xiaohan Zhang, Qiang Liu, Shiqi Zhang, Joydeep Biswas · 2023 · cs.AI · arXiv 2304.11477

Canonical reference. 92% of citing Pith papers cite this work as background.

45 Pith papers citing it

Background 92% of classified citations


abstract

Large language models (LLMs) have demonstrated remarkable zero-shot generalization abilities: state-of-the-art chatbots can provide plausible answers to many common questions that arise in daily life. However, so far, LLMs cannot reliably solve long-horizon planning problems. By contrast, classical planners, once a problem is given in a formatted way, can use efficient search algorithms to quickly identify correct, or even optimal, plans. In an effort to get the best of both worlds, this paper introduces LLM+P, the first framework that incorporates the strengths of classical planners into LLMs. LLM+P takes in a natural language description of a planning problem, then returns a correct (or optimal) plan for solving that problem in natural language. LLM+P does so by first converting the language description into a file written in the planning domain definition language (PDDL), then leveraging classical planners to quickly find a solution, and then translating the found solution back into natural language. Along with LLM+P, we define a diverse set of different benchmark problems taken from common planning scenarios. Via a comprehensive set of experiments on these benchmark problems, we find that LLM+P is able to provide optimal solutions for most problems, while LLMs fail to provide even feasible plans for most problems. The code and results are publicly available at https://github.com/Cranial-XIX/llm-pddl.git.
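The three-stage pipeline the abstract describes (language to PDDL, classical planner, plan back to language) can be illustrated with a small Python sketch. This is a minimal sketch under stated assumptions, not the llm-pddl code: `translate_to_problem` and `plan_to_text` are hypothetical stubs standing in for the two LLM calls (the first returns a hand-written problem as Python sets instead of generating PDDL text), and a breadth-first search over a grounded 4-operator Blocksworld is swapped in for the off-the-shelf classical planner. With unit action costs, BFS returns a shortest plan, which is the sense of "optimal" used here.

```python
from collections import deque
from dataclasses import dataclass
from itertools import permutations
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

State = FrozenSet[str]  # a state is the set of ground facts that currently hold


@dataclass(frozen=True)
class Action:
    """A ground STRIPS action: preconditions, add effects, delete effects."""
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def blocksworld_actions(blocks: List[str]) -> List[Action]:
    """Ground the classic 4-operator Blocksworld domain over the given blocks."""
    acts: List[Action] = []
    for x in blocks:
        on_table = frozenset({f"clear {x}", f"ontable {x}", "handempty"})
        acts.append(Action(f"pick-up({x})", on_table, frozenset({f"holding {x}"}), on_table))
        acts.append(Action(f"put-down({x})", frozenset({f"holding {x}"}), on_table,
                           frozenset({f"holding {x}"})))
    for x, y in permutations(blocks, 2):
        stacked = frozenset({f"on {x} {y}", f"clear {x}", "handempty"})
        in_hand = frozenset({f"holding {x}", f"clear {y}"})
        acts.append(Action(f"stack({x},{y})", in_hand, stacked, in_hand))
        acts.append(Action(f"unstack({x},{y})", stacked, in_hand, stacked))
    return acts


def bfs_plan(init: Set[str], goal: Set[str], actions: List[Action]) -> Optional[List[str]]:
    """Stand-in for a classical planner: breadth-first search over states.
    With unit action costs, the first goal state dequeued gives a shortest plan."""
    start: State = frozenset(init)
    parent: Dict[State, Optional[Tuple[State, str]]] = {start: None}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if goal <= state:  # every goal fact holds in this state
            plan: List[str] = []
            link = parent[state]
            while link is not None:  # walk back to the initial state
                state, name = link
                plan.append(name)
                link = parent[state]
            return plan[::-1]
        for a in actions:
            if a.pre <= state:
                nxt = (state - a.delete) | a.add
                if nxt not in parent:
                    parent[nxt] = (state, a.name)
                    frontier.append(nxt)
    return None  # goal unreachable


def translate_to_problem(description: str) -> Tuple[Set[str], Set[str]]:
    """HYPOTHETICAL stand-in for the LLM step (natural language to PDDL problem).
    It ignores its input and returns a hand-written Sussman-anomaly instance."""
    init = {"on C A", "ontable A", "ontable B", "clear C", "clear B", "handempty"}
    goal = {"on A B", "on B C"}
    return init, goal


def plan_to_text(plan: List[str]) -> str:
    """HYPOTHETICAL stand-in for the LLM step that verbalizes the found plan."""
    return "; then ".join(plan)


def llm_plus_p(description: str) -> str:
    """Illustrative pipeline: translate, plan, translate back."""
    init, goal = translate_to_problem(description)
    plan = bfs_plan(init, goal, blocksworld_actions(["A", "B", "C"]))
    return "No plan found." if plan is None else plan_to_text(plan)


print(llm_plus_p("Block C is on block A; A and B are on the table. Stack A on B and B on C."))
# prints: unstack(C,A); then put-down(C); then pick-up(B); then stack(B,C); then pick-up(A); then stack(A,B)
```

Breadth-first search is used here only because its optimality argument is short; it enumerates states exhaustively and would not scale to larger problems, which is where the efficient search of a dedicated classical planner matters.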


citation-role summary

background 12

citation-polarity summary

background 11 support 1

representative citing papers

LLM-Evolved Domain-Independent Heuristics for Symbolic AI Planning

cs.AI · 2026-05-28 · unverdicted · novelty 8.0

LLM-guided evolutionary search yields the first domain-independent C++ planning heuristics that exceed the strongest hand-engineered baselines on coverage and speed trade-offs across unseen domains.

Property-Guided LLM Program Synthesis for Planning

cs.AI · 2026-05-15 · unverdicted · novelty 7.0

Property-guided LLM program synthesis with counterexample feedback creates direct heuristics for PDDL planning domains that require far fewer generations and less evaluation cost than score-based baselines.

Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems

cs.AI · 2026-05-14 · unverdicted · novelty 7.0 · 2 refs

A survey that unifies prior work on multi-agent LLM systems via the LIFE framework, mapping dependencies across collaboration, failure attribution, and autonomous self-evolution while identifying cross-stage challenges.

Self-Improvement for Fast, High-Quality Plan Generation

cs.AI · 2026-05-05 · unverdicted · novelty 7.0

Self-improvement of a decoder-only transformer yields plans averaging 30% shorter than a source symbolic planner, over 80% optimal where known, with sub-exponential latency scaling.

LLM-Flax : Generalizable Robotic Task Planning via Neuro-Symbolic Approaches with Large Language Models

cs.RO · 2026-04-29 · unverdicted · novelty 7.0

LLM-Flax automates neuro-symbolic robotic task planning with three LLM stages for rule generation, failure recovery, and zero-shot scoring, outperforming manual baselines on MazeNamo grids.

ANCHOR: A Physically Grounded Closed-Loop Framework for Robust Home-Service Mobile Manipulation

cs.RO · 2026-04-28 · conditional · novelty 7.0

ANCHOR raises mobile manipulation success from 53.3% to 71.7% in unseen homes by binding plans to observable geometry, ensuring operable navigation endpoints, and using layered local recovery instead of global replans.

Using large language models for embodied planning introduces systematic safety risks

cs.AI · 2026-04-20 · unverdicted · novelty 7.0

LLM planners for robots often produce dangerous plans even when planning succeeds, with safety awareness staying flat as model scale improves planning ability.

Self-Correcting RAG: Enhancing Faithfulness via MMKP Context Selection and NLI-Guided MCTS

cs.CL · 2026-04-12 · unverdicted · novelty 7.0

Self-Correcting RAG formalizes retrieval as MMKP to maximize information density under token limits and uses NLI-guided MCTS to validate faithfulness, raising accuracy and cutting hallucinations on six multi-hop QA and fact-checking datasets.

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

cs.LG · 2024-10-07 · accept · novelty 7.0

LLMs display high variance and major accuracy drops on GSM-Symbolic variants of grade-school math problems, indicating they replicate training patterns rather than execute logical reasoning.

VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models

cs.RO · 2023-07-12 · unverdicted · novelty 7.0

VoxPoser uses LLMs to compose 3D value maps via VLM interaction for model-based synthesis of robust robot trajectories on open-set language-specified manipulation tasks.

Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models

cs.CL · 2023-05-06 · conditional · novelty 7.0

Plan-and-Solve prompting improves zero-shot LLM reasoning by first creating an explicit plan then executing subtasks, outperforming simple 'think step by step' prompts across ten datasets.

Any-ttach: Quick End-effector Swapping Enables Manipulation Dexterity with Simplicity

cs.RO · 2026-05-28 · unverdicted · novelty 6.0

Any-ttach shows that rapid end-effector swapping combined with demonstration collection and task planning enables reliable multi-tool skills in long-horizon tasks such as sandwich making.

Learning Bilevel Policies over Symbolic World Models for Long-Horizon Planning

cs.AI · 2026-05-15 · unverdicted · novelty 6.0

BISON learns bilevel policies over symbolic world models to generalize long-horizon robotic planning beyond VLA and end-to-end baselines while remaining efficient even at 10,000-object scale.

CSR: Infinite-Horizon Real-Time Policies with Massive Cached State Representations

cs.RO · 2026-05-08 · unverdicted · novelty 6.0

CSR with ASR enables infinite-horizon real-time LLM policies via stable KV-cache properties and background eviction, delivering 26x lower latency and SOTA recall on embodied benchmarks.

Decoupled Travel Planning with Behavior Forest

cs.LG · 2026-04-23 · unverdicted · novelty 6.0

Behavior Forest decouples multi-constraint travel planning into parallel behavior trees with LLM nodes and global coordination, yielding 6.67% and 11.82% gains over prior methods on two benchmarks.

Mind the Prompt: Self-adaptive Generation of Task Plan Explanations via LLMs

cs.AI · 2026-04-22 · unverdicted · novelty 6.0

COMPASS formalizes prompt engineering as a POMDP-based cognitive decision process for self-adaptive generation of task plan explanations via LLMs.

SYMBOLIZER: Symbolic Model-free Task Planning with VLMs

cs.RO · 2026-04-20 · unverdicted · novelty 6.0

SYMBOLIZER grounds symbolic states from images via VLMs using only lifted predicates and solves long-horizon tasks with goal-count and width-based heuristic search, outperforming direct VLM planning and matching VLM-heuristic baselines on ProDG and ViPlan benchmarks.

Select-then-Solve: Paradigm Routing as Inference-Time Optimization for LLM Agents

cs.CL · 2026-04-08 · conditional · novelty 6.0

A learned embedding-based router selecting among six reasoning paradigms improves LLM agent accuracy from 47.6% to 53.1% on average, beating the best fixed paradigm by 2.8pp.

KGLAMP: Knowledge Graph-guided Language model for Adaptive Multi-robot Planning and Replanning

cs.RO · 2026-02-04 · unverdicted · novelty 6.0

KGLAMP uses a dynamically updated knowledge graph to guide LLMs in creating and replanning PDDL specifications for heterogeneous multi-robot teams, reporting at least 25.3% better performance than LLM-only or classical PDDL baselines on the MAT-THOR benchmark.

UniDomain: Pretraining a Unified PDDL Domain from Real-World Demonstrations for Generalizable Robot Task Planning

cs.RO · 2025-07-29 · unverdicted · novelty 6.0

UniDomain extracts atomic PDDL domains from 12,393 robot videos to create a unified domain of 3137 operators and 2875 predicates, then retrieves and fuses relevant parts to enable zero-shot planning on unseen real-world tasks.

GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis

cs.AI · 2025-07-28 · unverdicted · novelty 6.0

GenoMAS deploys six specialized LLM agents with guided planning to preprocess transcriptomic data and identify genes, reaching 89.13% composite similarity and 60.48% F1 on the GenoTEX benchmark while outperforming prior methods.

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

cs.AI · 2025-07-01 · conditional · novelty 6.0

Math reasoning gains in LLMs rarely transfer to general domains; RL tuning generalizes while SFT causes forgetting and representation drift.

A Survey on Vision-Language-Action Models for Embodied AI

cs.RO · 2024-05-23 · unverdicted · novelty 6.0

This is the first survey on vision-language-action models, providing a taxonomy across three lines, plus summaries of datasets, simulators, benchmarks, challenges, and future directions in embodied AI.

Cognitive Architectures for Language Agents

cs.AI · 2023-09-05 · accept · novelty 6.0

CoALA is a modular cognitive architecture for language agents that organizes memory components, action spaces for internal and external interaction, and a generalized decision-making loop to support more systematic development of capable agents.

citing papers explorer

Showing 45 of 45 citing papers.

LLM-Evolved Domain-Independent Heuristics for Symbolic AI Planning cs.AI · 2026-05-28 · unverdicted · none · ref 27
LLM-guided evolutionary search yields the first domain-independent C++ planning heuristics that exceed the strongest hand-engineered baselines on coverage and speed trade-offs across unseen domains.
Property-Guided LLM Program Synthesis for Planning cs.AI · 2026-05-15 · unverdicted · none · ref 38
Property-guided LLM program synthesis with counterexample feedback creates direct heuristics for PDDL planning domains that require far fewer generations and less evaluation cost than score-based baselines.
Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems cs.AI · 2026-05-14 · unverdicted · none · ref 123 · 2 links
A survey that unifies prior work on multi-agent LLM systems via the LIFE framework, mapping dependencies across collaboration, failure attribution, and autonomous self-evolution while identifying cross-stage challenges.
Self-Improvement for Fast, High-Quality Plan Generation cs.AI · 2026-05-05 · unverdicted · none · ref 24
Self-improvement of a decoder-only transformer yields plans averaging 30% shorter than a source symbolic planner, over 80% optimal where known, with sub-exponential latency scaling.
LLM-Flax : Generalizable Robotic Task Planning via Neuro-Symbolic Approaches with Large Language Models cs.RO · 2026-04-29 · unverdicted · none · ref 17
LLM-Flax automates neuro-symbolic robotic task planning with three LLM stages for rule generation, failure recovery, and zero-shot scoring, outperforming manual baselines on MazeNamo grids.
ANCHOR: A Physically Grounded Closed-Loop Framework for Robust Home-Service Mobile Manipulation cs.RO · 2026-04-28 · conditional · none · ref 19
ANCHOR raises mobile manipulation success from 53.3% to 71.7% in unseen homes by binding plans to observable geometry, ensuring operable navigation endpoints, and using layered local recovery instead of global replans.
Using large language models for embodied planning introduces systematic safety risks cs.AI · 2026-04-20 · unverdicted · none · ref 46
LLM planners for robots often produce dangerous plans even when planning succeeds, with safety awareness staying flat as model scale improves planning ability.
Self-Correcting RAG: Enhancing Faithfulness via MMKP Context Selection and NLI-Guided MCTS cs.CL · 2026-04-12 · unverdicted · none · ref 5
Self-Correcting RAG formalizes retrieval as MMKP to maximize information density under token limits and uses NLI-guided MCTS to validate faithfulness, raising accuracy and cutting hallucinations on six multi-hop QA and fact-checking datasets.
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models cs.LG · 2024-10-07 · accept · none · ref 19
LLMs display high variance and major accuracy drops on GSM-Symbolic variants of grade-school math problems, indicating they replicate training patterns rather than execute logical reasoning.
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models cs.RO · 2023-07-12 · unverdicted · none · ref 63
VoxPoser uses LLMs to compose 3D value maps via VLM interaction for model-based synthesis of robust robot trajectories on open-set language-specified manipulation tasks.
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models cs.CL · 2023-05-06 · conditional · none · ref 2
Plan-and-Solve prompting improves zero-shot LLM reasoning by first creating an explicit plan then executing subtasks, outperforming simple 'think step by step' prompts across ten datasets.
Any-ttach: Quick End-effector Swapping Enables Manipulation Dexterity with Simplicity cs.RO · 2026-05-28 · unverdicted · none · ref 39
Any-ttach shows that rapid end-effector swapping combined with demonstration collection and task planning enables reliable multi-tool skills in long-horizon tasks such as sandwich making.
Learning Bilevel Policies over Symbolic World Models for Long-Horizon Planning cs.AI · 2026-05-15 · unverdicted · none · ref 84
BISON learns bilevel policies over symbolic world models to generalize long-horizon robotic planning beyond VLA and end-to-end baselines while remaining efficient even at 10,000-object scale.
CSR: Infinite-Horizon Real-Time Policies with Massive Cached State Representations cs.RO · 2026-05-08 · unverdicted · none · ref 20
CSR with ASR enables infinite-horizon real-time LLM policies via stable KV-cache properties and background eviction, delivering 26x lower latency and SOTA recall on embodied benchmarks.
Decoupled Travel Planning with Behavior Forest cs.LG · 2026-04-23 · unverdicted · none · ref 20
Behavior Forest decouples multi-constraint travel planning into parallel behavior trees with LLM nodes and global coordination, yielding 6.67% and 11.82% gains over prior methods on two benchmarks.
Mind the Prompt: Self-adaptive Generation of Task Plan Explanations via LLMs cs.AI · 2026-04-22 · unverdicted · none · ref 52
COMPASS formalizes prompt engineering as a POMDP-based cognitive decision process for self-adaptive generation of task plan explanations via LLMs.
SYMBOLIZER: Symbolic Model-free Task Planning with VLMs cs.RO · 2026-04-20 · unverdicted · none · ref 22
SYMBOLIZER grounds symbolic states from images via VLMs using only lifted predicates and solves long-horizon tasks with goal-count and width-based heuristic search, outperforming direct VLM planning and matching VLM-heuristic baselines on ProDG and ViPlan benchmarks.
Select-then-Solve: Paradigm Routing as Inference-Time Optimization for LLM Agents cs.CL · 2026-04-08 · conditional · none · ref 2
A learned embedding-based router selecting among six reasoning paradigms improves LLM agent accuracy from 47.6% to 53.1% on average, beating the best fixed paradigm by 2.8pp.
KGLAMP: Knowledge Graph-guided Language model for Adaptive Multi-robot Planning and Replanning cs.RO · 2026-02-04 · unverdicted · none · ref 6
KGLAMP uses a dynamically updated knowledge graph to guide LLMs in creating and replanning PDDL specifications for heterogeneous multi-robot teams, reporting at least 25.3% better performance than LLM-only or classical PDDL baselines on the MAT-THOR benchmark.
UniDomain: Pretraining a Unified PDDL Domain from Real-World Demonstrations for Generalizable Robot Task Planning cs.RO · 2025-07-29 · unverdicted · none · ref 4
UniDomain extracts atomic PDDL domains from 12,393 robot videos to create a unified domain of 3137 operators and 2875 predicates, then retrieves and fuses relevant parts to enable zero-shot planning on unseen real-world tasks.
GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis cs.AI · 2025-07-28 · unverdicted · none · ref 65
GenoMAS deploys six specialized LLM agents with guided planning to preprocess transcriptomic data and identify genes, reaching 89.13% composite similarity and 60.48% F1 on the GenoTEX benchmark while outperforming prior methods.
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning cs.AI · 2025-07-01 · conditional · none · ref 109
Math reasoning gains in LLMs rarely transfer to general domains; RL tuning generalizes while SFT causes forgetting and representation drift.
A Survey on Vision-Language-Action Models for Embodied AI cs.RO · 2024-05-23 · unverdicted · none · ref 64
This is the first survey on vision-language-action models, providing a taxonomy across three lines, plus summaries of datasets, simulators, benchmarks, challenges, and future directions in embodied AI.
Cognitive Architectures for Language Agents cs.AI · 2023-09-05 · accept · none · ref 46
CoALA is a modular cognitive architecture for language agents that organizes memory components, action spaces for internal and external interaction, and a generalized decision-making loop to support more systematic development of capable agents.
A Survey on Large Language Model based Autonomous Agents cs.AI · 2023-08-22 · accept · none · ref 58
A survey of LLM-based autonomous agents that proposes a unified framework for their construction and reviews applications in social science, natural science, and engineering along with evaluation methods and future directions.
Reasoning with Language Model is Planning with World Model cs.CL · 2023-05-24 · unverdicted · none · ref 105
RAP turns LLMs into dual world-model and planning agents via MCTS to generate better reasoning paths, outperforming CoT baselines and achieving 33% relative gains over GPT-4 CoT using LLaMA-33B on plan generation.
Novelty-based Tree-of-Thought Search for LLM Reasoning and Planning cs.AI · 2026-05-07 · unverdicted · none · ref 31
Novelty estimation via LLM prompts enables pruning in Tree-of-Thought search, reducing overall token usage on language planning benchmarks.
Bridging Values and Behavior: A Hierarchical Framework for Proactive Embodied Agents cs.AI · 2026-04-30 · unverdicted · none · ref 22
ValuePlanner is a hierarchical architecture that uses LLMs to generate value-based subgoals and PDDL planners to produce executable actions, enabling self-directed behavior in embodied agents.
From Coarse to Fine: Self-Adaptive Hierarchical Planning for LLM Agents cs.AI · 2026-04-25 · unverdicted · none · ref 12
AdaPlan-H enables LLM agents to generate self-adaptive hierarchical plans that adjust detail level to task difficulty, improving success rates in multi-step tasks.
End-to-end PDDL Planning with Hardcoded and Dynamic Agents cs.AI · 2025-12-10 · unverdicted · none · ref 21
An end-to-end LLM framework refines natural language into valid PDDL domains and problems via hardcoded and dynamic agents, generates plans with standard engines, and returns readable output.
Adaptive Obstacle-Aware Task Assignment and Planning for Heterogeneous Robot Teaming cs.RO · 2025-10-15 · unverdicted · none · ref 56
OATH combines adaptive Halton sampling, obstacle-aware clustering with auctions, and LLM-based instruction interpretation to improve task assignment and planning for heterogeneous robot teams in obstacle-rich environments.
LLM-Guided Task- and Affordance-Level Exploration in Reinforcement Learning cs.RO · 2025-09-20 · unverdicted · none · ref 30
LLM-TALE steers RL exploration using LLM-generated plans at task and affordance levels with online suboptimality correction, improving sample efficiency and success rates on pick-and-place tasks without human supervision.
"Don't Do That!": Guiding Embodied Systems through Large Language Model-based Constraint Generation cs.AI · 2025-06-04 · unverdicted · none · ref 22
STPR uses LLMs to generate Python constraint functions from natural language instructions, then applies them via traditional search algorithms to point clouds in simulated Gazebo robot environments with reported full compliance.
Iterative Formalization and Planning in Partially Observable Environments cs.AI · 2025-05-19 · unverdicted · none · ref 2
PDDLego iteratively formalizes and refines PDDL representations of partially observable environments to improve planning success without finetuning or in-context examples.
Efficient Test-time Inference for Generative Planning Models with OCL Search cs.AI · 2026-05-30 · unverdicted · none · ref 16
Modified OCL search integrates generative rollouts and learned heuristics for efficient inference in planning models across combinatorial domains.
AssemPlanner: A Multi-Agent Based Task Planning Framework for Flexible Assembly System cs.RO · 2026-05-09 · unverdicted · none · ref 44
AssemPlanner is a ReAct-based multi-agent system that autonomously generates production plans from natural language inputs by integrating scheduling, knowledge, line balancing, and scene graph feedback.
Compiled AI: Deterministic Code Generation for LLM-Based Workflow Automation cs.SE · 2026-04-06 · unverdicted · none · ref 9
Compiled AI generates deterministic code artifacts from LLMs in a one-time compilation step, enabling reliable workflow execution with zero runtime tokens after break-even.
Agentic Reasoning for Large Language Models cs.AI · 2026-01-18 · unverdicted · none · ref 72
The survey structures agentic reasoning for LLMs into foundational, self-evolving, and collective multi-agent layers while distinguishing in-context orchestration from post-training optimization and reviewing applications across domains.
Understanding the planning of LLM agents: A survey cs.AI · 2024-02-05 · accept · none · ref 25
A survey that provides a taxonomy of methods for improving planning in LLM-based agents across task decomposition, plan selection, external modules, reflection, and memory.
The Rise and Potential of Large Language Model Based Agents: A Survey cs.AI · 2023-09-14 · accept · none · ref 126
The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.
Shaping Schema via Language Representation as the Next Frontier for LLM Intelligence Expanding cs.AI · 2026-05-10 · unverdicted · none · ref 118
Advanced language representations shape LLMs' schemas to improve knowledge activation and problem-solving.
A Survey on Knowledge Distillation of Large Language Models cs.CL · 2024-02-20 · accept · none · ref 251
A comprehensive survey of knowledge distillation for LLMs structured around algorithms, skill enhancement, and vertical applications, highlighting data augmentation as a key enabler.
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems cs.AI · 2025-03-31 · unverdicted · none · ref 219
This survey frames foundation agents using brain-inspired modular architectures and reviews challenges in evolution, collaboration, and safety.
LLMs as ASP Programmers: Self-Correction Enables Task-Agnostic Nonmonotonic Reasoning cs.AI · 2026-04-30 · unreviewed · ref 3
Verbalized Algorithms: Classical Algorithms are All You Need (Mostly) cs.CL · 2025-09-09 · unreviewed · ref 7