pith. machine review for the scientific record.

arxiv: 1804.07779 · v3 · submitted 2018-04-20 · 💻 cs.LG · cs.AI · stat.ML

Recognition: unknown

PEORL: Integrating Symbolic Planning and Hierarchical Reinforcement Learning for Robust Decision-Making

Authors on Pith: no claims yet
classification: 💻 cs.LG · cs.AI · stat.ML
keywords: symbolic, learning, planning, reinforcement, robust, decision-making, domains, experience
0 comments
read the original abstract

Reinforcement learning and symbolic planning have both been used to build intelligent autonomous agents. Reinforcement learning relies on learning from interactions with the real world, which often requires an infeasibly large amount of experience. Symbolic planning relies on manually crafted symbolic knowledge, which may not be robust to domain uncertainties and changes. In this paper, we present a unified framework, PEORL, that integrates symbolic planning with hierarchical reinforcement learning (HRL) to cope with decision-making in a dynamic environment with uncertainties. Symbolic plans are used to guide the agent's task execution and learning, and the learned experience is fed back to symbolic knowledge to improve planning. This method leads to rapid policy search and robust symbolic plans in complex domains. The framework is tested on benchmark domains of HRL.
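The abstract describes a feedback loop: a symbolic planner proposes plans, an RL learner evaluates them through execution, and the learned values flow back into planning. A minimal toy sketch of that loop, in Python, might look as follows. Everything here (the plan names, subtask rewards, and update rule) is illustrative, not taken from the paper, which uses answer set programming and R-learning rather than this simplified value-averaging scheme.

```python
import random

# Toy sketch of a PEORL-style plan/learn feedback loop (hypothetical example).
# A "symbolic planner" chooses among candidate plans (sequences of subtasks);
# an RL-style learner estimates subtask quality from noisy execution; the
# learned values are fed back so the planner prefers higher-quality plans.

# Candidate symbolic plans: each is a sequence of subtask names.
PLANS = {
    "short_risky": ["cross_bridge", "reach_goal"],
    "long_safe": ["go_around", "climb_hill", "reach_goal"],
}

# Hidden true expected reward of each subtask; the learner must discover it.
TRUE_REWARD = {
    "cross_bridge": -5.0,  # risky shortcut
    "reach_goal": 10.0,
    "go_around": -1.0,
    "climb_hill": -1.0,
}

def execute(subtask, rng):
    """Simulate executing a subtask; return a noisy reward."""
    return TRUE_REWARD[subtask] + rng.gauss(0.0, 0.5)

def peorl_loop(episodes=200, alpha=0.1, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    value = {s: 0.0 for s in TRUE_REWARD}  # learned subtask values

    for _ in range(episodes):
        # Planning step: prefer the plan whose learned value sum is highest
        # (with some exploration) -- learning feeds back into planning.
        if rng.random() < epsilon:
            plan = rng.choice(list(PLANS))
        else:
            plan = max(PLANS, key=lambda p: sum(value[s] for s in PLANS[p]))

        # Execution/learning step: run the plan, update subtask estimates.
        for subtask in PLANS[plan]:
            r = execute(subtask, rng)
            value[subtask] += alpha * (r - value[subtask])

    best = max(PLANS, key=lambda p: sum(value[s] for s in PLANS[p]))
    return best, value

best_plan, learned = peorl_loop()
print(best_plan)  # the longer but safer plan wins once values are learned
```

After enough episodes, the learned subtask values steer the planner away from the superficially shorter plan, illustrating the abstract's claim that experience fed back into planning yields more robust plans.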

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read papers on Pith without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Towards Neuro-symbolic Causal Rule Synthesis, Verification, and Evaluation Grounded in Legal and Safety Principles

cs.LO · 2026-04 · unverdicted · novelty 4.0

    An LLM-based pipeline decomposes natural-language safety goals into minimal necessary and sufficient first-order logic rules, verifies them for consistency and safety, and integrates them into a causal neuro-symbolic ...

  2. Towards Neuro-symbolic Causal Rule Synthesis, Verification, and Evaluation Grounded in Legal and Safety Principles

cs.LO · 2026-04 · unverdicted · novelty 4.0

    A meta-level neuro-symbolic layer uses LLMs to synthesize, consolidate, and verify minimal necessary-and-sufficient first-order causal rules from human-specified goals and principles, demonstrated in two autonomous-dr...