Artificial Intelligence
4 Pith papers cite this work. Polarity classification is still in progress.
Years: 2026. Verdicts: 4, all unverdicted. Representative citing papers: 4.
Citing papers explorer
-
Value-Decomposed Reinforcement Learning Framework for Taxiway Routing with Hierarchical Conflict-Aware Observations
CaTR applies value-decomposed RL with hierarchical conflict-aware observations to achieve better safety-efficiency trade-offs than planning, optimization, and standard RL baselines in a realistic airport taxiway simulation.
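A rough illustration of the value-decomposition idea, sketched under assumptions: the critic below splits its value estimate into a safety head and an efficiency head and scalarizes them with a fixed weight. The two-head structure, module names, and weighting are hypothetical stand-ins for illustration, not CaTR's published architecture.

    import torch
    import torch.nn as nn

    class DecomposedCritic(nn.Module):
        # Hypothetical two-head critic: the value is decomposed into a
        # safety component (conflict avoidance) and an efficiency
        # component (taxi time), then combined with a fixed weight.
        def __init__(self, obs_dim, hidden=128, safety_weight=2.0):
            super().__init__()
            self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
            self.safety_head = nn.Linear(hidden, 1)
            self.efficiency_head = nn.Linear(hidden, 1)
            self.safety_weight = safety_weight

        def forward(self, obs):
            z = self.trunk(obs)
            v_safety = self.safety_head(z)
            v_efficiency = self.efficiency_head(z)
            # Each head can be regressed against its own reward channel;
            # the weighted sum is what the actor maximizes.
            return self.safety_weight * v_safety + v_efficiency

In a sketch like this, training each head against its own reward channel exposes the safety-efficiency trade-off as an explicit knob (safety_weight) instead of a single entangled return.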
-
Effective Explanations Support Planning Under Uncertainty
Explanations scored higher by an LLM-plus-planner model are judged more helpful by people and produce measurably better navigation performance in uncertain environments than lower-scored or no explanations.
-
Policy Gradient Methods for Non-Markovian Reinforcement Learning
Introduces the Agent State-Markov Policy Gradient (ASMPG) algorithm, which jointly optimizes the agent-state dynamics and the control policy, along with a policy gradient theorem for non-Markovian decision processes.
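The summary compresses an algorithmic idea that a sketch makes concrete: learn the agent-state update and the policy with one gradient. The GRU cell, Categorical policy, and REINFORCE-style objective below are illustrative assumptions, not the ASMPG construction itself.

    import torch
    import torch.nn as nn

    class AgentStatePolicy(nn.Module):
        def __init__(self, obs_dim, act_dim, state_dim=64):
            super().__init__()
            # Learned agent-state dynamics (a GRU cell stands in for the
            # state-update map) and a control policy conditioned on it.
            self.state_update = nn.GRUCell(obs_dim, state_dim)
            self.policy_head = nn.Linear(state_dim, act_dim)

        def forward(self, obs_seq):
            # obs_seq: (T, batch, obs_dim); returns log-probs of sampled actions.
            z = obs_seq.new_zeros(obs_seq.shape[1], self.state_update.hidden_size)
            log_probs = []
            for obs in obs_seq:
                z = self.state_update(obs, z)
                dist = torch.distributions.Categorical(logits=self.policy_head(z))
                action = dist.sample()
                log_probs.append(dist.log_prob(action))
            return torch.stack(log_probs)

Given per-step returns G with the same (T, batch) shape as the returned log-probs, calling (-(G * log_probs).mean()).backward() propagates the return signal through both the policy head and the state-update cell, which is the joint optimization the summary refers to.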
-
Neural Co-state Policies: Structuring Hidden States in Recurrent Reinforcement Learning
Recurrent RL policies can have their hidden states aligned with Pontryagin Maximum Principle (PMP) co-states through a derived loss, yielding robust performance on partially observable control tasks.
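One plausible reading of such an alignment loss, sketched under assumptions: take the co-state target to be the gradient of a learned value function with respect to the environment state (in PMP, the co-state plays the role of a value gradient along the optimal trajectory) and penalize the distance between the recurrent hidden state and that target. Both the target choice and the matching dimensions are assumptions here; the paper derives its own loss.

    import torch
    import torch.nn as nn

    def costate_alignment_loss(hidden, env_state, value_fn):
        # Approximate the PMP co-state as the gradient of a learned value
        # function with respect to the environment state: lambda_t ~ dV/dx_t.
        env_state = env_state.detach().requires_grad_(True)
        value = value_fn(env_state).sum()
        costate = torch.autograd.grad(value, env_state, create_graph=True)[0]
        # Assumes the recurrent hidden state and the environment state
        # share a dimensionality; otherwise a learned projection of the
        # hidden state would be compared against the co-state instead.
        return ((hidden - costate) ** 2).mean()

Added to the usual RL objective as an auxiliary term, a loss of this shape nudges the hidden state toward carrying the value-gradient information that a co-state encodes, which is one way to structure memory in a partially observable task.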