arxiv: 2602.07473 · v2 · submitted 2026-02-07 · 💻 cs.AI · cs.FL

Recognition: 2 theorem links

· Lean Theorem

Computing the Reachability Value of Posterior-Deterministic POMDPs

Nathana\"el Fijalkow , Arka Ghosh , Roman Kniazev , Guillermo A. P\'erez , Pierre Vandenhove

Authors on Pith no claims yet

Pith reviewed 2026-05-16 06:27 UTC · model grok-4.3

classification 💻 cs.AI cs.FL

keywords POMDPreachabilityapproximationposterior-deterministicverificationMarkov decision processpartial observability

0 comments

The pith

Posterior-deterministic POMDPs allow approximation of maximal reachability probability to arbitrary precision.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines posterior-deterministic POMDPs by the property that the next state is fixed once the current state, chosen action, and received observation are known. It proves that this property makes the maximal probability of eventually reaching any given target set approximable to any desired accuracy, in contrast to the undecidability results that hold for general POMDPs. The definition preserves the uncertainty typical of POMDPs yet guarantees that the first revealing observation collapses the belief to a single known state that remains known thereafter. This class strictly contains every fully observable MDP and also contains standard benchmark examples such as the Tiger POMDP.

Core claim

In a posterior-deterministic POMDP the maximal reachability value is computable to arbitrary precision because the posterior-deterministic property reduces the problem to reachability in an underlying MDP whose states are the possible observations after the initial revealing step.

What carries the argument

The posterior-deterministic property, which makes the successor state a deterministic function of the current state, action, and observation.

If this is right

Reachability verification becomes approximable for every posterior-deterministic POMDP.
All fully observable MDPs are included as a trivial special case.
The Tiger POMDP and similar classical examples now admit arbitrary-precision reachability values.
The class is claimed to be among the largest known families of POMDPs for which this approximation is possible.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Practical POMDPs arising in robotics or dialogue systems may often satisfy the property, allowing existing MDP solvers to be applied after a simple state expansion.
A polynomial-time check for the posterior-deterministic property could serve as a preprocessing filter before invoking the approximation procedure.
Similar reductions may extend to other objectives such as discounted reward or safety constraints.

Load-bearing premise

The POMDP must satisfy that the next state is uniquely determined from the current state, the action taken, and the observation received.

What would settle it

An explicit posterior-deterministic POMDP together with a target set for which no algorithm returns a value within additive error 0.01.

read the original abstract

Partially observable Markov decision processes (POMDPs) are a fundamental model for sequential decision-making under uncertainty. However, many verification and synthesis problems for POMDPs are undecidable or intractable. Most prominently, the seminal result of Madani et al. (2003) states that there is no algorithm that, given a POMDP and a set of target states, can compute the maximal probability of reaching the target states, or even approximate it up to a non-trivial constant. This is in stark contrast to fully observable Markov decision processes (MDPs), where the reachability value can be computed in polynomial time. In this work, we introduce posterior-deterministic POMDPs, a novel class of POMDPs. Our main technical contribution is to show that for posterior-deterministic POMDPs, the maximal probability of reaching a given set of states can be approximated up to arbitrary precision. A POMDP is posterior-deterministic if the next state can be uniquely determined by the current state, the action taken, and the observation received. While the actual state is generally uncertain in POMDPs, the posterior-deterministic property tells us that once the true state is known it remains known forever. This simple and natural definition includes all MDPs and captures classical non-trivial examples such as the Tiger POMDP (Kaelbling et al. 1998), making it one of the largest known classes of POMDPs for which the reachability value can be approximated.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Posterior-deterministic POMDPs turn an undecidable reachability problem into an approximable one via a natural state-resolution property.

read the letter

This paper introduces posterior-deterministic POMDPs and proves that maximal reachability probabilities can be approximated to any desired precision in this class. That's the central result, and it stands in contrast to the general undecidability for POMDPs from Madani et al. 2003. The definition is straightforward: the next state must be uniquely recoverable from the current state, action, and observation. Once a belief resolves to a single state, it stays resolved. This property lets them reduce the problem to an MDP on the resolved states plus handling a finite number of transient belief states. It includes every MDP and captures things like the Tiger example, so the class is bigger than just the fully observable case. The work does a good job of motivating why this condition is natural and why it preserves the ability to compute or approximate values. The argument structure avoids circularity and builds on standard POMDP theory without inventing new entities. One soft spot is the lack of explicit algorithm details or complexity bounds in the high-level description. It would help to see how the approximation is actually carried out and whether the transient beliefs cause any blowup. But nothing in the setup suggests a hidden flaw. This paper is for researchers in verification and synthesis for POMDPs who want to identify larger tractable fragments. A reader interested in decision-making under partial observability would get value from the new class definition and the reduction idea. I recommend sending it for peer review. The claim is specific enough and the class definition seems solid enough to warrant a full check of the proofs.

Referee Report

2 major / 3 minor

Summary. The paper introduces posterior-deterministic POMDPs, defined by the property that the next state is uniquely recoverable from the current state, action, and observation. It claims that in this class the maximal reachability probability to a target set can be approximated to arbitrary precision, in contrast to the undecidability result of Madani et al. (2003) for general POMDPs. The class includes all MDPs and non-trivial examples such as the Tiger POMDP.

Significance. If the central reduction holds, the result is significant: it identifies one of the largest known natural subclasses of POMDPs for which reachability is approximable, providing a concrete bridge between fully observable MDPs and general POMDPs. The new class definition and the reduction to an equivalent MDP on resolved components plus a finite set of transient belief states are the key technical contributions.

major comments (2)

[§3.2] §3.2, reduction paragraph: the argument that the set of transient belief states remains finite relies on the posterior-deterministic property preserving resolved states, but the manuscript does not supply an explicit bound on the size of this set or a termination argument for the belief-update process when the observation space is finite but the initial belief support is large.
[Theorem 4.1] Theorem 4.1: the claim that the value iteration on the constructed MDP converges to the POMDP reachability value within arbitrary precision is stated, yet the proof sketch omits the explicit contraction mapping or error-propagation argument that would justify the arbitrary-precision guarantee once the transient component is solved exactly.

minor comments (3)

[Abstract] Abstract: the phrase 'one of the largest known classes' is informal; a short sentence comparing the new class to existing decidable subclasses (e.g., those with finite belief support or memoryless policies) would clarify the contribution.
[§2] Notation: the distinction between resolved and unresolved belief states is introduced in §2 but the symbols for the two components are not consistently used in the algorithm pseudocode of §4.
[Figure 1] Figure 1: the Tiger POMDP example diagram would benefit from an explicit annotation showing which transitions become deterministic under the posterior-deterministic property.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and constructive comments. We address each major point below and will revise the manuscript to incorporate the requested clarifications.

read point-by-point responses

Referee: [§3.2] §3.2, reduction paragraph: the argument that the set of transient belief states remains finite relies on the posterior-deterministic property preserving resolved states, but the manuscript does not supply an explicit bound on the size of this set or a termination argument for the belief-update process when the observation space is finite but the initial belief support is large.

Authors: We agree that an explicit bound and termination argument would strengthen the presentation. The posterior-deterministic property ensures that any belief with support size 1 remains a Dirac delta after any update, and with finite observations the possible unresolved beliefs reachable from a given initial support are finite (as each update deterministically selects a unique successor state). In the revision we will add a lemma providing an explicit combinatorial bound on the transient belief set (at most |S| times the number of possible non-singleton supports consistent with the observation function) together with a termination argument showing that repeated belief updates reach resolved states after finitely many steps. revision: yes
Referee: [Theorem 4.1] Theorem 4.1: the claim that the value iteration on the constructed MDP converges to the POMDP reachability value within arbitrary precision is stated, yet the proof sketch omits the explicit contraction mapping or error-propagation argument that would justify the arbitrary-precision guarantee once the transient component is solved exactly.

Authors: We thank the referee for noting this omission. The reduction yields an MDP whose transient component is finite and therefore solvable exactly (e.g., by linear programming). The remaining components are absorbing resolved states whose values are known exactly. Standard value iteration on this finite MDP converges monotonically to the exact reachability probabilities; because the transient part contributes only a finite additive error that can be bounded, the overall approximation can be made arbitrarily precise. In the revision we will expand the proof with an explicit error-propagation lemma showing how any epsilon-approximation of the constructed MDP translates to an epsilon-approximation of the original POMDP reachability value. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper introduces the posterior-deterministic POMDP class via an explicit definition (next state uniquely recoverable from current state, action, and observation) and then establishes an approximation result for reachability values on that class. The derivation reduces the problem to an equivalent fully observable MDP on resolved components plus transient beliefs, using only the new definition plus standard POMDP reachability algorithms. No equations reduce a claimed prediction to a fitted input by construction, no load-bearing self-citation chain is present, and the central claim is not equivalent to its inputs. The argument is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The claim rests on the standard mathematical definition of POMDPs and reachability values plus the newly introduced subclass property; no numerical parameters are fitted and no new physical entities are postulated.

axioms (2)

standard math POMDPs are defined with finite sets of states, actions, observations, transition probabilities, and observation probabilities.
Standard background model from the POMDP literature cited in the abstract.
standard math The reachability value is the supremum over policies of the probability of eventually reaching the target set.
Standard definition used throughout MDP and POMDP theory.

pith-pipeline@v0.9.0 · 5586 in / 1401 out tokens · 81646 ms · 2026-05-16T06:27:08.880170+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

A POMDP is posterior-deterministic if the next state can be uniquely determined by the current state, the action taken, and the observation received... the size of the current belief support can never increase
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_injective unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

support end component (SEC)... Reachf(S) ... maximal SECs

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.