Recognition: 2 theorem links
· Lean Theorem · Computing the Reachability Value of Posterior-Deterministic POMDPs
Pith reviewed 2026-05-16 06:27 UTC · model grok-4.3
The pith
Posterior-deterministic POMDPs allow the maximal reachability probability to be approximated to arbitrary precision.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In a posterior-deterministic POMDP, the maximal reachability value can be approximated to arbitrary precision: the posterior-deterministic property reduces the problem to reachability in an underlying MDP whose states are the possible observations after the initial revealing step.
What carries the argument
The posterior-deterministic property, which makes the successor state a deterministic function of the current state, action, and observation.
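For a finite POMDP the property is directly checkable by enumeration: a POMDP is posterior-deterministic exactly when every (state, action, observation) triple is consistent with at most one successor state. A minimal sketch of that check, assuming dictionary-based transition and observation tables (the names `T`, `Z` and their layout are illustrative, not the paper's notation):

```python
from itertools import product

def is_posterior_deterministic(states, actions, observations, T, Z):
    """Check the posterior-deterministic property of a finite POMDP.

    T[s][a][s2] is the transition probability P(s2 | s, a);
    Z[s2][a][o] is the observation probability P(o | s2, a).
    The property holds when each (state, action, observation)
    triple admits at most one successor state.
    """
    for s, a, o in product(states, actions, observations):
        # Successor states consistent with taking a in s and observing o.
        successors = [s2 for s2 in states
                      if T[s][a][s2] > 0 and Z[s2][a][o] > 0]
        if len(successors) > 1:
            return False
    return True
```

On a Tiger-like model, where listening never moves the tiger, every (s, a, o) triple has the single successor s, so the check passes; making the transition stochastic over both states breaks it.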
If this is right
- Reachability verification becomes approximable for every posterior-deterministic POMDP.
- All fully observable MDPs are included as a trivial special case.
- The Tiger POMDP and similar classical examples now admit arbitrary-precision reachability values.
- The class is claimed to be among the largest known families of POMDPs for which this approximation is possible.
Where Pith is reading between the lines
- Practical POMDPs arising in robotics or dialogue systems may often satisfy the property, allowing existing MDP solvers to be applied after a simple state expansion.
- A polynomial-time check for the posterior-deterministic property could serve as a preprocessing filter before invoking the approximation procedure.
- Similar reductions may extend to other objectives such as discounted reward or safety constraints.
Load-bearing premise
The POMDP must satisfy that the next state is uniquely determined from the current state, the action taken, and the observation received.
What would settle it
An explicit posterior-deterministic POMDP together with a target set for which no algorithm returns a value within additive error 0.01.
Original abstract
Partially observable Markov decision processes (POMDPs) are a fundamental model for sequential decision-making under uncertainty. However, many verification and synthesis problems for POMDPs are undecidable or intractable. Most prominently, the seminal result of Madani et al. (2003) states that there is no algorithm that, given a POMDP and a set of target states, can compute the maximal probability of reaching the target states, or even approximate it up to a non-trivial constant. This is in stark contrast to fully observable Markov decision processes (MDPs), where the reachability value can be computed in polynomial time. In this work, we introduce posterior-deterministic POMDPs, a novel class of POMDPs. Our main technical contribution is to show that for posterior-deterministic POMDPs, the maximal probability of reaching a given set of states can be approximated up to arbitrary precision. A POMDP is posterior-deterministic if the next state can be uniquely determined by the current state, the action taken, and the observation received. While the actual state is generally uncertain in POMDPs, the posterior-deterministic property tells us that once the true state is known it remains known forever. This simple and natural definition includes all MDPs and captures classical non-trivial examples such as the Tiger POMDP (Kaelbling et al. 1998), making it one of the largest known classes of POMDPs for which the reachability value can be approximated.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces posterior-deterministic POMDPs, defined by the property that the next state is uniquely recoverable from the current state, action, and observation. It claims that in this class the maximal reachability probability to a target set can be approximated to arbitrary precision, in contrast to the undecidability result of Madani et al. (2003) for general POMDPs. The class includes all MDPs and non-trivial examples such as the Tiger POMDP.
Significance. If the central reduction holds, the result is significant: it identifies one of the largest known natural subclasses of POMDPs for which reachability is approximable, providing a concrete bridge between fully observable MDPs and general POMDPs. The new class definition and the reduction to an equivalent MDP on resolved components plus a finite set of transient belief states are the key technical contributions.
Major comments (2)
- [§3.2] Reduction paragraph: the argument that the set of transient belief states remains finite relies on the posterior-deterministic property preserving resolved states, but the manuscript supplies neither an explicit bound on the size of this set nor a termination argument for the belief-update process when the observation space is finite but the initial belief support is large.
- [Theorem 4.1] The claim that value iteration on the constructed MDP converges to the POMDP reachability value within arbitrary precision is stated, yet the proof sketch omits the explicit contraction-mapping or error-propagation argument that would justify the arbitrary-precision guarantee once the transient component is solved exactly.
Minor comments (3)
- [Abstract] The phrase 'one of the largest known classes' is informal; a short comparison with existing decidable subclasses (e.g., those with finite belief support or memoryless policies) would clarify the contribution.
- [§2] Notation: the distinction between resolved and unresolved belief states is introduced in §2, but the symbols for the two components are not used consistently in the algorithm pseudocode of §4.
- [Figure 1] The Tiger POMDP diagram would benefit from an explicit annotation showing which transitions become deterministic under the posterior-deterministic property.
Simulated Author's Rebuttal
We thank the referee for the positive assessment and constructive comments. We address each major point below and will revise the manuscript to incorporate the requested clarifications.
Point-by-point responses
- Referee [§3.2], reduction paragraph: the argument that the set of transient belief states remains finite relies on the posterior-deterministic property preserving resolved states, but the manuscript supplies neither an explicit bound on the size of this set nor a termination argument for the belief-update process when the observation space is finite but the initial belief support is large.
Authors: We agree that an explicit bound and termination argument would strengthen the presentation. The posterior-deterministic property ensures that any belief with support size 1 remains a Dirac delta after any update, and with finite observations the possible unresolved beliefs reachable from a given initial support are finite (as each update deterministically selects a unique successor state). In the revision we will add a lemma providing an explicit combinatorial bound on the transient belief set (at most |S| times the number of possible non-singleton supports consistent with the observation function) together with a termination argument showing that repeated belief updates reach resolved states after finitely many steps. revision: yes
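The rebuttal's core observation, that a resolved (support-1) belief stays resolved under any update, can be sketched in a few lines. A standard Bayes belief update, with illustrative dictionary-based tables `T` and `Z` (not the paper's notation):

```python
def belief_update(belief, a, o, states, T, Z):
    """Bayes belief update: b'(s2) is proportional to
    Z[s2][a][o] * sum_s b(s) * T[s][a][s2].

    Under the posterior-deterministic property, a Dirac (support-1)
    belief maps to a Dirac belief for every observation consistent
    with it, so resolved states remain resolved forever.
    """
    unnorm = {s2: Z[s2][a][o] * sum(belief.get(s, 0.0) * T[s][a][s2]
                                    for s in states)
              for s2 in states}
    total = sum(unnorm.values())
    if total == 0:
        return None  # observation o is impossible under belief and action a
    # Normalize and drop zero-probability states to expose the support.
    return {s2: p / total for s2, p in unnorm.items() if p > 0}
```

On a posterior-deterministic model, starting from a Dirac belief the returned dictionary again has a single key, which is the mechanism behind the finiteness claim for the transient belief set.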
- Referee [Theorem 4.1]: the claim that value iteration on the constructed MDP converges to the POMDP reachability value within arbitrary precision is stated, yet the proof sketch omits the explicit contraction-mapping or error-propagation argument that would justify the arbitrary-precision guarantee once the transient component is solved exactly.
Authors: We thank the referee for noting this omission. The reduction yields an MDP whose transient component is finite and therefore solvable exactly (e.g., by linear programming). The remaining components are absorbing resolved states whose values are known exactly. Standard value iteration on this finite MDP converges monotonically to the exact reachability probabilities; because the transient part contributes only a finite additive error that can be bounded, the overall approximation can be made arbitrarily precise. In the revision we will expand the proof with an explicit error-propagation lemma showing how any epsilon-approximation of the constructed MDP translates to an epsilon-approximation of the original POMDP reachability value. revision: yes
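The value-iteration step the rebuttal appeals to is the standard textbook routine for maximal reachability in a finite MDP, not anything specific to the paper. A minimal sketch (names and tensor layout illustrative):

```python
def max_reach_value_iteration(states, actions, P, target,
                              iters=10000, tol=1e-10):
    """Value iteration for the maximal reachability probability in a
    finite MDP: V(s) = 1 on the target set, and otherwise
    V(s) = max_a sum_s2 P[s][a][s2] * V(s2).

    Starting from the indicator of the target set, the iterates
    increase monotonically toward the true values.
    """
    V = {s: (1.0 if s in target else 0.0) for s in states}
    for _ in range(iters):
        newV = {}
        for s in states:
            if s in target:
                newV[s] = 1.0
            else:
                newV[s] = max(sum(P[s][a][s2] * V[s2] for s2 in states)
                              for a in actions)
        delta = max(abs(newV[s] - V[s]) for s in states)
        V = newV
        if delta < tol:
            break
    return V
```

For instance, a state with two actions leading to an absorbing goal with probabilities 0.5 and 0.9 gets value 0.9, the better of the two.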
Circularity Check
No significant circularity detected
Full rationale
The paper introduces the posterior-deterministic POMDP class via an explicit definition (next state uniquely recoverable from current state, action, and observation) and then establishes an approximation result for reachability values on that class. The derivation reduces the problem to an equivalent fully observable MDP on resolved components plus transient beliefs, using only the new definition plus standard POMDP reachability algorithms. No equations reduce a claimed prediction to a fitted input by construction, no load-bearing self-citation chain is present, and the central claim is not equivalent to its inputs. The argument is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- [standard math] POMDPs are defined with finite sets of states, actions, observations, transition probabilities, and observation probabilities.
- [standard math] The reachability value is the supremum over policies of the probability of eventually reaching the target set.
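The second axiom has a standard formalization; writing $T$ for the target set, $\mu_0$ for the initial belief, and $s_t$ for the state at step $t$ (notation ours, not the paper's):

```latex
\mathrm{Val}(\mu_0) \;=\; \sup_{\pi}\; \Pr^{\pi}_{\mu_0}\!\left[\,\exists\, t \ge 0 :\; s_t \in T\,\right]
```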
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · tagged unclear
  The relation between the paper passage and the cited Recognition theorem is unclear. Matched passage:
A POMDP is posterior-deterministic if the next state can be uniquely determined by the current state, the action taken, and the observation received... the size of the current belief support can never increase
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_injective · tagged unclear
  The relation between the paper passage and the cited Recognition theorem is unclear. Matched passage:
support end component (SEC)... Reachf(S) ... maximal SECs
What do these tags mean?
- matches: the paper's claim is directly supported by a theorem in the formal canon.
- supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: the paper appears to rely on the theorem as machinery.
- contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.