Recognition: 1 theorem link · Lean Theorem
Sure-almost-sure and Sure-limit-sure Window Mean Payoff in Markov Decision Processes
Pith reviewed 2026-05-13 04:31 UTC · model grok-4.3
The pith
Sure-almost-sure and sure-limit-sure problems for window mean-payoff in MDPs are in P for fixed windows and in NP ∩ coNP for bounded windows.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We solve the sure-almost-sure and sure-limit-sure problems for window mean-payoff objectives. We show that the sure-almost-sure problem and the sure-limit-sure problem are both in P for the fixed variant (if ℓ is given in unary) and are both in NP ∩ coNP for the bounded variant, matching the computational complexity of sure satisfaction and almost-sure satisfaction when considered separately for these objectives. We also give bounds for the memory requirement of winning strategies for all considered problems.
What carries the argument
The window mean-payoff objective, which strengthens standard mean payoff by requiring that the average payoff of a finite window sliding over the run exceed a given threshold (α for the sure condition, β for the almost-sure condition), studied in its fixed-length variant (ℓ given) and its bounded-length variant (ℓ not given but required to be bounded throughout the run).
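To make the objective concrete, here is a minimal sketch (not from the paper) of the fixed-variant condition on a finite prefix of a run, in the standard "good window" formulation: from every position, some window of length at most ℓ must have mean payoff at or above the threshold. The function names, and the choice to check only positions that have ℓ successors in the prefix, are illustrative assumptions.

```python
def good_window(payoffs, i, ell, threshold):
    """True if some window of length 1..ell starting at position i
    has mean payoff >= threshold (the 'good window' condition)."""
    total = 0.0
    for j in range(1, ell + 1):
        if i + j > len(payoffs):
            return False  # window would run off the prefix
        total += payoffs[i + j - 1]
        if total / j >= threshold:
            return True
    return False

def fixed_window_ok(payoffs, ell, threshold):
    """Check the fixed window mean-payoff condition on a finite prefix:
    every position with ell successors must open a good window."""
    n = len(payoffs)
    return all(good_window(payoffs, i, ell, threshold)
               for i in range(n - ell + 1))
```

For instance, the prefix `[1, -1, 1, -1]` satisfies the condition with ℓ = 2 and threshold 0 (every position opens a window averaging at least 0), but not with threshold 0.5.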
Load-bearing premise
The Markov decision processes are finite with rational probabilities and payoffs, and the window length ℓ is given in unary for the fixed variant.
What would settle it
A concrete finite MDP with rational transition probabilities, payoffs, and a unary window length ℓ for which no polynomial-time algorithm correctly decides whether a strategy exists that meets both the sure-α and almost-sure-β window mean-payoff conditions.
Original abstract
Given rationals $\alpha$ and $\beta$, the sure-almost-sure problem for a quantitative objective $\varphi$ in a Markov decision process (MDP) asks if one can simultaneously ensure that all outcomes of the MDP have $\varphi$-value at least $\alpha$ (i.e. sure $\alpha$ satisfaction) and with probability $1$ the outcome has $\varphi$-value at least $\beta$ (i.e. almost-sure $\beta$ satisfaction). The sure-limit-sure problem asks if for all $\varepsilon > 0$ one can simultaneously ensure that all outcomes have $\varphi$-value at least $\alpha$ and with probability at least $1 - \varepsilon$ the outcome has $\varphi$-value at least $\beta$. Moreover, if simultaneous satisfaction of objectives is possible, then one would also like to construct a strategy (for sure-almost-sure) or a family of strategies (for sure-limit-sure) that achieves this. In this paper, we solve the sure-almost-sure and sure-limit-sure problems for window mean-payoff objectives. The window mean-payoff objective strengthens the standard mean-payoff objective by requiring that the average payoff of a finite window that slides over an infinite run be greater than a given threshold. We study two variants of window mean payoff: in the fixed variant, the window length $\ell$ is given, while in the bounded variant, the length is not given but is required to be bounded throughout the run. We show that the sure-almost-sure problem and the sure-limit-sure problem are both in P for the fixed variant (if $\ell$ is given in unary) and are both in NP $\cap$ coNP for the bounded variant, matching the computational complexity of sure satisfaction and almost-sure satisfaction when considered separately for these objectives. We also give bounds for the memory requirement of winning strategies for all considered problems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper solves the sure-almost-sure and sure-limit-sure problems for window mean-payoff objectives in finite MDPs with rational probabilities and payoffs. For the fixed variant (window length ℓ given in unary), both problems are in P; for the bounded variant, both are in NP ∩ coNP. These match the complexities of the separate sure and almost-sure satisfaction problems. The paper also supplies explicit memory bounds for the winning strategies (or families of strategies).
Significance. If the results hold, the work is significant for providing tight complexity classifications for combined quantitative objectives in MDPs. It demonstrates that the conjunction of sure and almost-sure (or limit-sure) requirements does not raise the complexity beyond the individual problems for window mean-payoff, which strengthens standard mean-payoff. The explicit memory bounds for strategies are a concrete strength that supports synthesis applications.
minor comments (3)
- [Abstract] The abstract and introduction should explicitly reference the theorem numbers that establish the P and NP ∩ coNP memberships, so readers can directly locate the reductions or algorithms that preserve these classes.
- [Preliminaries] A small illustrative example of a window mean-payoff run (showing the sliding window averages) would clarify the objective definition before the complexity results.
- [Main results section] The memory bounds for strategies are stated but would benefit from a compact table or corollary that lists the exact size (e.g., polynomial in |M| and ℓ) for each of the four problem variants.
Simulated Author's Rebuttal
We thank the referee for their positive summary of our work and for recommending minor revision. No specific major comments were raised in the report.
Circularity Check
No significant circularity in derivation chain
full rationale
The paper establishes new complexity results (P for fixed unary window length; NP ∩ coNP for bounded) for combined sure-almost-sure and sure-limit-sure window mean-payoff objectives by reductions to standard MDP decision procedures for sure and almost-sure satisfaction separately. These reductions preserve the target complexity classes without introducing self-definitional loops, fitted parameters renamed as predictions, or load-bearing self-citations whose validity depends on the present work. All assumptions (finite MDPs, rational probabilities/payoffs, unary encoding of ℓ) are external and standard; no ansatz is smuggled via prior author work, and no known empirical pattern is merely renamed. The derivation is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Markov decision processes are finite-state, finite-action with rational transition probabilities and payoffs.
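As a concrete rendering of this assumption (illustrative only; the state and action names are invented), a finite MDP with exact rational transition probabilities and integer payoffs can be written down with Python's `fractions` module:

```python
from fractions import Fraction

# A minimal finite MDP, matching the domain assumption:
# state -> action -> list of (successor, probability, payoff).
mdp = {
    "s0": {"a": [("s0", Fraction(1, 2), 1), ("s1", Fraction(1, 2), -1)],
           "b": [("s1", Fraction(1, 1), 0)]},
    "s1": {"a": [("s0", Fraction(1, 1), 2)]},
}

# Sanity check: outgoing probabilities of each (state, action) sum to 1.
for state, actions in mdp.items():
    for action, edges in actions.items():
        assert sum(p for _, p, _ in edges) == 1
```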
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · match: unclear · "We show that the sure-almost-sure problem and the sure-limit-sure problem are both in P for the fixed variant... matching the computational complexity of sure satisfaction and almost-sure satisfaction"
Reference graph
Works this paper leans on
- [1] K. R. Apt and E. Grädel. Lectures in Game Theory for Computer Scientists. Cambridge University Press, 2011. https://doi.org/10.1017/CBO9780511973468
- [2] C. Baier and J.-P. Katoen. Principles of Model Checking. MIT Press, 2008.
- [3] R. Berthon, S. Guha, and J.-F. Raskin. Mixing Probabilistic and Non-Probabilistic Objectives in Markov Decision Processes. In LICS, pages 195–208. ACM, 2020. https://doi.org/10.1145/3373718.3394805
- [4] R. Berthon, J.-P. Katoen, and T. Winkler. Markov Decision Processes with Sure Parity and Multiple Reachability Objectives. In RP, volume 15050 of Lecture Notes in Computer Science, pages 203–220. Springer, 2024. https://doi.org/10.1007/978-3-031-72621-7_14
- [5] R. Berthon, M. Randour, and J.-F. Raskin. Threshold Constraints with Guarantees for Parity Objectives in Markov Decision Processes. In ICALP, volume 80 of LIPIcs, pages 121:1–121:15, 2017. https://doi.org/10.4230/LIPIcs.ICALP.2017.121
- [6] P. Billingsley. Probability and Measure. John Wiley and Sons, second edition, 1986.
- [7] B. Bordais, S. Guha, and J.-F. Raskin. Expected Window Mean-Payoff. In FSTTCS, volume 150 of LIPIcs, pages 32:1–32:15, 2019. https://doi.org/10.4230/LIPIcs.FSTTCS.2019.32
- [8] T. Brihaye, F. Delgrange, Y. Oualhadj, and M. Randour. Life is Random, Time is Not: Markov Decision Processes with Window Objectives. Logical Methods in Computer Science, 16(4), 2020. https://doi.org/10.23638/LMCS-16(4:13)2020
- [9] V. Bruyère, E. Filiot, M. Randour, and J.-F. Raskin. Meet Your Expectations with Guarantees: Beyond Worst-Case Synthesis in Quantitative Games. Information and Computation, 254:259–295, 2017. https://doi.org/10.1016/j.ic.2016.10.011
- [10] K. Chatterjee, L. de Alfaro, and T. A. Henzinger. The Complexity of Stochastic Rabin and Streett Games. In ICALP, pages 878–890. Springer, 2005. https://doi.org/10.1007/11523468_71
- [11] K. Chatterjee, L. Doyen, M. Randour, and J.-F. Raskin. Looking at Mean-Payoff and Total-Payoff through Windows. Information and Computation, 242:25–52, 2015. https://doi.org/10.1016/j.ic.2015.03.010
- [12] K. Chatterjee and M. Henzinger. Efficient and Dynamic Algorithms for Alternating Büchi Games and Maximal End-Component Decomposition. Journal of the ACM, 61(3):15:1–15:40, 2014. https://doi.org/10.1145/2597631
- [13] K. Chatterjee, T. A. Henzinger, and F. Horn. Stochastic Games with Finitary Objectives. In MFCS, pages 34–54. Springer, 2009. https://doi.org/10.1007/978-3-642-03816-7_4
- [14] K. Chatterjee, Z. Křetínská, and J. Křetínský. Unifying Two Views on Multiple Mean-Payoff Objectives in Markov Decision Processes. Logical Methods in Computer Science, 13(2), 2017. https://doi.org/10.23638/LMCS-13(2:15)2017
- [15] K. Chatterjee and N. Piterman. Combinations of Qualitative Winning for Stochastic Parity Games. In CONCUR, volume 140 of LIPIcs, pages 6:1–6:17, 2019. https://doi.org/10.4230/LIPIcs.CONCUR.2019.6
- [16] L. Clemente and J.-F. Raskin. Multidimensional Beyond Worst-Case and Almost-Sure Problems for Mean-Payoff Objectives. In LICS, pages 257–268. IEEE, 2015. https://doi.org/10.1109/LICS.2015.33
- [17] L. Doyen, P. Gaba, and S. Guha. Expectation in Stochastic Games with Prefix-Independent Objectives. In CONCUR, volume 348 of LIPIcs, pages 16:1–16:19, 2025. https://doi.org/10.4230/LIPIcs.CONCUR.2025.16
- [18] L. Doyen, P. Gaba, and S. Guha. Stochastic Window Mean-Payoff Games. Logical Methods in Computer Science, 21(2), 2025. https://doi.org/10.46298/lmcs-21(2:19)2025
- [19] L. Doyen and S. Guha. Algorithm and Strategy Construction for Sure-Almost-Sure Stochastic Parity Games. In STACS, pages 34:1–34:20, 2026. https://doi.org/10.4230/LIPICS.STACS.2026.34
- [20] A. Ehrenfeucht and J. Mycielski. Positional Strategies for Mean Payoff Games. International Journal of Game Theory, 8(2):109–113, 1979. https://doi.org/10.1007/BF01768705
- [21] P. Gaba and S. Guha. Optimising Expectation with Guarantees for Window Mean Payoff in Markov Decision Processes. In AAMAS, pages 820–828, 2025. https://doi.org/10.5555/3709347.3743600
- [22] E. Grädel, W. Thomas, and T. Wilke, editors. Automata, Logics, and Infinite Games: A Guide to Current Research, volume 2500 of Lecture Notes in Computer Science. Springer, 2002. https://doi.org/10.1007/3-540-36387-4
- [23] D. A. Martin. The Determinacy of Blackwell Games. The Journal of Symbolic Logic, 63(4):1565–1581, 1998. https://doi.org/10.2307/2586667
- [24] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, 1994.
- [25] U. Zwick and M. Paterson. The Complexity of Mean Payoff Games on Graphs. Theoretical Computer Science, 158(1–2):343–359, 1996. https://doi.org/10.1016/0304-3975(95)00188-3