Mean-Payoff-Parity and Lifting Strategies from MDPs to 2-Player Stochastic Games
Pith reviewed 2026-06-26 18:26 UTC · model grok-4.3
The pith
In 2-player stochastic games, optimal randomized strategies for mean-payoff-parity require linear memory equal to the number of even colors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For the mean-payoff-parity objective, which is shift-invariant inverse-submixing, optimal randomized strategies in 2-player stochastic games require and suffice with linear memory equal to the number of even colors. In MDPs, memoryless randomized strategies are optimal while deterministic ones need exponential memory. General lifting of memoryless strategies from MDPs to games demands an exponential increase in memory modes even for randomized strategies. An alternative lifting construction cannot be extended to randomized strategies, as shown by an objective where both players have memoryless randomized optima in MDPs but Maximizer needs infinite memory in deterministic 2-player games.
What carries the argument
The lifting construction from MDPs to 2-player stochastic games for shift-invariant inverse-submixing objectives, together with a specialized linear-memory construction for mean-payoff-parity that uses one mode per even color.
If this is right
- Lifting optimal memoryless strategies from MDPs to 2-player stochastic games requires exponentially many memory modes in general, even when randomization is permitted.
- For mean-payoff-parity, linear memory in the number of even colors is both necessary and sufficient for optimal randomized strategies.
- In MDPs, optimal deterministic strategies for mean-payoff-parity require exponential memory while randomized ones remain memoryless.
- The alternative lifting construction for deterministic strategies does not extend to randomized strategies and can force infinite memory in deterministic 2-player games.
Where Pith is reading between the lines
- Solution algorithms for mean-payoff-parity games in the stochastic setting can track memory states linear in the number of even colors.
- The linear bound may apply to other parity-based objectives that share the inverse-submixing property.
- The separation between deterministic and randomized memory requirements could guide the design of hybrid strategy synthesis methods.
Load-bearing premise
The objectives must satisfy the shift-invariant and inverse-submixing properties so that the lifting from MDPs applies and the memory bounds hold.
What would settle it
A concrete 2-player stochastic game with mean-payoff-parity objective in which every optimal randomized strategy for Maximizer requires strictly more than one memory mode per even color.
read the original abstract
We consider the strategy complexity (i.e., memory and randomization) of optimal strategies in turn-based 2-player zero-sum stochastic games. Results in [Gimbert,Kelmendi:2023] show how to lift optimal memoryless strategies for shift-invariant inverse-submixing objectives from MDPs to 2-player stochastic games with an exponential increase in the number of memory modes. We show the corresponding lower bound, i.e., the extra exponential memory is required in general, even for randomized strategies. Moreover, we solve the strategy complexity of the well-studied mean-payoff-parity objective in 2-player stochastic games. This objective is also shift-invariant inverse-submixing, but easier than the worst case for this class. In MDPs, Maximizer has optimal memoryless randomized strategies, while optimal deterministic strategies require exponential memory. However, in stochastic games, optimal randomized strategies require, at least and at most, linear memory (equal to the number of even colors). Finally, we show that a different construction for lifting memoryless (resp. finite-memory) deterministic strategies from MDPs (resp. 1-player games) to 2-player games cannot be generalized even to memoryless randomized strategies. We construct a shift-invariant objective where Max and Min each have optimal memoryless randomized strategies in all MDPs, but optimal (randomized) Max strategies still require infinite memory in deterministic 2-player games.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript establishes that for shift-invariant inverse-submixing objectives, optimal strategies cannot in general be lifted from MDPs to 2-player stochastic games without an exponential blow-up in memory, even when randomization is allowed. For the mean-payoff-parity objective it gives matching upper and lower bounds showing that optimal randomized strategies in stochastic games require (and suffice with) linear memory equal to the number of even colors. It further exhibits a shift-invariant objective for which memoryless randomized optimality holds in all MDPs yet infinite memory is required in deterministic 2-player games, showing that an alternative lifting construction does not extend even to the randomized setting.
Significance. The work supplies the first explicit exponential lower bound matching the known lifting upper bound for the broad class of shift-invariant inverse-submixing objectives, together with a tight linear-memory characterization for mean-payoff parity. These results clarify the precise memory-randomization trade-offs that arise when moving from one-player to two-player stochastic games and delimit the scope of existing lifting techniques.
major comments (2)
- [§4] §4 (lower-bound construction for general shift-invariant inverse-submixing objectives): the reduction must be checked to confirm that the constructed objective remains inverse-submixing while forcing the exponential memory lower bound for randomized strategies; if the submixing property is preserved only by a non-uniform choice of parameters, the claim that the lower bound applies to the entire class would need adjustment.
- [§5] §5 (mean-payoff-parity linear-memory result): the lower-bound argument relies on a family of games whose even-color count grows linearly; it should be verified that the same linear bound continues to hold when the parity condition is combined with the mean-payoff condition under the same shift-invariant inverse-submixing hypothesis used for the general lifting.
minor comments (2)
- [§5] Notation for the number of even colors is introduced without an explicit equation reference; adding a displayed definition would improve readability.
- [§6] The statement that the alternative lifting construction 'cannot be generalized' would benefit from a one-sentence pointer to the precise property (shift-invariance or inverse-submixing) that fails in the counter-example.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation and constructive comments. Below we address each major point, confirming that the constructions in the manuscript satisfy the required properties uniformly.
read point-by-point responses
-
Referee: [§4] §4 (lower-bound construction for general shift-invariant inverse-submixing objectives): the reduction must be checked to confirm that the constructed objective remains inverse-submixing while forcing the exponential memory lower bound for randomized strategies; if the submixing property is preserved only by a non-uniform choice of parameters, the claim that the lower bound applies to the entire class would need adjustment.
Authors: We have re-verified the §4 construction. The objective is defined uniformly for the entire family of games, and the inverse-submixing property holds with fixed parameters (independent of game size or the exponential parameter k). The submixing constants do not vary with the instance, so the exponential memory lower bound for randomized strategies applies directly to the full class of shift-invariant inverse-submixing objectives. No adjustment to the claim is required. revision: no
-
Referee: [§5] §5 (mean-payoff-parity linear-memory result): the lower-bound argument relies on a family of games whose even-color count grows linearly; it should be verified that the same linear bound continues to hold when the parity condition is combined with the mean-payoff condition under the same shift-invariant inverse-submixing hypothesis used for the general lifting.
Authors: The mean-payoff-parity objective is constructed to be shift-invariant and inverse-submixing with the same uniform parameters used in the general case. The lower-bound family explicitly combines the two conditions while preserving these properties, and the necessity of linear memory (equal to the number of even colors) is shown directly under this hypothesis. The upper bound likewise holds within the class. The argument is already complete for the combined objective; no change is needed. revision: no
Circularity Check
No significant circularity
full rationale
The paper derives its main claims—the exponential lower bound on memory for general shift-invariant inverse-submixing objectives and the matching linear-memory characterization for mean-payoff-parity—via independent constructions and explicit upper/lower-bound arguments that do not reduce to the inputs of the cited Gimbert-Kelmendi 2023 lifting result. The objective class membership is used only to invoke an external theorem; the specialized linear-memory result and the negative result on non-generalizability are presented as new, self-contained arguments with no self-definitional, fitted-prediction, or load-bearing self-citation steps.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Turn-based 2-player zero-sum stochastic games with perfect information
- domain assumption Objectives are shift-invariant inverse-submixing
Reference graph
Works this paper leans on
-
[1]
, author =
Alternating Tree Automata, Parity Games, and Modal mu-Calculus. , author =. Bulletin of the Belgian Mathematical Society Simon Stevin , volume =. 2001 , publisher =
2001
-
[2]
, booktitle = icalp, pages =
Chatterjee, Krishnendu and De Alfaro, Luca and Henzinger, Thomas A. , booktitle = icalp, pages =. The complexity of stochastic
-
[3]
Subexponential lower bounds for randomized pivoting rules for the simplex algorithm , author =
-
[4]
Exponential lower bounds for policy iteration , author =
-
[5]
An exponential lower bound for the parity game strategy improvement algorithm as we know it , author =
-
[6]
1972 , publisher =
Automata on infinite objects and Church's problem , author =. 1972 , publisher =
1972
-
[7]
SIAM journal on control and optimization , volume =
Supervisory control of a class of discrete event processes , author =. SIAM journal on control and optimization , volume =. 1987 , publisher =
1987
-
[8]
On the synthesis of a reactive module , author =
-
[9]
1989 , publisher =
Trace theory for automatic hierarchical verification of speed-independent circuits , author =. 1989 , publisher =
1989
-
[10]
ACM SIGSOFT Software Engineering Notes , volume =
Interface automata , author =. ACM SIGSOFT Software Engineering Notes , volume =. 2001 , publisher =
2001
-
[11]
2002 , publisher =
Alternating-time temporal logic , author =. 2002 , publisher =
2002
-
[12]
, author =
Undecidable problems for probabilistic automata of fixed dimension. , author =. Theory of Computing systems , volume =. 2003 , publisher =
2003
-
[13]
Fun with FireWire: A comparative study of formal verification methods applied to the
Stoelinga, Mari. Fun with FireWire: A comparative study of formal verification methods applied to the. Formal aspects of computing , volume =. 2003 , publisher =
2003
-
[14]
Verification of the randomized consensus algorithm of
Pogosyants, Anna and Segala, Roberto and Lynch, Nancy , journal =. Verification of the randomized consensus algorithm of. 2000 , publisher =
2000
-
[15]
Proceedings of the national academy of sciences , volume =
Stochastic games , author =. Proceedings of the national academy of sciences , volume =. 1953 , publisher =
1953
-
[16]
Contributions to the Theory of Games , volume =
Stochastic games with zero stop probabilities , author =. Contributions to the Theory of Games , volume =. 1957 , publisher =
1957
-
[17]
Deciding the winner in parity games is in
Jurdzi. Deciding the winner in parity games is in. Information Processing Letters , volume =. 1998 , publisher =
1998
-
[18]
Henzinger and Nir Piterman , title =
Krishnendu Chatterjee and Thomas A. Henzinger and Nir Piterman , title =. 2007 , doi =
2007
-
[19]
A pseudo-quasi-polynomial algorithm for mean-payoff parity games , author =
-
[20]
1998 , publisher =
Theory of linear and integer programming , author =. 1998 , publisher =
1998
-
[21]
On the Complexity of Resource-Bounded Logics
Alechina, Natasha and Bulling, Nils and Demri, Stephane and Logan, Brian. On the Complexity of Resource-Bounded Logics
-
[22]
Parosh Aziz Abdulla and Radu Ciobanu and Richard Mayr and Arnaud Sangnier and Jeremy Sproston , title =
-
[23]
Xavier Allamigeon and Stéphane Gaubert and Ricardo D. Katz. Tropical polar cones, hypergraph transversals, and mean payoff games. 2011
2011
-
[24]
2013 , volume =
Solving Parity Games on Integer Vectors , author =. 2013 , volume =
2013
-
[25]
Lubos Brim and Jakub Chaloupka , title =
-
[26]
and Chaloupka, J
Brim, L. and Chaloupka, J. and Doyen, L. and Gentilini, R. and Raskin, J. F. Faster algorithms for mean-payoff games. Formal Methods in System Design. 2011
2011
-
[27]
Better quality in synthesis through quantitative objectives , author =
-
[28]
Blondin and A
M. Blondin and A. Finkel and S. G. Reachability in Two-Dimensional Vector Addition Systems with States Is PSPACE-Complete , year =
-
[29]
Infinite Runs in Weighted Timed Automata with Energy Constraints , author =
-
[30]
Efficient Analysis of Probabilistic Programs with an Unbounded Counter , journal = jacm, volume =
Tom. Efficient Analysis of Probabilistic Programs with an Unbounded Counter , journal = jacm, volume =
-
[31]
T. Br\'. Optimizing the Expected Mean Payoff in Energy. 2016 , pages =
2016
-
[32]
2008 , volume =
Noam Berger and Nevin Kapur and Leonard Schulman and Vijay Vazirani , title =. 2008 , volume =
2008
-
[33]
Bezem, Marc and Nieuwenhuis, Robert and Rodr \'i guez-Carbonell, Enric. 2008
2008
-
[34]
Discrete Applied Mathematics
A combinatorial strongly subexponential strategy improvement algorithm for mean payoff games. Discrete Applied Mathematics. 2007
2007
-
[35]
, author =
On Algorithms for Simple Stochastic Games. , author =. Advances in computational complexity theory , pages =
-
[36]
International Workshop on Embedded Software , series =
Resource interfaces , author =. International Workshop on Embedded Software , series =
-
[37]
International Workshop on Embedded Software , pages =
Resource interfaces , author =. International Workshop on Embedded Software , pages =
-
[38]
Energy Parity Games , booktitle = icalp, year =
-
[39]
Energy and Mean-Payoff Parity
Krishnendu Chatterjee and Laurent Doyen , booktitle = mfcs, year =. Energy and Mean-Payoff Parity
-
[40]
2012 , pages =
Energy parity games , author =. 2012 , pages =
2012
-
[41]
2015 , pages =
Randomness for free , author =. 2015 , pages =
2015
-
[42]
Mean-payoff parity games , author =
-
[43]
Quantitative Stochastic Parity Games , booktitle = soda, year =
Chatterjee, Krishnendu and Jurdzi\'. Quantitative Stochastic Parity Games , booktitle = soda, year =
-
[44]
and Jain, Sanjay and Khoussainov, Bakhadyr and Li, Wei and Stephan, Frank , title =
Calude, Cristian S. and Jain, Sanjay and Khoussainov, Bakhadyr and Li, Wei and Stephan, Frank , title =. 2017 , pages =
2017
-
[45]
Thomas Colcombet and Marcin Jurdzi
-
[46]
CoRR , volume =
Carlo Comin and Romeo Rizzi , title =. CoRR , volume =
-
[47]
Improved Pseudo-polynomial Bound for the Value Problem and Optimal Strategy Synthesis in Mean Payoff Games
Comin, Carlo and Rizzi, Romeo. Improved Pseudo-polynomial Bound for the Value Problem and Optimal Strategy Synthesis in Mean Payoff Games. Algorithmica. 2017
2017
-
[48]
1997 , school =
Formal verification of probabilistic systems , author =. 1997 , school =
1997
-
[49]
Random walks, Brownian motion, and interacting particle systems , pages =
Making money from fair games , author =. Random walks, Brownian motion, and interacting particle systems , pages =
-
[50]
Emerson, E. A. and Jutla, C. S. , title =. 1991 , pages =
1991
-
[51]
and Mycielski, J
Ehrenfeucht, A. and Mycielski, J. Positional strategies for mean payoff games. 1979
1979
-
[52]
Quasi-birth--death processes, tree-like
Etessami, Kousha and Wojtczak, Dominik and Yannakakis, Mihalis , journal =. Quasi-birth--death processes, tree-like
-
[53]
and Lange, M
Friedmann, O. and Lange, M. , title =
-
[54]
Gurvich and A.V
V.A. Gurvich and A.V. Karzanov and L.G. Khachivan. Cyclic games and an algorithm to find minimax cycle means in directed graphs. USSR Computational Mathematics and Mathematical Physics. 1988
1988
-
[55]
Katz and Sergeĭ Sergeev
Stéphane Gaubert and Ricardo D. Katz and Sergeĭ Sergeev. Tropical linear-fractional programming and parametric mean payoff games. 2012
2012
-
[56]
Ernst Moritz Hahn and Sven Schewe and Andrea Turrini and Lijun Zhang , title =
-
[57]
Number of quantifiers is better than number of tape cells , author =
-
[58]
Jurdzi\'. Inf. Process. Lett. , year =
-
[59]
Succinct progress measures for solving parity games , booktitle = lics, year =
Marcin Jurdzi. Succinct progress measures for solving parity games , booktitle = lics, year =
-
[60]
Fixed-Dimensional Energy Games are in Pseudo-Polynomial Time
Jurdzi \' n ski, Marcin and Lazi \' c , Ranko and Schmitz, Sylvain. Fixed-Dimensional Energy Games are in Pseudo-Polynomial Time
-
[61]
Karmarkar , title =
N. Karmarkar , title =
-
[62]
R. D. Katz , journal = ieeeac, title =. 2007 , volume =
2007
-
[63]
Alexander Kaiser and Daniel Kroening and Thomas Wahl , title =
-
[64]
Stefan Kiefer and Richard Mayr and Mahsa Shirmohammadi and Dominik Wojtczak , title =
-
[65]
2008 , isbn =
Kroening, Daniel and Strichman, Ofer , title =. 2008 , isbn =
2008
-
[66]
T. A. Liggett and S. A. Lippman , title =. Siam Review , volume =
-
[67]
Lifshits, Y. M. and Pavlov, D. S. Potential theory for mean payoff games. Journal of Mathematical Sciences. 2007
2007
-
[68]
1999 , isbn =
Milner, Robin , title =. 1999 , isbn =
1999
-
[69]
and Luttenberger, Michael
Meyer, Philipp J. and Luttenberger, Michael. Solving Mean-Payoff Games on the GPU
-
[70]
Puterman, M. L. , title =. 1994 , isbn =
1994
-
[71]
N. N. Pisaruk , title =. Mathematics of Operations Research , volume =
-
[72]
and Rosner, R
Pnueli, A. and Rosner, R. , title =. 1989 , pages =
1989
-
[73]
CoRR , year =
Extending finite memory determinacy: General techniques and an application to energy parity games , author =. CoRR , year =
-
[74]
Logic Journal of the IGPL , volume =
Stirling, Colin , title =. Logic Journal of the IGPL , volume =. 1999 , eprint =
1999
-
[75]
Henzinger and Alexander Rabinovich and Jean-François Raskin
Yaron Velner and Krishnendu Chatterjee and Laurent Doyen and Thomas A. Henzinger and Alexander Rabinovich and Jean-François Raskin. The complexity of multi-mean-payoff and multi-energy games. 2015
2015
-
[76]
The complexity of mean payoff games on graphs , author =
-
[77]
Infinite-State Energy Games , author =
-
[78]
Richard Mayr and Sven Schewe and Patrick Totzke and Dominik Wojtczak , title =
-
[79]
Proc.\ of Fossacs , series =
Richard Mayr and Sven Schewe and Patrick Totzke and Dominik Wojtczak , title =. Proc.\ of Fossacs , series =
-
[80]
2008 , publisher=
Probability and measure , author=. 2008 , publisher=
2008
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.