pith. sign in

arxiv: 2606.23229 · v1 · pith:EJO5M2OWnew · submitted 2026-06-22 · 🧮 math.OC · cs.SY· eess.SY

Value iteration with stopping criterion: finite iterations, stability, and near-optimality guarantees

Pith reviewed 2026-06-26 07:42 UTC · model grok-4.3

classification 🧮 math.OC cs.SYeess.SY
keywords value iterationstopping criterionstability guaranteesnear-optimality boundsdynamic programmingdiscrete-time systemsinfinite horizon
0
0 comments X

The pith

Value iteration with a generalized stopping criterion terminates finitely and yields stabilizing near-optimal policies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper analyzes value iteration for deterministic discrete-time systems with infinite-horizon costs. It equips the algorithm with a flexible stopping criterion and proves that the process ends after a finite number of iterations under mild assumptions. The design of the stopping rule determines whether the final policies stabilize the system and how close the achieved cost is to the optimal one, with explicit bounds provided.

Core claim

Under mild assumptions, value iteration terminates in a finite number of iterations. By properly designing the generalized stopping criterion, the policies at the final iteration stabilize the closed-loop system, and explicit near-optimality bounds for the policies and value functions are derived that depend on the stopping choice.

What carries the argument

Generalized stopping criterion for value iteration, which stops when the difference between successive value functions or related quantities meets a threshold, enabling finite termination, stability, and performance analysis.

Load-bearing premise

The plant is a deterministic discrete-time system with infinite-horizon costs and the algorithm generates the inputs via value iteration.

What would settle it

Apply value iteration to a known unstable or non-terminating deterministic system and check if the stopping criterion fails to produce a stabilizing policy or if termination does not occur.

read the original abstract

Value iteration (VI) is a cornerstone of dynamic programming that allows computing near-optimal feedback laws for general plant dynamics and cost functions. In practice, however, it must be stopped after finitely many iterations. This raises the question of when to stop the algorithm so that the resulting policies and value functions achieve desirable properties, like given near-optimality bounds and stability. In this context, we study deterministic, discrete-time systems with infinite-horizon (possibly discounted) costs whose inputs are generated by VI. We equip VI with a generalized stopping criterion that encompasses existing choices while allowing new ones. Our aim is to analyze the properties of the policies and value functions at the final iteration. Under mild assumptions, we first show that VI indeed terminates in a finite number of iterations. We then establish that the final policies are stabilizing by properly designing the stopping criterion, and derive explicit near-optimality bounds characterized by this choice. These results offer a design framework for the stopping criteria that balances computational effort with stability and performance guarantees.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper equips value iteration with a generalized stopping criterion for deterministic discrete-time infinite-horizon (possibly discounted) optimal control problems. It proves finite termination under mild assumptions, shows that the terminal policy is stabilizing when the stopping rule is suitably designed, and derives explicit near-optimality bounds on the final value function and policy that are parameterized by the chosen threshold.

Significance. If the claims hold, the work supplies a practical design framework that lets practitioners select a stopping criterion to balance computational cost against explicit stability and performance guarantees. The results rest on standard monotonicity and contraction properties of the Bellman operator together with the existence of a stabilizing policy, without introducing new technical machinery.

minor comments (2)
  1. [§3] §3: the generalized stopping criterion is introduced via an abstract threshold condition; an explicit comparison (e.g., a short table) with the classical residual and span criteria would clarify the scope of the generalization.
  2. [§2] The statement of the mild assumptions (existence of a stabilizing policy, bounded costs, etc.) appears in §2; moving a concise bullet list of these assumptions to the abstract or introduction would improve readability for control-oriented readers.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary, recognition of the paper's significance, and recommendation to accept the manuscript. We appreciate the assessment that the results rest on standard properties without new technical machinery and provide a practical design framework.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper derives finite termination of value iteration, stability of the terminal policy, and explicit near-optimality bounds via standard dynamic-programming arguments (Bellman operator monotonicity, existence of a stabilizing policy, and a threshold-based stopping rule). These steps rely on the stated mild assumptions for deterministic discrete-time plants and do not reduce to fitted quantities, self-definitional constructs, or load-bearing self-citations. The central claims are self-contained proofs that follow directly from the problem setup without circular reduction to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities beyond the generic 'mild assumptions' on the system class; ledger therefore empty pending full text.

pith-pipeline@v0.9.1-grok · 5735 in / 985 out tokens · 15932 ms · 2026-06-26T07:42:41.315168+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

34 extracted references

  1. [1]

    Bertsekas,Dynamic Programming and Optimal Control, Volume I, vol

    D. Bertsekas,Dynamic Programming and Optimal Control, Volume I, vol. 4. Athena Scientific, Belmont, U.S.A., 2012

  2. [2]

    Sutton and A

    R. Sutton and A. Barto,Reinforcement Learning: An Introduction. MIT Press, Cambridge, U.S.A., 1998

  3. [3]

    D. P. Bertsekas,Reinforcement Learning and Optimal Control, vol. 1. Athena Scientific, Belmont, U.S.A., 2019

  4. [4]

    Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers,

    F. L. Lewis, D. Vrabie, and K. G. Vamvoudakis, “Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers,”IEEE Control Systems Magazine, vol. 32, no. 6, pp. 76–105, 2012

  5. [5]

    M. L. Puterman,Markov Decision Processes: Discrete Stochastic Dy- namic Programming. John Wiley & Sons, Hoboken, U.S.A., 2014

  6. [6]

    On an iterative technique for riccati equation computa- tions,

    D. Kleinman, “On an iterative technique for riccati equation computa- tions,”IEEE Trans. on Aut. Control, vol. 13, no. 1, pp. 114–115, 1968

  7. [7]

    Adaptive dynamic programming and optimal control of nonlinear nonaffine systems,

    T. Bian, Y . Jiang, and Z.-P. Jiang, “Adaptive dynamic programming and optimal control of nonlinear nonaffine systems,”Automatica, vol. 50, no. 10, pp. 2624–2632, 2014

  8. [8]

    Computing stabilizing linear controllers via policy iteration,

    A. Lamperski, “Computing stabilizing linear controllers via policy iteration,” inIEEE Conference on Decision and Control, Jeju Island, South Korea, pp. 1902–1907, 2020. 16

  9. [9]

    Early termination of NMPC interior point solvers: Relating the duality gap to stability,

    A. Pavlov, I. Shames, and C. Manzie, “Early termination of NMPC interior point solvers: Relating the duality gap to stability,” inEuropean Control Conference, Naples, Italy, pp. 805–810, 2019

  10. [10]

    Discrete-time non- linear HJB solution using approximate dynamic programming: Conver- gence proof,

    A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, “Discrete-time non- linear HJB solution using approximate dynamic programming: Conver- gence proof,”IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 38, no. 4, pp. 943–949, 2008

  11. [11]

    Revisiting approximate dynamic programming and its convergence,

    A. Heydari, “Revisiting approximate dynamic programming and its convergence,”IEEE Transactions on Cybernetics, vol. 44, no. 12, pp. 2733–2743, 2014

  12. [12]

    Value iteration adaptive dynamic pro- gramming for optimal control of discrete-time nonlinear systems,

    Q. Wei, D. Liu, and H. Lin, “Value iteration adaptive dynamic pro- gramming for optimal control of discrete-time nonlinear systems,”IEEE Transactions on Cybernetics, vol. 46, no. 3, pp. 840–853, 2015

  13. [13]

    Stopping criteria for value iteration on stochastic games with quantitative objectives,

    J. K ˇret´ınsk`y, T. Meggendorfer, and M. Weininger, “Stopping criteria for value iteration on stochastic games with quantitative objectives,” in Annual ACM/IEEE Symposium on Logic in Computer Science, Boston, U.S.A., pp. 1–14, 2023

  14. [14]

    Value iteration and adaptive dynamic pro- gramming for data-driven adaptive optimal control design,

    T. Bian and Z.-P. Jiang, “Value iteration and adaptive dynamic pro- gramming for data-driven adaptive optimal control design,”Automatica, vol. 71, pp. 348–360, 2016

  15. [15]

    Value and policy iterations in optimal control and adaptive dynamic programming,

    D. P. Bertsekas, “Value and policy iterations in optimal control and adaptive dynamic programming,”IEEE Transactions on Neural networks and Learning Systems, vol. 28, no. 3, pp. 500–509, 2015

  16. [16]

    Model predictive control: For want of a local control Lyapunov function, all is not lost,

    G. Grimm, M. Messina, S. Tuna, and A. Teel, “Model predictive control: For want of a local control Lyapunov function, all is not lost,”IEEE Transactions on Automatic Control, vol. 50, no. 5, pp. 546–558, 2005

  17. [17]

    Stability analysis of discrete-time infinite-horizon optimal control with discounted cost,

    R. Postoyan, L. Bus ¸oniu, D. Ne ˇsi´c, and J. Daafouz, “Stability analysis of discrete-time infinite-horizon optimal control with discounted cost,” IEEE Trans. on Aut. Control, vol. 62, no. 6, pp. 2736–2749, 2017

  18. [18]

    Finite-horizon discounted optimal control: stability and performance,

    M. Granzotto, R. Postoyan, L. Bus ¸oniu, D. Ne ˇsi´c, and J. Daafouz, “Finite-horizon discounted optimal control: stability and performance,” IEEE Trans. on Aut. Control, vol. 66, no. 2, pp. 550–565, 2020

  19. [19]

    NMPC without terminal constraints,

    L. Gr ¨une, “NMPC without terminal constraints,”IFAC Proceedings Volumes, vol. 45, no. 17, pp. 1–13, 2012

  20. [20]

    Sta- bilization of strictly dissipative discrete time systems with discounted optimal control,

    V . Gaitsgory, L. Gr¨une, M. H ¨oger, C. M. Kellett, and S. R. Weller, “Sta- bilization of strictly dissipative discrete time systems with discounted optimal control,”Automatica, vol. 93, pp. 311–320, 2018

  21. [21]

    B. D. Anderson and J. B. Moore,Optimal Control: Linear Quadratic Methods. Prentice-Hall, Inc., Englewood Cliffs, U.S.A., 1971

  22. [22]

    Value and policy iterations in optimal control and adap- tive dynamic programming,

    D. Bertsekas, “Value and policy iterations in optimal control and adap- tive dynamic programming,”IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 3, pp. 500–509, 2015

  23. [23]

    When to stop value iteration: stability and near-optimality versus computation,

    M. Granzotto, R. Postoyan, D. Ne ˇsi´c, L. Bus ¸oniu, and J. Daafouz, “When to stop value iteration: stability and near-optimality versus computation,” in3rd Conference on Learning for Dynamics and Control, pp. 412–424, PMLR, 2021

  24. [24]

    Goebel, R

    R. Goebel, R. Sanfelice, and A. Teel,Hybrid Dynamical Systems. Princeton University Press, Princeton, U.S.A., 2012

  25. [25]

    Dynamical properties of hybrid systems simulators,

    R. Sanfelice and A. Teel, “Dynamical properties of hybrid systems simulators,”Automatica, vol. 46, no. 2, pp. 239–248, 2010

  26. [26]

    Robust stability and near-optimality for policy iteration: for want of recursive feasibility, all is not lost,

    M. Granzotto, O. L. De Silva, R. Postoyan, D. Ne ˇsi´c, and Z.-P. Jiang, “Robust stability and near-optimality for policy iteration: for want of recursive feasibility, all is not lost,”IEEE Transactions on Automatic Control, vol. 69, no. 12, pp. 8247–8262, 2024

  27. [27]

    Survey constrained model predictive control: Stability and optimality,

    D. Mayne, J. Rawlings, C. Rao, and P. Scokaert, “Survey constrained model predictive control: Stability and optimality,”Automatica, vol. 36, no. 6, pp. 789–814, 2000

  28. [28]

    Rockafellar and R.-B

    R. Rockafellar and R.-B. Wets,Variational Analysis, vol. 317. Springer, Dordrecht, Germany, 3rd ed., 1998

  29. [29]

    An existence theorem for discrete-time infinite-horizon optimal control problems,

    S. Keerthi and E. Gilbert, “An existence theorem for discrete-time infinite-horizon optimal control problems,”IEEE Transactions on Au- tomatic Control, vol. 30, no. 9, pp. 907–909, 1985

  30. [30]

    Rollout algorithms for discrete optimization: A survey,

    D. P. Bertsekas, “Rollout algorithms for discrete optimization: A survey,” inHandbook of Combinatorial Optimization, vol. 5, pp. 2989–3013, Springer, New York, U.S.A., 2013

  31. [31]

    Model predictive control and reinforcement learn- ing: A unified framework based on dynamic programming,

    D. P. Bertsekas, “Model predictive control and reinforcement learn- ing: A unified framework based on dynamic programming,”IFAC- PapersOnLine, vol. 58, no. 18, pp. 363–383, 2024

  32. [32]

    On the infinite horizon performance of receding horizon controllers,

    L. Grune and A. Rantzer, “On the infinite horizon performance of receding horizon controllers,”IEEE Transactions on Automatic Control, vol. 53, no. 9, pp. 2100–2111, 2008

  33. [33]

    Stability analysis of discrete-time finite-horizon discounted optimal control,

    M. Granzotto, R. Postoyan, L. Bus ¸oniu, D. Ne ˇsi´c, and J. Daafouz, “Stability analysis of discrete-time finite-horizon discounted optimal control,” inIEEE Conf. on Dec. and Control, Miami, U.S.A., 2018

  34. [34]

    Stability and recurrence of nonlinear stochastic systems with optimal control,

    R. Moldenhauer, D. Ne ˇsi´c, M. Granzotto, R. Postoyan, and A. Teel, “Stability and recurrence of nonlinear stochastic systems with optimal control,”Submitted for journal publication, 2026