Value iteration with stopping criterion: finite iterations, stability, and near-optimality guarantees

Dragan Ne\v{s}i\'c; Jamal Daafouz; Lucian Bu\c{s}oniu; Mathieu Granzotto; Romain Postoyan

arxiv: 2606.23229 · v1 · pith:EJO5M2OWnew · submitted 2026-06-22 · 🧮 math.OC · cs.SY· eess.SY

Value iteration with stopping criterion: finite iterations, stability, and near-optimality guarantees

Mathieu Granzotto , Romain Postoyan , Dragan Ne\v{s}i\'c , Lucian Bu\c{s}oniu , Jamal Daafouz This is my paper

Pith reviewed 2026-06-26 07:42 UTC · model grok-4.3

classification 🧮 math.OC cs.SYeess.SY

keywords value iterationstopping criterionstability guaranteesnear-optimality boundsdynamic programmingdiscrete-time systemsinfinite horizon

0 comments

The pith

Value iteration with a generalized stopping criterion terminates finitely and yields stabilizing near-optimal policies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper analyzes value iteration for deterministic discrete-time systems with infinite-horizon costs. It equips the algorithm with a flexible stopping criterion and proves that the process ends after a finite number of iterations under mild assumptions. The design of the stopping rule determines whether the final policies stabilize the system and how close the achieved cost is to the optimal one, with explicit bounds provided.

Core claim

Under mild assumptions, value iteration terminates in a finite number of iterations. By properly designing the generalized stopping criterion, the policies at the final iteration stabilize the closed-loop system, and explicit near-optimality bounds for the policies and value functions are derived that depend on the stopping choice.

What carries the argument

Generalized stopping criterion for value iteration, which stops when the difference between successive value functions or related quantities meets a threshold, enabling finite termination, stability, and performance analysis.

Load-bearing premise

The plant is a deterministic discrete-time system with infinite-horizon costs and the algorithm generates the inputs via value iteration.

What would settle it

Apply value iteration to a known unstable or non-terminating deterministic system and check if the stopping criterion fails to produce a stabilizing policy or if termination does not occur.

read the original abstract

Value iteration (VI) is a cornerstone of dynamic programming that allows computing near-optimal feedback laws for general plant dynamics and cost functions. In practice, however, it must be stopped after finitely many iterations. This raises the question of when to stop the algorithm so that the resulting policies and value functions achieve desirable properties, like given near-optimality bounds and stability. In this context, we study deterministic, discrete-time systems with infinite-horizon (possibly discounted) costs whose inputs are generated by VI. We equip VI with a generalized stopping criterion that encompasses existing choices while allowing new ones. Our aim is to analyze the properties of the policies and value functions at the final iteration. Under mild assumptions, we first show that VI indeed terminates in a finite number of iterations. We then establish that the final policies are stabilizing by properly designing the stopping criterion, and derive explicit near-optimality bounds characterized by this choice. These results offer a design framework for the stopping criteria that balances computational effort with stability and performance guarantees.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a usable design framework for stopping value iteration that ties termination, stability, and explicit bounds together under standard assumptions.

read the letter

The core contribution is a generalized stopping rule for value iteration on deterministic discrete-time systems. It covers common existing thresholds and supports new ones, then shows that the algorithm stops after finitely many steps, that the final policy stabilizes the plant when the threshold is chosen appropriately, and that the value function satisfies an explicit performance bound that scales with the stopping choice.

The derivations rely on the usual monotonicity properties of the Bellman operator and the existence of at least one stabilizing policy. These are standard tools, but the paper packages them into a single design recipe that lets a user pick the stopping tolerance and immediately read off the stability and suboptimality consequences. That packaging is the practical advance.

The assumptions are labeled mild and the stress-test confirms there is no circularity or missing step in the termination and bound arguments. The results are therefore on solid ground for the class of problems stated.

The main limitation is that the work is incremental rather than foundational. Separate results on finite termination under thresholds and on stability via stopping already exist; the new element is mainly their joint treatment under one flexible criterion. The bounds are explicit but still require the user to select the threshold and to verify the underlying stabilizing policy exists. No numerical validation or implementation details appear in the abstract, so immediate transfer to code is left to the reader.

This is a paper for people who already run value iteration in control or approximate dynamic programming and want theoretical guidance on when to stop. It is coherent and grounded enough to merit referee time, even if the referee will likely ask for tighter comparison with prior stopping rules and some illustrative examples.

Referee Report

0 major / 2 minor

Summary. The paper equips value iteration with a generalized stopping criterion for deterministic discrete-time infinite-horizon (possibly discounted) optimal control problems. It proves finite termination under mild assumptions, shows that the terminal policy is stabilizing when the stopping rule is suitably designed, and derives explicit near-optimality bounds on the final value function and policy that are parameterized by the chosen threshold.

Significance. If the claims hold, the work supplies a practical design framework that lets practitioners select a stopping criterion to balance computational cost against explicit stability and performance guarantees. The results rest on standard monotonicity and contraction properties of the Bellman operator together with the existence of a stabilizing policy, without introducing new technical machinery.

minor comments (2)

[§3] §3: the generalized stopping criterion is introduced via an abstract threshold condition; an explicit comparison (e.g., a short table) with the classical residual and span criteria would clarify the scope of the generalization.
[§2] The statement of the mild assumptions (existence of a stabilizing policy, bounded costs, etc.) appears in §2; moving a concise bullet list of these assumptions to the abstract or introduction would improve readability for control-oriented readers.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary, recognition of the paper's significance, and recommendation to accept the manuscript. We appreciate the assessment that the results rest on standard properties without new technical machinery and provide a practical design framework.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper derives finite termination of value iteration, stability of the terminal policy, and explicit near-optimality bounds via standard dynamic-programming arguments (Bellman operator monotonicity, existence of a stabilizing policy, and a threshold-based stopping rule). These steps rely on the stated mild assumptions for deterministic discrete-time plants and do not reduce to fitted quantities, self-definitional constructs, or load-bearing self-citations. The central claims are self-contained proofs that follow directly from the problem setup without circular reduction to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities beyond the generic 'mild assumptions' on the system class; ledger therefore empty pending full text.

pith-pipeline@v0.9.1-grok · 5735 in / 985 out tokens · 15932 ms · 2026-06-26T07:42:41.315168+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

34 extracted references

[1]

Bertsekas,Dynamic Programming and Optimal Control, Volume I, vol

D. Bertsekas,Dynamic Programming and Optimal Control, Volume I, vol. 4. Athena Scientific, Belmont, U.S.A., 2012

2012
[2]

Sutton and A

R. Sutton and A. Barto,Reinforcement Learning: An Introduction. MIT Press, Cambridge, U.S.A., 1998

1998
[3]

D. P. Bertsekas,Reinforcement Learning and Optimal Control, vol. 1. Athena Scientific, Belmont, U.S.A., 2019

2019
[4]

Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers,

F. L. Lewis, D. Vrabie, and K. G. Vamvoudakis, “Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers,”IEEE Control Systems Magazine, vol. 32, no. 6, pp. 76–105, 2012

2012
[5]

M. L. Puterman,Markov Decision Processes: Discrete Stochastic Dy- namic Programming. John Wiley & Sons, Hoboken, U.S.A., 2014

2014
[6]

On an iterative technique for riccati equation computa- tions,

D. Kleinman, “On an iterative technique for riccati equation computa- tions,”IEEE Trans. on Aut. Control, vol. 13, no. 1, pp. 114–115, 1968

1968
[7]

Adaptive dynamic programming and optimal control of nonlinear nonaffine systems,

T. Bian, Y . Jiang, and Z.-P. Jiang, “Adaptive dynamic programming and optimal control of nonlinear nonaffine systems,”Automatica, vol. 50, no. 10, pp. 2624–2632, 2014

2014
[8]

Computing stabilizing linear controllers via policy iteration,

A. Lamperski, “Computing stabilizing linear controllers via policy iteration,” inIEEE Conference on Decision and Control, Jeju Island, South Korea, pp. 1902–1907, 2020. 16

1902
[9]

Early termination of NMPC interior point solvers: Relating the duality gap to stability,

A. Pavlov, I. Shames, and C. Manzie, “Early termination of NMPC interior point solvers: Relating the duality gap to stability,” inEuropean Control Conference, Naples, Italy, pp. 805–810, 2019

2019
[10]

Discrete-time non- linear HJB solution using approximate dynamic programming: Conver- gence proof,

A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, “Discrete-time non- linear HJB solution using approximate dynamic programming: Conver- gence proof,”IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 38, no. 4, pp. 943–949, 2008

2008
[11]

Revisiting approximate dynamic programming and its convergence,

A. Heydari, “Revisiting approximate dynamic programming and its convergence,”IEEE Transactions on Cybernetics, vol. 44, no. 12, pp. 2733–2743, 2014

2014
[12]

Value iteration adaptive dynamic pro- gramming for optimal control of discrete-time nonlinear systems,

Q. Wei, D. Liu, and H. Lin, “Value iteration adaptive dynamic pro- gramming for optimal control of discrete-time nonlinear systems,”IEEE Transactions on Cybernetics, vol. 46, no. 3, pp. 840–853, 2015

2015
[13]

Stopping criteria for value iteration on stochastic games with quantitative objectives,

J. K ˇret´ınsk`y, T. Meggendorfer, and M. Weininger, “Stopping criteria for value iteration on stochastic games with quantitative objectives,” in Annual ACM/IEEE Symposium on Logic in Computer Science, Boston, U.S.A., pp. 1–14, 2023

2023
[14]

Value iteration and adaptive dynamic pro- gramming for data-driven adaptive optimal control design,

T. Bian and Z.-P. Jiang, “Value iteration and adaptive dynamic pro- gramming for data-driven adaptive optimal control design,”Automatica, vol. 71, pp. 348–360, 2016

2016
[15]

Value and policy iterations in optimal control and adaptive dynamic programming,

D. P. Bertsekas, “Value and policy iterations in optimal control and adaptive dynamic programming,”IEEE Transactions on Neural networks and Learning Systems, vol. 28, no. 3, pp. 500–509, 2015

2015
[16]

Model predictive control: For want of a local control Lyapunov function, all is not lost,

G. Grimm, M. Messina, S. Tuna, and A. Teel, “Model predictive control: For want of a local control Lyapunov function, all is not lost,”IEEE Transactions on Automatic Control, vol. 50, no. 5, pp. 546–558, 2005

2005
[17]

Stability analysis of discrete-time infinite-horizon optimal control with discounted cost,

R. Postoyan, L. Bus ¸oniu, D. Ne ˇsi´c, and J. Daafouz, “Stability analysis of discrete-time infinite-horizon optimal control with discounted cost,” IEEE Trans. on Aut. Control, vol. 62, no. 6, pp. 2736–2749, 2017

2017
[18]

Finite-horizon discounted optimal control: stability and performance,

M. Granzotto, R. Postoyan, L. Bus ¸oniu, D. Ne ˇsi´c, and J. Daafouz, “Finite-horizon discounted optimal control: stability and performance,” IEEE Trans. on Aut. Control, vol. 66, no. 2, pp. 550–565, 2020

2020
[19]

NMPC without terminal constraints,

L. Gr ¨une, “NMPC without terminal constraints,”IFAC Proceedings Volumes, vol. 45, no. 17, pp. 1–13, 2012

2012
[20]

Sta- bilization of strictly dissipative discrete time systems with discounted optimal control,

V . Gaitsgory, L. Gr¨une, M. H ¨oger, C. M. Kellett, and S. R. Weller, “Sta- bilization of strictly dissipative discrete time systems with discounted optimal control,”Automatica, vol. 93, pp. 311–320, 2018

2018
[21]

B. D. Anderson and J. B. Moore,Optimal Control: Linear Quadratic Methods. Prentice-Hall, Inc., Englewood Cliffs, U.S.A., 1971

1971
[22]

Value and policy iterations in optimal control and adap- tive dynamic programming,

D. Bertsekas, “Value and policy iterations in optimal control and adap- tive dynamic programming,”IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 3, pp. 500–509, 2015

2015
[23]

When to stop value iteration: stability and near-optimality versus computation,

M. Granzotto, R. Postoyan, D. Ne ˇsi´c, L. Bus ¸oniu, and J. Daafouz, “When to stop value iteration: stability and near-optimality versus computation,” in3rd Conference on Learning for Dynamics and Control, pp. 412–424, PMLR, 2021

2021
[24]

Goebel, R

R. Goebel, R. Sanfelice, and A. Teel,Hybrid Dynamical Systems. Princeton University Press, Princeton, U.S.A., 2012

2012
[25]

Dynamical properties of hybrid systems simulators,

R. Sanfelice and A. Teel, “Dynamical properties of hybrid systems simulators,”Automatica, vol. 46, no. 2, pp. 239–248, 2010

2010
[26]

Robust stability and near-optimality for policy iteration: for want of recursive feasibility, all is not lost,

M. Granzotto, O. L. De Silva, R. Postoyan, D. Ne ˇsi´c, and Z.-P. Jiang, “Robust stability and near-optimality for policy iteration: for want of recursive feasibility, all is not lost,”IEEE Transactions on Automatic Control, vol. 69, no. 12, pp. 8247–8262, 2024

2024
[27]

Survey constrained model predictive control: Stability and optimality,

D. Mayne, J. Rawlings, C. Rao, and P. Scokaert, “Survey constrained model predictive control: Stability and optimality,”Automatica, vol. 36, no. 6, pp. 789–814, 2000

2000
[28]

Rockafellar and R.-B

R. Rockafellar and R.-B. Wets,Variational Analysis, vol. 317. Springer, Dordrecht, Germany, 3rd ed., 1998

1998
[29]

An existence theorem for discrete-time infinite-horizon optimal control problems,

S. Keerthi and E. Gilbert, “An existence theorem for discrete-time infinite-horizon optimal control problems,”IEEE Transactions on Au- tomatic Control, vol. 30, no. 9, pp. 907–909, 1985

1985
[30]

Rollout algorithms for discrete optimization: A survey,

D. P. Bertsekas, “Rollout algorithms for discrete optimization: A survey,” inHandbook of Combinatorial Optimization, vol. 5, pp. 2989–3013, Springer, New York, U.S.A., 2013

2013
[31]

Model predictive control and reinforcement learn- ing: A unified framework based on dynamic programming,

D. P. Bertsekas, “Model predictive control and reinforcement learn- ing: A unified framework based on dynamic programming,”IFAC- PapersOnLine, vol. 58, no. 18, pp. 363–383, 2024

2024
[32]

On the infinite horizon performance of receding horizon controllers,

L. Grune and A. Rantzer, “On the infinite horizon performance of receding horizon controllers,”IEEE Transactions on Automatic Control, vol. 53, no. 9, pp. 2100–2111, 2008

2008
[33]

Stability analysis of discrete-time finite-horizon discounted optimal control,

M. Granzotto, R. Postoyan, L. Bus ¸oniu, D. Ne ˇsi´c, and J. Daafouz, “Stability analysis of discrete-time finite-horizon discounted optimal control,” inIEEE Conf. on Dec. and Control, Miami, U.S.A., 2018

2018
[34]

Stability and recurrence of nonlinear stochastic systems with optimal control,

R. Moldenhauer, D. Ne ˇsi´c, M. Granzotto, R. Postoyan, and A. Teel, “Stability and recurrence of nonlinear stochastic systems with optimal control,”Submitted for journal publication, 2026

2026

[1] [1]

Bertsekas,Dynamic Programming and Optimal Control, Volume I, vol

D. Bertsekas,Dynamic Programming and Optimal Control, Volume I, vol. 4. Athena Scientific, Belmont, U.S.A., 2012

2012

[2] [2]

Sutton and A

R. Sutton and A. Barto,Reinforcement Learning: An Introduction. MIT Press, Cambridge, U.S.A., 1998

1998

[3] [3]

D. P. Bertsekas,Reinforcement Learning and Optimal Control, vol. 1. Athena Scientific, Belmont, U.S.A., 2019

2019

[4] [4]

Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers,

F. L. Lewis, D. Vrabie, and K. G. Vamvoudakis, “Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers,”IEEE Control Systems Magazine, vol. 32, no. 6, pp. 76–105, 2012

2012

[5] [5]

M. L. Puterman,Markov Decision Processes: Discrete Stochastic Dy- namic Programming. John Wiley & Sons, Hoboken, U.S.A., 2014

2014

[6] [6]

On an iterative technique for riccati equation computa- tions,

D. Kleinman, “On an iterative technique for riccati equation computa- tions,”IEEE Trans. on Aut. Control, vol. 13, no. 1, pp. 114–115, 1968

1968

[7] [7]

Adaptive dynamic programming and optimal control of nonlinear nonaffine systems,

T. Bian, Y . Jiang, and Z.-P. Jiang, “Adaptive dynamic programming and optimal control of nonlinear nonaffine systems,”Automatica, vol. 50, no. 10, pp. 2624–2632, 2014

2014

[8] [8]

Computing stabilizing linear controllers via policy iteration,

A. Lamperski, “Computing stabilizing linear controllers via policy iteration,” inIEEE Conference on Decision and Control, Jeju Island, South Korea, pp. 1902–1907, 2020. 16

1902

[9] [9]

Early termination of NMPC interior point solvers: Relating the duality gap to stability,

A. Pavlov, I. Shames, and C. Manzie, “Early termination of NMPC interior point solvers: Relating the duality gap to stability,” inEuropean Control Conference, Naples, Italy, pp. 805–810, 2019

2019

[10] [10]

Discrete-time non- linear HJB solution using approximate dynamic programming: Conver- gence proof,

A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, “Discrete-time non- linear HJB solution using approximate dynamic programming: Conver- gence proof,”IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 38, no. 4, pp. 943–949, 2008

2008

[11] [11]

Revisiting approximate dynamic programming and its convergence,

A. Heydari, “Revisiting approximate dynamic programming and its convergence,”IEEE Transactions on Cybernetics, vol. 44, no. 12, pp. 2733–2743, 2014

2014

[12] [12]

Value iteration adaptive dynamic pro- gramming for optimal control of discrete-time nonlinear systems,

Q. Wei, D. Liu, and H. Lin, “Value iteration adaptive dynamic pro- gramming for optimal control of discrete-time nonlinear systems,”IEEE Transactions on Cybernetics, vol. 46, no. 3, pp. 840–853, 2015

2015

[13] [13]

Stopping criteria for value iteration on stochastic games with quantitative objectives,

J. K ˇret´ınsk`y, T. Meggendorfer, and M. Weininger, “Stopping criteria for value iteration on stochastic games with quantitative objectives,” in Annual ACM/IEEE Symposium on Logic in Computer Science, Boston, U.S.A., pp. 1–14, 2023

2023

[14] [14]

Value iteration and adaptive dynamic pro- gramming for data-driven adaptive optimal control design,

T. Bian and Z.-P. Jiang, “Value iteration and adaptive dynamic pro- gramming for data-driven adaptive optimal control design,”Automatica, vol. 71, pp. 348–360, 2016

2016

[15] [15]

Value and policy iterations in optimal control and adaptive dynamic programming,

D. P. Bertsekas, “Value and policy iterations in optimal control and adaptive dynamic programming,”IEEE Transactions on Neural networks and Learning Systems, vol. 28, no. 3, pp. 500–509, 2015

2015

[16] [16]

Model predictive control: For want of a local control Lyapunov function, all is not lost,

G. Grimm, M. Messina, S. Tuna, and A. Teel, “Model predictive control: For want of a local control Lyapunov function, all is not lost,”IEEE Transactions on Automatic Control, vol. 50, no. 5, pp. 546–558, 2005

2005

[17] [17]

Stability analysis of discrete-time infinite-horizon optimal control with discounted cost,

R. Postoyan, L. Bus ¸oniu, D. Ne ˇsi´c, and J. Daafouz, “Stability analysis of discrete-time infinite-horizon optimal control with discounted cost,” IEEE Trans. on Aut. Control, vol. 62, no. 6, pp. 2736–2749, 2017

2017

[18] [18]

Finite-horizon discounted optimal control: stability and performance,

M. Granzotto, R. Postoyan, L. Bus ¸oniu, D. Ne ˇsi´c, and J. Daafouz, “Finite-horizon discounted optimal control: stability and performance,” IEEE Trans. on Aut. Control, vol. 66, no. 2, pp. 550–565, 2020

2020

[19] [19]

NMPC without terminal constraints,

L. Gr ¨une, “NMPC without terminal constraints,”IFAC Proceedings Volumes, vol. 45, no. 17, pp. 1–13, 2012

2012

[20] [20]

Sta- bilization of strictly dissipative discrete time systems with discounted optimal control,

V . Gaitsgory, L. Gr¨une, M. H ¨oger, C. M. Kellett, and S. R. Weller, “Sta- bilization of strictly dissipative discrete time systems with discounted optimal control,”Automatica, vol. 93, pp. 311–320, 2018

2018

[21] [21]

B. D. Anderson and J. B. Moore,Optimal Control: Linear Quadratic Methods. Prentice-Hall, Inc., Englewood Cliffs, U.S.A., 1971

1971

[22] [22]

Value and policy iterations in optimal control and adap- tive dynamic programming,

D. Bertsekas, “Value and policy iterations in optimal control and adap- tive dynamic programming,”IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 3, pp. 500–509, 2015

2015

[23] [23]

When to stop value iteration: stability and near-optimality versus computation,

M. Granzotto, R. Postoyan, D. Ne ˇsi´c, L. Bus ¸oniu, and J. Daafouz, “When to stop value iteration: stability and near-optimality versus computation,” in3rd Conference on Learning for Dynamics and Control, pp. 412–424, PMLR, 2021

2021

[24] [24]

Goebel, R

R. Goebel, R. Sanfelice, and A. Teel,Hybrid Dynamical Systems. Princeton University Press, Princeton, U.S.A., 2012

2012

[25] [25]

Dynamical properties of hybrid systems simulators,

R. Sanfelice and A. Teel, “Dynamical properties of hybrid systems simulators,”Automatica, vol. 46, no. 2, pp. 239–248, 2010

2010

[26] [26]

Robust stability and near-optimality for policy iteration: for want of recursive feasibility, all is not lost,

M. Granzotto, O. L. De Silva, R. Postoyan, D. Ne ˇsi´c, and Z.-P. Jiang, “Robust stability and near-optimality for policy iteration: for want of recursive feasibility, all is not lost,”IEEE Transactions on Automatic Control, vol. 69, no. 12, pp. 8247–8262, 2024

2024

[27] [27]

Survey constrained model predictive control: Stability and optimality,

D. Mayne, J. Rawlings, C. Rao, and P. Scokaert, “Survey constrained model predictive control: Stability and optimality,”Automatica, vol. 36, no. 6, pp. 789–814, 2000

2000

[28] [28]

Rockafellar and R.-B

R. Rockafellar and R.-B. Wets,Variational Analysis, vol. 317. Springer, Dordrecht, Germany, 3rd ed., 1998

1998

[29] [29]

An existence theorem for discrete-time infinite-horizon optimal control problems,

S. Keerthi and E. Gilbert, “An existence theorem for discrete-time infinite-horizon optimal control problems,”IEEE Transactions on Au- tomatic Control, vol. 30, no. 9, pp. 907–909, 1985

1985

[30] [30]

Rollout algorithms for discrete optimization: A survey,

D. P. Bertsekas, “Rollout algorithms for discrete optimization: A survey,” inHandbook of Combinatorial Optimization, vol. 5, pp. 2989–3013, Springer, New York, U.S.A., 2013

2013

[31] [31]

Model predictive control and reinforcement learn- ing: A unified framework based on dynamic programming,

D. P. Bertsekas, “Model predictive control and reinforcement learn- ing: A unified framework based on dynamic programming,”IFAC- PapersOnLine, vol. 58, no. 18, pp. 363–383, 2024

2024

[32] [32]

On the infinite horizon performance of receding horizon controllers,

L. Grune and A. Rantzer, “On the infinite horizon performance of receding horizon controllers,”IEEE Transactions on Automatic Control, vol. 53, no. 9, pp. 2100–2111, 2008

2008

[33] [33]

Stability analysis of discrete-time finite-horizon discounted optimal control,

M. Granzotto, R. Postoyan, L. Bus ¸oniu, D. Ne ˇsi´c, and J. Daafouz, “Stability analysis of discrete-time finite-horizon discounted optimal control,” inIEEE Conf. on Dec. and Control, Miami, U.S.A., 2018

2018

[34] [34]

Stability and recurrence of nonlinear stochastic systems with optimal control,

R. Moldenhauer, D. Ne ˇsi´c, M. Granzotto, R. Postoyan, and A. Teel, “Stability and recurrence of nonlinear stochastic systems with optimal control,”Submitted for journal publication, 2026

2026