Minimax optimal dual control -- The single input case

Anders Rantzer

arxiv: 2604.18550 · v2 · submitted 2026-04-20 · 🧮 math.OC

Pith reviewed 2026-05-10 03:55 UTC · model grok-4.3

keywords minimax control · dual control · Bellman inequality · adaptive control · exploration-exploitation · linear systems · certainty equivalence · single-input systems

The pith

An explicit solution exists for the Bellman inequality that defines minimax optimal dual control in single-input linear systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives an explicit solution to the Bellman inequality for minimax optimal dual control of single-input linear time-invariant systems. In this setup the minimizing player selects control inputs based on past measurements while the maximizing player chooses both disturbances and the unknown model parameters to maximize cost. The resulting policy is a dual controller that trades off learning the dynamics against using them for control. When past data suffices the policy reduces to a deterministic certainty-equivalence law; otherwise it adds a randomized excitation term. This matters because it converts an abstract exploration-exploitation dilemma into a concrete, computable strategy for worst-case adaptive control.

Core claim

An explicit solution is derived for the Bellman inequality corresponding to minimax optimal dual control. The minimizing player determines control action as a function of past state measurements and inputs. The maximizing player selects disturbances and model parameters for the underlying linear time-invariant dynamics. The optimal minimizing policy is a dual controller that optimizes the tradeoff between exploration and exploitation. Once sufficient data has been collected, the policy becomes a deterministic certainty equivalence controller. However, when data is insufficient, the policy introduces a randomized term to improve excitation.

What carries the argument

The explicit closed-form solution to the Bellman inequality that encodes the value function of the zero-sum game between the controller and the adversary.

If this is right

The optimal policy can be evaluated directly from observed data without online optimization.
The controller automatically injects randomization only while identifiability remains incomplete.
After a finite amount of data the policy reverts exactly to certainty-equivalence control.
The solution guarantees performance against any choice of parameters and disturbances chosen by the adversary.
The same structure yields a computable dual controller for every single-input linear system.
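The switching behaviour described in this list can be illustrated with a toy loop. The sketch below is not the paper's policy: the scalar plant, the least-squares estimator, the eigenvalue test `excite_tol`, the random-sign probe of size `dither`, and the one-step gain are all placeholder assumptions chosen only to show a controller that probes while the data are uninformative and is deterministic afterwards.

```python
import numpy as np

def dual_control_step(X, U, Xnext, x, rng, r=1.0, excite_tol=1e-2, dither=0.5):
    """Return (u, explored) for a scalar plant x+ = a*x + b*u + w.

    X, U, Xnext are 1-D arrays of past states, inputs and successor states.
    All thresholds and the probe distribution are hypothetical choices.
    """
    Phi = np.column_stack([X, U])                 # one regressor row per sample
    gram = Phi.T @ Phi                            # 2x2 information matrix
    informative = len(X) >= 2 and np.linalg.eigvalsh(gram)[0] > excite_tol
    if informative:
        # certainty equivalence: treat the least-squares estimate as the truth
        a_hat, b_hat = np.linalg.lstsq(Phi, Xnext, rcond=None)[0]
        gain = a_hat * b_hat / (b_hat**2 + r)     # one-step quadratic-cost gain
        return -gain * x, False
    # data not yet informative: random-sign probe (placeholder distribution)
    return dither * rng.choice([-1.0, 1.0]), True

def simulate(a=1.2, b=0.8, steps=15, seed=0):
    """Run the toy loop with the disturbance set to zero for the demo."""
    rng = np.random.default_rng(seed)
    X, U, Xn, flags = [], [], [], []
    x = 1.0
    for _ in range(steps):
        u, explored = dual_control_step(np.array(X), np.array(U), np.array(Xn), x, rng)
        x_next = a * x + b * u
        X.append(x); U.append(u); Xn.append(x_next); flags.append(explored)
        x = x_next
    return flags, x

flags, x_final = simulate()
print("exploration steps:", sum(flags), "final state:", x_final)
```

Because the information matrix only grows as samples are added, the toy loop never returns to probing once the eigenvalue test has passed; whether the paper's threshold has the same form is not stated in this summary.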

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The explicit form may make it possible to prove finite-time bounds on the exploration cost that were previously unavailable.
The same Bellman-inequality approach could be tested on multi-input or mildly nonlinear plants to see whether closed-form solutions survive.
The randomization term supplies a concrete, optimality-derived excitation signal that system-identification methods could adopt directly.

Load-bearing premise

The plant must be linear time-invariant with a single input, and the adversary is free to choose both disturbances and the unknown model parameters.

What would settle it

For a concrete scalar or low-dimensional linear system, solve the Bellman inequality numerically and compare the value to the explicit formula; any mismatch between the two would show that the claimed solution is not correct.
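A much smaller check of the same kind can be run on the one-step identity that appears in the appendix excerpt attached to reference [16] (proof of Lemma 2). The sketch below assumes a scalar plant with known `a` and `b`, so it does not test the dual-control value function itself; the parameter values and grid sizes are arbitrary choices, and `gamma > 1` is assumed so that the inner maximum is finite.

```python
import numpy as np

def closed_form(x, a, b, s, r, gamma):
    """Closed-form value of the one-step min-max for a scalar plant (needs gamma > 1)."""
    c = 1.0 - gamma ** -2
    ax = a * x
    return s * x**2 + (ax**2 - (b * ax) ** 2 / (b**2 + c * r)) / c

def brute_force(x, a, b, s, r, gamma, u_max=3.0, w_max=5.0, nu=1001, nw=2001):
    """Grid approximation of min over u of max over w of the same expression."""
    u = np.linspace(-u_max, u_max, nu)[:, None]   # candidate inputs (rows)
    w = np.linspace(-w_max, w_max, nw)[None, :]   # candidate disturbances (columns)
    cost = s * x**2 + r * u**2 - gamma**2 * w**2 + (a * x + b * u + w) ** 2
    return cost.max(axis=1).min()

params = dict(x=1.0, a=1.5, b=1.0, s=1.0, r=1.0, gamma=2.0)
exact = closed_form(**params)
approx = brute_force(**params)
print(f"closed form {exact:.6f}, grid search {approx:.6f}")
```

The two numbers should agree up to the grid resolution. A disagreement here would point to a transcription error in the excerpt rather than to a flaw in the paper's main result.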

Figures

Figures reproduced from arXiv: 2604.18550 by Anders Rantzer.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

Rantzer gives an explicit minimax solution for dual control on single-input LTI systems, where the policy adds randomized excitation only until a persistence threshold is met and then switches to certainty equivalence. The minimizing player uses only past states and inputs, while the adversary picks both disturbances and the unknown matrices at each step. The derivation solves the associated Bellman inequality directly and produces a policy that is deterministic once enough data has been collected. This is the main new element: a closed-form expression rather than an abstract dynamic program for the exploration-exploitation tradeoff under this specific game setup.

The argument stays inside standard LTI controllability and observability assumptions and is internally consistent with the minimax formulation. The stress-test found no algebraic gaps or circular steps, which aligns with what the abstract and derivation outline show. The single-input restriction keeps the algebra manageable but is a clear boundary; the result does not immediately carry over to multi-input cases where choosing which directions to excite is harder. The randomization term is introduced to guarantee excitation, yet the paper does not appear to quantify how performance varies with the exact distribution chosen in finite time. These points are limitations rather than contradictions.

The work is aimed at control theorists who already work on adaptive or robust methods and want a concrete benchmark for dual control. A reader looking for implementable code or broad empirical tests will not find them here. It deserves peer review because the explicit solution is uncommon in this literature and the internal logic holds up without obvious flaws.

Referee Report

1 major / 2 minor

Summary. The manuscript derives an explicit closed-form solution to the Bellman inequality for the minimax optimal dual control problem in the single-input linear time-invariant (LTI) setting. The minimizing player selects the control input as a function of past state measurements and inputs only, while the maximizing player chooses both the process disturbances and the unknown system matrices. The resulting policy is a dual controller that trades off exploration and exploitation: it applies a deterministic certainty-equivalence control once a persistence-of-excitation threshold is met and otherwise augments the input with a randomized excitation term.

Significance. If the explicit solution is correct, the result is significant for adaptive and dual control theory. It supplies the first closed-form minimax-optimal policy for this class of problems, rigorously characterizing when randomization is required to guarantee identifiability against an adversarial choice of model parameters. The derivation rests on standard LTI controllability/observability assumptions and an explicit information pattern, both of which are stated clearly.

major comments (1)

[Bellman inequality solution and dynamic programming recursion] The steps that produce the explicit solution to the Bellman inequality (particularly the form of the randomized term and the precise persistence-of-excitation threshold) must be verified in detail; it is not immediately obvious from the high-level description that the candidate policy satisfies the inequality for every admissible choice of disturbances and matrices chosen by the maximizer.

minor comments (2)

[Problem formulation] Clarify in the problem statement whether the maximizer selects the system matrices once at the beginning or can change them at each time step; the current wording leaves this ambiguous.
[Abstract and policy description] The abstract states that the policy 'introduces a randomized term' but does not specify the support or distribution of the randomization; this detail should be stated explicitly for reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading of the manuscript and for recognizing the potential significance of an explicit closed-form minimax-optimal dual controller for single-input LTI systems. We address the single major comment below and will incorporate the requested clarifications into a revised version.

Referee: [Bellman inequality solution and dynamic programming recursion] The steps that produce the explicit solution to the Bellman inequality (particularly the form of the randomized term and the precise persistence-of-excitation threshold) must be verified in detail; it is not immediately obvious from the high-level description that the candidate policy satisfies the inequality for every admissible choice of disturbances and matrices chosen by the maximizer.

Authors: We agree that a more granular verification strengthens the presentation. The manuscript derives the candidate policy by solving the minimax Bellman inequality under the given information pattern (past states and inputs only). The randomized excitation term is obtained by ensuring that the worst-case maximizer cannot prevent the information matrix from becoming full rank; its explicit form is the minimal-variance perturbation that guarantees the persistence-of-excitation condition in finite time for any admissible disturbance sequence and any controllable single-input pair (A,B). The threshold is the smallest integer N such that the cumulative regressor matrix has rank n for every possible (A,B) in the admissible set. In the revision we will add an appendix containing the complete inductive verification: (i) substitution of the policy into the Bellman operator, (ii) explicit computation of the resulting value-function upper bound, and (iii) demonstration that equality holds against the adversarial choice of disturbances and parameters once the threshold is crossed. This will make the satisfaction of the inequality transparent for every admissible maximizer strategy.

revision: yes
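The rank threshold mentioned in this response can be made concrete for a single recorded trajectory. The sketch below is only an illustration: the helper name `first_full_rank_time`, the row-per-sample layout of the regressors, and the default target rank (the number of regressor columns) are assumptions, and a test on one data set does not establish the condition "for every possible (A,B)" that the response refers to.

```python
import numpy as np

def first_full_rank_time(regressors, target_rank=None, tol=1e-9):
    """Smallest N such that the first N regressor rows reach the target rank.

    regressors: array-like with one regressor per row (hypothetical layout).
    Returns None if the target rank is never reached.
    """
    Phi = np.asarray(regressors, dtype=float)
    target = Phi.shape[1] if target_rank is None else target_rank
    for N in range(1, Phi.shape[0] + 1):
        # rank of the stacked rows equals the rank of the cumulative Gram matrix
        if np.linalg.matrix_rank(Phi[:N], tol=tol) >= target:
            return N
    return None

# rows are (state, input) pairs for a scalar plant
print(first_full_rank_time([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]))
```

In the example the first two samples apply zero input, so they cannot separate the effect of the state from the effect of the input; the third sample is the first one that does.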

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper presents a direct derivation of an explicit solution to the Bellman inequality for the minimax dual-control game on single-input LTI systems. The minimizing policy is obtained by solving the dynamic programming recursion under the stated information pattern (past states and inputs), with the adversary selecting disturbances and parameters at each step. This yields a certainty-equivalent controller once a persistence-of-excitation threshold is met, otherwise augmented by randomized excitation. The argument relies only on standard LTI controllability/observability assumptions and the internal consistency of the minimax formulation; no step reduces by construction to a fitted input, self-definition, or unverified self-citation chain. The result is self-contained against external game-theoretic and control-theoretic benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard LTI dynamics assumptions and the existence of an explicit solution to the Bellman inequality; no free parameters or invented entities are indicated in the abstract.

axioms (2)

domain assumption: The system dynamics are linear time-invariant with a single input.
Stated directly in the abstract as the model class.
ad hoc to paper: The Bellman inequality admits an explicit solution in this minimax setting.
This is the load-bearing claim of the paper.

pith-pipeline@v0.9.0 · 5363 in / 1035 out tokens · 40813 ms · 2026-05-10T03:55:26.703852+00:00 · methodology

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

[1] Bo Bernhardsson. Dual control of a first order dynamical system. In Nordic Section of SIAM Meeting on Industrial and Applied Mathematics, 1988.

[2] Daniel Cederberg, Anders Hansson, and Anders Rantzer. Synthesis of minimax adaptive controller for a finite set of linear systems. In 2022 IEEE 61st Conference on Decision and Control (CDC), pages 1380–1384. IEEE, 2022.

[3] Salvatore J Cusumano and Kameshwar Poolla. Nonlinear feedback vs. linear feedback for robust stabilization. In Decision and Control, 1988., Proceedings of the 27th IEEE Conference on, pages 1776–1780. IEEE, 1988.

[4] A. A. Feldbaum. Dual control theory I. Avtomatika i Telemekhanika, 21(9):1240–1249, 1960.

[5] Nikolai M Filatov and Heinz Unbehauen. Survey of adaptive dual control methods. IEE Proceedings-Control Theory and Applications, 147(1):118–128, 2000.

[6] Emilien Flayac, Girish Nair, and Iman Shames. Nonlinear dual control based on fast moving horizon estimation and model predictive control with an observability constraint. In 2021 60th IEEE Conference on Decision and Control (CDC), pages 3030–3035. IEEE, 2021.

[7] Anders Helmersson and Karl Johan Åström. Dual control of an integrator with unknown gain. Computers & Mathematics with Applications, 12(6A), 1986.

[8] Olle Kjellqvist and Anders Rantzer. Minimax adaptive estimation for finite sets of linear systems. In 2022 American Control Conference (ACC), pages 260–265. IEEE, 2022.

[9] A Megretski. A nonlinear dynamical game interpretation of adaptive ℓ2 control: Performance limitations and suboptimal controllers. In Proceedings of 16th International Symposium on Mathematical Theory of Networks and Systems (MTNS2004), Leuven, 2004.

[10] Ali Mesbah. Stochastic model predictive control with active uncertainty learning: A survey on dual control. Annual Reviews in Control, 45:107–117, 2018.

[11] Anders Rantzer. Minimax adaptive control for a finite set of linear systems. In Learning for Dynamics and Control, pages 893–904. PMLR, 2021.

[12] Anders Rantzer. Minimax optimal adaptive control for systems on cones. In 2025 IEEE 64th Conference on Decision and Control (CDC), pages 4137–4139. IEEE, 2025.

[13] Anders Rantzer. On minimax optimal dual control for fully actuated systems. In 2025 American Control Conference (ACC), 2025.

[14] J Sun and PA Ioannou. The theory and design of robust adaptive controllers. Automatica, pages 19–24, 1987.

[15] Glenn Vinnicombe. Examples and counterexamples in finite l2-gain adaptive control. In Proceedings of 16th International Symposium on Mathematical Theory of Networks and Systems (MTNS2004), Leuven, 2004.

[16] Björn Wittenmark. Adaptive dual control methods: An overview. Adaptive Systems in Control and Signal Processing 1995, pages 67–72, 1995.

IV. APPENDIX

Proof of Lemma 2.

$$
\begin{aligned}
\min_u \max_w \Big[\, |x|_S^2 + |u|_R^2 - \gamma^2 |w|^2 + |Ax + Bu + w|^2 \,\Big]
&= \min_u \left[\, |x|_S^2 + |u|_R^2 + \frac{|Ax + Bu|^2}{1 - \gamma^{-2}} \,\right] \\
&= |x|_S^2 + \frac{|Ax|^2 - \big(|B|^2 + (1 - \gamma^{-2}) R\big)^{-1} (B^\top A x)^2}{1 - \gamma^{-2}}.
\end{aligned}
$$

Define $\bar{R} := (1 - \gamma^{-2}) R$. Then $B$ is the se...
