Minimax optimal dual control -- The single input case
Pith reviewed 2026-05-10 03:55 UTC · model grok-4.3
The pith
An explicit solution exists for the Bellman inequality that defines minimax optimal dual control in single-input linear systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
An explicit solution is derived for the Bellman inequality corresponding to minimax optimal dual control. The minimizing player determines control action as a function of past state measurements and inputs. The maximizing player selects disturbances and model parameters for the underlying linear time-invariant dynamics. The optimal minimizing policy is a dual controller that optimizes the tradeoff between exploration and exploitation. Once sufficient data has been collected, the policy becomes a deterministic certainty equivalence controller. However, when data is insufficient, the policy introduces a randomized term to improve excitation.
What carries the argument
The explicit closed-form solution to the Bellman inequality that encodes the value function of the zero-sum game between the controller and the adversary.
If this is right
- The optimal policy can be evaluated directly from observed data without online optimization.
- The controller automatically injects randomization only while identifiability remains incomplete.
- After a finite amount of data the policy reverts exactly to certainty-equivalence control.
- The solution guarantees performance against any choice of parameters and disturbances chosen by the adversary.
- The same structure yields a computable dual controller for every single-input linear system.
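The two-mode behaviour in the bullets above can be illustrated with a small Python sketch. It is not the paper's explicit policy, whose formula does not appear on this page. It assumes a scalar plant x[t+1] = a*x[t] + b*u[t] + w[t], a least-squares estimate of (a, b), a deadbeat certainty-equivalence law, and a random-sign excitation input. The names `two_mode_control`, `simulate`, `min_eig` and `dither` are hypothetical.

```python
import numpy as np

def two_mode_control(xs, us, rng, min_eig=0.1, dither=1.0):
    """Illustrative two-mode input for x[t+1] = a*x[t] + b*u[t] + w[t].

    xs holds states x[0..t], us holds inputs u[0..t-1]. All names and
    thresholds are hypothetical; this is not the paper's formula.
    """
    xs = np.asarray(xs, dtype=float)
    us = np.asarray(us, dtype=float)
    Z = np.column_stack([xs[:-1], us])      # past regressors (x[k], u[k])
    info = Z.T @ Z                          # 2x2 information matrix
    if np.linalg.eigvalsh(info)[0] >= min_eig:
        # Enough excitation: least-squares estimate, then deadbeat control.
        a_hat, b_hat = np.linalg.solve(info, Z.T @ xs[1:])
        if abs(b_hat) > 1e-9:
            return -a_hat * xs[-1] / b_hat, "certainty-equivalence"
    # Too little data: inject a random-sign excitation input (assumed form).
    return dither * rng.choice([-1.0, 1.0]), "randomized"

def simulate(a, b, x0=1.0, steps=6, seed=0, noise=0.0):
    """Run the sketch on a scalar plant and record which mode was used."""
    rng = np.random.default_rng(seed)
    xs, us, modes = [x0], [], []
    for _ in range(steps):
        u, mode = two_mode_control(xs, us, rng)
        w = noise * rng.uniform(-1.0, 1.0)  # bounded disturbance
        xs.append(a * xs[-1] + b * u + w)
        us.append(u)
        modes.append(mode)
    return xs, modes
```

Calling `simulate(1.2, -1.0)` with no disturbance gives two randomized inputs followed by certainty-equivalence inputs, mirroring the qualitative switch described above. The switching rule here is a plain eigenvalue threshold chosen for illustration, not the optimality-derived condition claimed in the paper.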
Where Pith is reading between the lines
- The explicit form may make it possible to prove finite-time bounds on the exploration cost that were previously unavailable.
- The same Bellman-inequality approach could be tested on multi-input or mildly nonlinear plants to see whether closed-form solutions survive.
- The randomization term supplies a concrete, optimality-derived excitation signal that system-identification methods could adopt directly.
Load-bearing premise
The plant must be linear time-invariant with a single input, and the adversary is free to choose both disturbances and the unknown model parameters.
What would settle it
For a concrete scalar or low-dimensional linear system, solve the Bellman inequality numerically and compare the resulting value with the explicit formula; a mismatch beyond numerical tolerance would show that the claimed solution is incorrect.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript derives an explicit closed-form solution to the Bellman inequality for the minimax optimal dual control problem in the single-input linear time-invariant (LTI) setting. The minimizing player selects the control input as a function of past state measurements and inputs only, while the maximizing player chooses both the process disturbances and the unknown system matrices. The resulting policy is a dual controller that trades off exploration and exploitation: it applies a deterministic certainty-equivalence control once a persistence-of-excitation threshold is met and otherwise augments the input with a randomized excitation term.
Significance. If the explicit solution is correct, the result is significant for adaptive and dual control theory. It supplies the first closed-form minimax-optimal policy for this class of problems, rigorously characterizing when randomization is required to guarantee identifiability against an adversarial choice of model parameters. The derivation rests on standard LTI controllability/observability assumptions and an explicit information pattern, both of which are stated clearly.
major comments (1)
- [Bellman inequality solution and dynamic programming recursion] The steps that produce the explicit solution to the Bellman inequality (particularly the form of the randomized term and the precise persistence-of-excitation threshold) must be verified in detail; it is not immediately obvious from the high-level description that the candidate policy satisfies the inequality for every admissible choice of disturbances and matrices chosen by the maximizer.
minor comments (2)
- [Problem formulation] Clarify in the problem statement whether the maximizer selects the system matrices once at the beginning or can change them at each time step; the current wording leaves this ambiguous.
- [Abstract and policy description] The abstract states that the policy 'introduces a randomized term' but does not specify the support or distribution of the randomization; this detail should be stated explicitly for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the careful reading of the manuscript and for recognizing the potential significance of an explicit closed-form minimax-optimal dual controller for single-input LTI systems. We address the single major comment below and will incorporate the requested clarifications into a revised version.
- Referee: [Bellman inequality solution and dynamic programming recursion] The steps that produce the explicit solution to the Bellman inequality (particularly the form of the randomized term and the precise persistence-of-excitation threshold) must be verified in detail; it is not immediately obvious from the high-level description that the candidate policy satisfies the inequality for every admissible choice of disturbances and matrices chosen by the maximizer.
Authors: We agree that a more granular verification strengthens the presentation. The manuscript derives the candidate policy by solving the minimax Bellman inequality under the given information pattern (past states and inputs only). The randomized excitation term is obtained by ensuring that the worst-case maximizer cannot prevent the information matrix from becoming full rank; its explicit form is the minimal-variance perturbation that guarantees the persistence-of-excitation condition in finite time for any admissible disturbance sequence and any controllable single-input pair (A,B). The threshold is the smallest integer N such that the cumulative regressor matrix has rank n for every possible (A,B) in the admissible set. In the revision we will add an appendix containing the complete inductive verification: (i) substitution of the policy into the Bellman operator, (ii) explicit computation of the resulting value-function upper bound, and (iii) demonstration that equality holds against the adversarial choice of disturbances and parameters once the threshold is crossed. This will make the satisfaction of the inequality transparent for every admissible maximizer strategy.
Revision: yes
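The rank condition mentioned in the rebuttal can be checked on recorded data with a few lines of Python. This is a minimal sketch under stated assumptions: the regressor at each step is supplied by the caller (stacking state and input, z_t = (x_t, u_t), is one plausible choice that the page does not confirm), "full rank" means rank equal to the regressor dimension, and `excitation_time` is a hypothetical helper name.

```python
import numpy as np

def excitation_time(regressors, tol=1e-9):
    """Return the smallest N such that sum_{t<N} z_t z_t^T has full rank.

    regressors: array of shape (T, d), one regressor z_t per row.
    Returns None if the cumulative matrix never reaches rank d.
    """
    Z = np.asarray(regressors, dtype=float)
    d = Z.shape[1]
    M = np.zeros((d, d))                 # cumulative regressor matrix
    for N, z in enumerate(Z, start=1):
        M += np.outer(z, z)              # add the rank-one term z_t z_t^T
        if np.linalg.matrix_rank(M, tol=tol) == d:
            return N
    return None
```

For example, `excitation_time([[1, 0], [2, 0], [0, 1]])` returns 3, because the first two regressors are collinear and only the third adds a new direction. This checks a single recorded trajectory; the rebuttal's threshold is a worst case over all admissible (A,B), which this sketch does not compute.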
Circularity Check
No significant circularity in derivation chain
The paper presents a direct derivation of an explicit solution to the Bellman inequality for the minimax dual-control game on single-input LTI systems. The minimizing policy is obtained by solving the dynamic programming recursion under the stated information pattern (past states and inputs), with the adversary selecting disturbances and parameters at each step. This yields a certainty-equivalent controller once a persistence-of-excitation threshold is met, otherwise augmented by randomized excitation. The argument relies only on standard LTI controllability/observability assumptions and the internal consistency of the minimax formulation; no step reduces by construction to a fitted input, self-definition, or unverified self-citation chain. The result is self-contained against external game-theoretic and control-theoretic benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The system dynamics are linear time-invariant with single input.
- ad hoc to paper: The Bellman inequality admits an explicit solution in this minimax setting.
IV. APPENDIX
Proof of Lemma 2.
$$
\begin{aligned}
&\min_u \max_w \left\{ |x|_S^2 + |u|_R^2 - \gamma^2 |w|^2 + |Ax+Bu+w|^2 \right\} \\
&\quad = \min_u \left\{ |x|_S^2 + |u|_R^2 + \frac{|Ax+Bu|^2}{1-\gamma^{-2}} \right\} \\
&\quad = |x|_S^2 + \frac{|Ax|^2 - \left(|B|^2 + (1-\gamma^{-2})R\right)^{-1}(B^\top Ax)^2}{1-\gamma^{-2}}.
\end{aligned}
$$
Define $\bar R := (1-\gamma^{-2})R$. Then $B$ is the se...
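The min-max identity in the proof of Lemma 2 can be sanity-checked numerically in the scalar case. The Python sketch below compares the closed-form right-hand side with a brute-force grid search over u and w. It assumes scalar x, A = a, B = b, scalar weights S and R, and γ > 1 so that the inner maximization over w is concave. The function names, grid span and resolution are illustrative choices, not taken from the paper.

```python
import numpy as np

def lemma2_closed_form(x, a, b, S, R, gamma):
    """Closed-form value of the one-step min-max (scalar case)."""
    k = 1.0 - gamma**-2                       # requires gamma > 1
    return S * x**2 + ((a * x)**2 - (b * a * x)**2 / (b**2 + k * R)) / k

def lemma2_grid(x, a, b, S, R, gamma, span=3.0, num=601):
    """Brute-force min over u of max over w on a uniform grid."""
    u = np.linspace(-span, span, num)[:, None]   # column: candidate inputs
    w = np.linspace(-span, span, num)[None, :]   # row: candidate disturbances
    cost = S * x**2 + R * u**2 - gamma**2 * w**2 + (a * x + b * u + w)**2
    return cost.max(axis=1).min()                # max over w, then min over u

exact = lemma2_closed_form(1.0, 1.5, 1.0, 1.0, 1.0, 2.0)
approx = lemma2_grid(1.0, 1.5, 1.0, 1.0, 1.0, 2.0)
print(exact, approx)
```

The two printed values should agree up to the grid discretization error. The grid must be wide enough to contain both optimizers, so `span` may need enlarging for other parameter choices. This checks only the single-step identity with known parameters, not the full dual-control value function.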