arxiv: 2604.18550 · v2 · submitted 2026-04-20 · 🧮 math.OC

Minimax optimal dual control -- The single input case

Pith reviewed 2026-05-10 03:55 UTC · model grok-4.3

classification 🧮 math.OC
keywords: minimax control · dual control · Bellman inequality · adaptive control · exploration-exploitation · linear systems · certainty equivalence · single-input systems

The pith

An explicit solution exists for the Bellman inequality that defines minimax optimal dual control in single-input linear systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives an explicit solution to the Bellman inequality for minimax optimal dual control of single-input linear time-invariant systems. In this setup the minimizing player selects control inputs based on past state measurements and inputs, while the maximizing player chooses both the disturbances and the unknown model parameters to maximize cost. The resulting policy is a dual controller that trades off learning the dynamics against using them for control. When past data suffices, the policy reduces to a deterministic certainty-equivalence law; otherwise it adds a randomized excitation term. This matters because it converts an abstract exploration-exploitation dilemma into a concrete, computable strategy for worst-case adaptive control.

Core claim

An explicit solution is derived for the Bellman inequality corresponding to minimax optimal dual control. The minimizing player determines control action as a function of past state measurements and inputs. The maximizing player selects disturbances and model parameters for the underlying linear time-invariant dynamics. The optimal minimizing policy is a dual controller that optimizes the tradeoff between exploration and exploitation. Once sufficient data has been collected, the policy becomes a deterministic certainty equivalence controller. However, when data is insufficient, the policy introduces a randomized term to improve excitation.
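The two regimes described above can be illustrated with a minimal sketch. It is not the paper's explicit policy, whose formulas are not reproduced on this page. It assumes a scalar plant x_{t+1} = a x_t + b u_t, a least-squares estimate of (a, b), an excitation test on the smallest eigenvalue of the information matrix, an LQR gain for the certainty-equivalence branch, and a random sign of fixed size as the excitation term. The names `dual_control_step`, `lqr_gain_scalar`, `simulate`, `excite_tol`, and `dither` are all hypothetical.

```python
import numpy as np

def lqr_gain_scalar(a, b, q=1.0, r=1.0, iters=500):
    """Gain k of u = -k x for x+ = a x + b u, via scalar Riccati iteration."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)

def dual_control_step(xs, us, rng, excite_tol=1e-6, dither=1.0):
    """Pick u_t from past states xs[0..t] and inputs us[0..t-1]; return (u, mode)."""
    if us:
        Z = np.column_stack([xs[:-1], us])   # regressor rows [x_k, u_k]
        y = np.asarray(xs[1:])               # targets x_{k+1}
        info = Z.T @ Z                       # information matrix
        if np.linalg.eigvalsh(info)[0] > excite_tol:
            # Data identify (a, b): act as if the estimate were the truth.
            a_hat, b_hat = np.linalg.solve(info, Z.T @ y)
            if abs(b_hat) > 1e-9:
                return -lqr_gain_scalar(a_hat, b_hat) * xs[-1], "certainty-equivalence"
    # Assumed excitation rule (not from the paper): a random sign of fixed size.
    return dither * rng.choice([-1.0, 1.0]), "randomized"

def simulate(a, b, steps, seed=0, x0=1.0):
    """Noise-free closed loop; returns the state trajectory and the mode per step."""
    rng = np.random.default_rng(seed)
    xs, us, modes = [x0], [], []
    for _ in range(steps):
        u, mode = dual_control_step(xs, us, rng)
        us.append(float(u))
        modes.append(mode)
        xs.append(a * xs[-1] + b * us[-1])
    return xs, modes
```

In this toy setting, `simulate(a=1.2, b=0.5, steps=6)` randomizes for the first two steps, because one data point cannot identify two parameters, and then switches to the certainty-equivalence branch. The switching rule here is a plain rank test; the paper's threshold and randomization law may differ.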

What carries the argument

The explicit closed-form solution to the Bellman inequality that encodes the value function of the zero-sum game between the controller and the adversary.

If this is right

  • The optimal policy can be evaluated directly from observed data without online optimization.
  • The controller automatically injects randomization only while identifiability remains incomplete.
  • After a finite amount of data the policy reverts exactly to certainty-equivalence control.
  • The solution guarantees performance against any choice of parameters and disturbances chosen by the adversary.
  • The same structure yields a computable dual controller for every single-input linear system.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The explicit form may make it possible to prove finite-time bounds on the exploration cost that were previously unavailable.
  • The same Bellman-inequality approach could be tested on multi-input or mildly nonlinear plants to see whether closed-form solutions survive.
  • The randomization term supplies a concrete, optimality-derived excitation signal that system-identification methods could adopt directly.

Load-bearing premise

The plant must be linear time-invariant with a single input, and the adversary is free to choose both disturbances and the unknown model parameters.

What would settle it

For a concrete scalar or low-dimensional linear system, solve the Bellman inequality numerically and compare the value to the explicit formula; any mismatch between the two would show that the claimed solution is not correct.
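A minimal sketch of such a comparison follows, under a strong simplifying assumption: the model (a, b) is known, so only the disturbance is adversarial and only one step is considered. In that case the closed form is the standard completion-of-squares result for a quadratic min-max with gamma > 1. This is not the paper's dual-control value function; the function names and the grid parameters `half_width` and `n` are hypothetical.

```python
import numpy as np

def one_step_closed_form(x, a, b, s, r, gamma):
    """Closed-form min over u, max over w of
    s x^2 + r u^2 - gamma^2 w^2 + (a x + b u + w)^2, for gamma > 1."""
    c = 1.0 - gamma ** -2
    return s * x ** 2 + ((a * x) ** 2 - (b * a * x) ** 2 / (b ** 2 + c * r)) / c

def one_step_grid(x, a, b, s, r, gamma, half_width=4.0, n=801):
    """Brute-force the same min-max on a grid of candidate u and w values."""
    u = np.linspace(-half_width, half_width, n)[:, None]   # column: candidate inputs
    w = np.linspace(-half_width, half_width, n)[None, :]   # row: candidate disturbances
    cost = s * x ** 2 + r * u ** 2 - gamma ** 2 * w ** 2 + (a * x + b * u + w) ** 2
    return cost.max(axis=1).min()                           # worst w per u, then best u

if __name__ == "__main__":
    args = dict(x=1.0, a=1.2, b=0.5, s=1.0, r=1.0, gamma=2.0)
    print(one_step_closed_form(**args), one_step_grid(**args))
```

The grid must be wide enough to contain both optimizers; a gap larger than the grid resolution would indicate an error in the formula or the code. The same harness shape would apply to the paper's formula once it is substituted in, with the adversary's parameter choice added to the maximization.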

Figures

Figures reproduced from arXiv: 2604.18550 by Anders Rantzer.

Figure 1: We want a feedback controller that works for all ... [image: figures/full_fig_p001_1.png]

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript derives an explicit closed-form solution to the Bellman inequality for the minimax optimal dual control problem in the single-input linear time-invariant (LTI) setting. The minimizing player selects the control input as a function of past state measurements and inputs only, while the maximizing player chooses both the process disturbances and the unknown system matrices. The resulting policy is a dual controller that trades off exploration and exploitation: it applies a deterministic certainty-equivalence control once a persistence-of-excitation threshold is met and otherwise augments the input with a randomized excitation term.

Significance. If the explicit solution is correct, the result is significant for adaptive and dual control theory. It supplies the first closed-form minimax-optimal policy for this class of problems, rigorously characterizing when randomization is required to guarantee identifiability against an adversarial choice of model parameters. The derivation rests on standard LTI controllability/observability assumptions and an explicit information pattern, both of which are stated clearly.

major comments (1)
  1. [Bellman inequality solution and dynamic programming recursion] The steps that produce the explicit solution to the Bellman inequality (particularly the form of the randomized term and the precise persistence-of-excitation threshold) must be verified in detail; it is not immediately obvious from the high-level description that the candidate policy satisfies the inequality for every admissible choice of disturbances and matrices chosen by the maximizer.
minor comments (2)
  1. [Problem formulation] Clarify in the problem statement whether the maximizer selects the system matrices once at the beginning or can change them at each time step; the current wording leaves this ambiguous.
  2. [Abstract and policy description] The abstract states that the policy 'introduces a randomized term' but does not specify the support or distribution of the randomization; this detail should be stated explicitly for reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading of the manuscript and for recognizing the potential significance of an explicit closed-form minimax-optimal dual controller for single-input LTI systems. We address the single major comment below and will incorporate the requested clarifications into a revised version.

  1. Referee: [Bellman inequality solution and dynamic programming recursion] The steps that produce the explicit solution to the Bellman inequality (particularly the form of the randomized term and the precise persistence-of-excitation threshold) must be verified in detail; it is not immediately obvious from the high-level description that the candidate policy satisfies the inequality for every admissible choice of disturbances and matrices chosen by the maximizer.

    Authors: We agree that a more granular verification strengthens the presentation. The manuscript derives the candidate policy by solving the minimax Bellman inequality under the given information pattern (past states and inputs only). The randomized excitation term is obtained by ensuring that the worst-case maximizer cannot prevent the information matrix from becoming full rank; its explicit form is the minimal-variance perturbation that guarantees the persistence-of-excitation condition in finite time for any admissible disturbance sequence and any controllable single-input pair (A,B). The threshold is the smallest integer N such that the cumulative regressor matrix has rank n for every possible (A,B) in the admissible set. In the revision we will add an appendix containing the complete inductive verification: (i) substitution of the policy into the Bellman operator, (ii) explicit computation of the resulting value-function upper bound, and (iii) demonstration that equality holds against the adversarial choice of disturbances and parameters once the threshold is crossed. This will make the satisfaction of the inequality transparent for every admissible maximizer strategy.

    Revision: yes
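One reading of the threshold described in this response can be sketched as follows. The sketch assumes the regressor at time k is the stacked vector [x_k; u_k], and that the threshold is the first time the cumulative matrix of outer products reaches a required rank. The rebuttal does not define the regressor, so the default target rank (full dimension) and the names `excitation_time` and `required_rank` are hypothetical.

```python
import numpy as np

def excitation_time(xs, us, required_rank=None, tol=1e-9):
    """First t such that sum_{k<t} z_k z_k^T reaches required_rank, else None.

    xs: array of shape (T+1, n) with states x_0..x_T
    us: array of shape (T,) with scalar inputs u_0..u_{T-1}
    z_k = [x_k; u_k] is an assumed regressor, not taken from the paper.
    """
    xs = np.asarray(xs, dtype=float)
    us = np.asarray(us, dtype=float).reshape(len(us), -1)
    Z = np.hstack([xs[:len(us)], us])          # one regressor per row
    if required_rank is None:
        required_rank = Z.shape[1]             # default: full rank, n + 1
    info = np.zeros((Z.shape[1], Z.shape[1]))
    for t, z in enumerate(Z, start=1):
        info += np.outer(z, z)                 # accumulate information
        if np.linalg.matrix_rank(info, tol=tol) >= required_rank:
            return t
    return None
```

For a two-state example with regressors equal to the three unit vectors, the function returns 3 by default and 2 when called with `required_rank=2`, which matches the rebuttal's literal "rank n". With inputs that are identically zero it returns `None`, the situation in which a randomized term would be needed.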

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper presents a direct derivation of an explicit solution to the Bellman inequality for the minimax dual-control game on single-input LTI systems. The minimizing policy is obtained by solving the dynamic programming recursion under the stated information pattern (past states and inputs), with the adversary selecting the disturbances and the parameters of the time-invariant dynamics. This yields a certainty-equivalence controller once a persistence-of-excitation threshold is met, otherwise augmented by randomized excitation. The argument relies only on standard LTI controllability/observability assumptions and the internal consistency of the minimax formulation; no step reduces by construction to a fitted input, self-definition, or unverified self-citation chain. The result is self-contained against external game-theoretic and control-theoretic benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard LTI dynamics assumptions and the existence of an explicit solution to the Bellman inequality; no free parameters or invented entities are indicated in the abstract.

axioms (2)
  • domain assumption: The system dynamics are linear time-invariant with a single input.
    Stated directly in the abstract as the model class.
  • ad hoc to paper: The Bellman inequality admits an explicit solution in this minimax setting.
    This is the load-bearing claim of the paper.

pith-pipeline@v0.9.0 · 5363 in / 1035 out tokens · 40813 ms · 2026-05-10T03:55:26.703852+00:00 · methodology


Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

  1. Bo Bernhardsson. Dual control of a first order dynamical system. In Nordic Section of SIAM Meeting on Industrial and Applied Mathematics, 1988.
  2. Daniel Cederberg, Anders Hansson, and Anders Rantzer. Synthesis of minimax adaptive controller for a finite set of linear systems. In 2022 IEEE 61st Conference on Decision and Control (CDC), pages 1380–1384. IEEE, 2022.
  3. Salvatore J Cusumano and Kameshwar Poolla. Nonlinear feedback vs. linear feedback for robust stabilization. In Decision and Control, 1988., Proceedings of the 27th IEEE Conference on, pages 1776–1780. IEEE, 1988.
  4. A. A. Feldbaum. Dual control theory I. Avtomatika i Telemekhanika, 21(9):1240–1249, 1960.
  5. Nikolai M Filatov and Heinz Unbehauen. Survey of adaptive dual control methods. IEE Proceedings-Control Theory and Applications, 147(1):118–128, 2000.
  6. Emilien Flayac, Girish Nair, and Iman Shames. Nonlinear dual control based on fast moving horizon estimation and model predictive control with an observability constraint. In 2021 60th IEEE Conference on Decision and Control (CDC), pages 3030–3035. IEEE, 2021.
  7. Anders Helmersson and Karl Johan Åström. Dual control of an integrator with unknown gain. Computers & Mathematics with Applications, 12(6A), 1986.
  8. Olle Kjellqvist and Anders Rantzer. Minimax adaptive estimation for finite sets of linear systems. In 2022 American Control Conference (ACC), pages 260–265. IEEE, 2022.
  9. A Megretski. A nonlinear dynamical game interpretation of adaptive ℓ2 control: Performance limitations and suboptimal controllers. In Proceedings of 16th International Symposium on Mathematical Theory of Networks and Systems (MTNS2004), Leuven, 2004.
  10. Ali Mesbah. Stochastic model predictive control with active uncertainty learning: A survey on dual control. Annual Reviews in Control, 45:107–117, 2018.
  11. Anders Rantzer. Minimax adaptive control for a finite set of linear systems. In Learning for Dynamics and Control, pages 893–904. PMLR, 2021.
  12. Anders Rantzer. Minimax optimal adaptive control for systems on cones. In 2025 IEEE 64th Conference on Decision and Control (CDC), pages 4137–4139. IEEE, 2025.
  13. Anders Rantzer. On minimax optimal dual control for fully actuated systems. In 2025 American Control Conference (ACC), 2025.
  14. J Sun and PA Ioannou. The theory and design of robust adaptive controllers. Automatica, pages 19–24, 1987.
  15. Glenn Vinnicombe. Examples and counterexamples in finite l2-gain adaptive control. In Proceedings of 16th International Symposium on Mathematical Theory of Networks and Systems (MTNS2004), Leuven, 2004.

  16. Björn Wittenmark. Adaptive dual control methods: An overview. Adaptive Systems in Control and Signal Processing 1995, pages 67–72, 1995.

IV. APPENDIX

Proof of Lemma 2.

\[
\min_u \max_w \left( |x|_S^2 + |u|_R^2 - \gamma^2 |w|^2 + |Ax+Bu+w|^2 \right)
= \min_u \left( |x|_S^2 + |u|_R^2 + \frac{|Ax+Bu|^2}{1-\gamma^{-2}} \right)
= |x|_S^2 + \frac{|Ax|^2 - \left(|B|^2 + (1-\gamma^{-2})R\right)^{-1}(B^\top A x)^2}{1-\gamma^{-2}}.
\]

Define $\bar{R} := (1-\gamma^{-2})R$. Then B is the se...
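The two equalities in the proof fragment above can be checked by completing the square twice. The following is a sketch of the intermediate steps, not text from the paper. It assumes $\gamma > 1$, a scalar input $u$ with scalar weight $R > 0$ (so that $|u|_R^2 = R u^2$), and uses the shorthand $z := Ax + Bu$ together with the maximizer $w^\star$ and minimizer $u^\star$, which are introduced here for illustration.

```latex
\begin{align*}
\max_w \bigl[\, |z+w|^2 - \gamma^2 |w|^2 \,\bigr]
  &= \frac{|z|^2}{1-\gamma^{-2}},
  & w^\star &= \frac{z}{\gamma^2 - 1}, \\
\min_u \Bigl[\, R u^2 + \frac{|Ax+Bu|^2}{1-\gamma^{-2}} \,\Bigr]
  &= \frac{|Ax|^2 - \bigl(|B|^2 + (1-\gamma^{-2})R\bigr)^{-1}(B^\top A x)^2}{1-\gamma^{-2}},
  & u^\star &= -\frac{B^\top A x}{|B|^2 + (1-\gamma^{-2})R}.
\end{align*}
```

The first line is concave in $w$ only when $\gamma > 1$; for $\gamma \le 1$ the inner maximum is unbounded whenever $z \neq 0$, so the factor $1-\gamma^{-2}$ must be positive for the formula to make sense.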