Steady-state Based Approach to Online Non-stochastic Control
Pith reviewed 2026-05-10 04:57 UTC · model grok-4.3
The pith
An algorithm achieves O(sqrt(T)) regret in online non-stochastic control against steady states attainable by affine controllers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that a Follow-The-Perturbed-Leader-style online non-convex optimization procedure, combined with batching, yields O(sqrt(T)) regret with respect to the set of steady-states attainable under an affine controller, and that an approximate solution to each non-convex subproblem suffices to preserve both the regret bound and closed-loop stability despite changing policies.
What carries the argument
The batching method combined with Follow-The-Perturbed-Leader updates, which allows periodic policy changes on non-convex problems while ensuring stability of the linear system.
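A minimal sketch of this loop, assuming a scalar LTI plant, quadratic tracking costs, and a finite policy grid standing in for the paper's continuous non-convex search; all plant parameters, the cost form, the perturbation scale, and every identifier below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
T, batch_len = 10_000, 100        # horizon; batch length on the order of sqrt(T)
a_sys, b_sys = 0.9, 1.0           # scalar stable LTI plant: x' = a*x + b*u + w

# Hypothetical finite grid of affine policies u = -k*x + c. The paper
# optimizes over a continuous non-convex set; the grid argmin below merely
# stands in for its approximate subproblem solver.
policies = [(k, c) for k in np.linspace(0.0, 1.5, 16)
                   for c in np.linspace(-1.0, 1.0, 16)]

def rollout_cost(policy, targets, ws, x0=0.0):
    """Cost of one affine policy over a batch of quadratic tracking costs
    c_t(x, u) = (x - tgt)^2 + 0.1 u^2, under the realized disturbances ws."""
    k, c = policy
    x, total = x0, 0.0
    for tgt, w in zip(targets, ws):
        u = -k * x + c
        total += (x - tgt) ** 2 + 0.1 * u ** 2
        x = a_sys * x + b_sys * u + w
    return total

past = np.zeros(len(policies))    # each candidate's cumulative historical cost
eta = np.sqrt(T)                  # FTPL perturbation scale (illustrative choice)
x, played_cost = 0.0, 0.0
for start in range(0, T, batch_len):
    # FTPL step: play the candidate minimizing the perturbed history.
    k, c = policies[int(np.argmin(past - eta * rng.exponential(size=len(past))))]
    # Adversarial stand-ins: per-step cost targets and bounded disturbances.
    targets = rng.uniform(-1, 1, size=batch_len)
    ws = rng.uniform(-0.1, 0.1, size=batch_len)
    # The chosen policy is frozen for the whole batch; switching only at
    # batch boundaries is what lets the closed loop settle and stay stable.
    for tgt, w in zip(targets, ws):
        u = -k * x + c
        played_cost += (x - tgt) ** 2 + 0.1 * u ** 2
        x = a_sys * x + b_sys * u + w
    # Full-information feedback: reveal the batch to every candidate.
    past += np.array([rollout_cost(p, targets, ws) for p in policies])

print(f"total cost over T={T}: {played_cost:.1f}")
```

The grid argmin plays the role of the approximate subproblem solver; an inexact minimizer of the perturbed history would slot in at the same line.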
Load-bearing premise
That an approximate solution to the non-convex subproblem is sufficient to keep the O(sqrt(T)) regret bound and that the batching method maintains stability despite changing policies.
What would settle it
Running the algorithm on a linear system with adversarial disturbances and costs and observing cumulative regret that grows faster than order sqrt(T) would falsify the claimed bound.
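Spelled out, the quantity to track in such an experiment (benchmark definition assumed from the abstract, not quoted from the paper's body):

$$\mathrm{Regret}(T) \;=\; \sum_{t=1}^{T} c_t(x_t, u_t) \;-\; \min_{(\bar{x},\bar{u}) \in \mathcal{S}_{\mathrm{aff}}} \sum_{t=1}^{T} c_t(\bar{x}, \bar{u}),$$

where $\mathcal{S}_{\mathrm{aff}}$ is the set of steady-state pairs attainable under some affine controller; a plot of $\mathrm{Regret}(T)/\sqrt{T}$ that diverges as $T$ grows would refute the bound.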
Original abstract
We study the problem of online non-stochastic control (ONC), which is the control of a linear system under adversarial disturbances and adversarial cost functions, with the aim of minimizing the total cost incurred. A recent line of literature in ONC develops algorithms that enjoy sublinear regret with respect to a benchmark based on the set of steady-states that are attainable by a constant input. In this work, we extend this research direction by giving an algorithm that enjoys $\mathcal{O}(\sqrt{T})$ regret with respect to a richer benchmark set, namely the set of steady-states attainable under an \emph{affine controller}. Since this benchmark substantially broadens the comparison class, it provides significantly stronger performance guarantees. Our proposed algorithm combines a Follow-The-Perturbed-Leader-style online non-convex optimization approach with a batching method that maintains stability despite changing policies. Although our proposed algorithm requires solving non-convex subproblems, we show that an approximate solution to this subproblem is sufficient to ensure $\mathcal{O}(\sqrt{T})$ regret. Furthermore, numerical experiments show that our algorithm enjoys lower total cost and similar computation to existing methods in certain settings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies online non-stochastic control of linear systems under adversarial disturbances and costs. It proposes an FTPL-style algorithm that combines online non-convex optimization with a batching procedure, claiming O(sqrt(T)) regret against the benchmark of steady-states attainable under affine controllers (a richer class than constant inputs). The authors assert that approximate solutions to the non-convex subproblems suffice for the regret bound and that batching preserves stability under policy changes; numerical experiments are included to illustrate practical performance.
Significance. If the central claims hold, the result meaningfully strengthens the benchmark in the ONC literature by moving from constant-input steady-states to affine-controller steady-states, which can yield tighter performance guarantees. The tolerance to approximate subproblem solutions and the batching technique for stability could be practically useful, and the reproducible numerical comparisons provide some empirical grounding.
major comments (2)
- [Abstract / Main Theorem] Abstract and main theorem (presumably Theorem 1 or equivalent): the claim that an approximate solution to the non-convex subproblem suffices for the O(sqrt(T)) regret bound is stated without a visible error-propagation analysis or explicit bound on how the approximation error accumulates over batches and affects the overall regret. This is load-bearing for the central guarantee.
- [Batching Method / Stability Analysis] Batching procedure section: the assertion that batching maintains stability despite changing policies lacks a concrete stability argument (e.g., a uniform bound on the state trajectory or a Lyapunov-style argument across policy switches). Without this, the regret comparison to the affine-controller benchmark cannot be rigorously established.
minor comments (2)
- [Preliminaries] Notation for the affine controller benchmark and the perturbation distribution in the FTPL step should be introduced earlier and used consistently to improve readability.
- [Experiments] The numerical experiments section would benefit from reporting the specific approximation tolerance used in the subproblem solver and its observed effect on total cost.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback on our manuscript. The two major comments identify areas where the presentation of key technical arguments can be strengthened. We address each point below and will revise the manuscript to incorporate the suggested clarifications.
Point-by-point responses
- Referee: [Abstract / Main Theorem] Abstract and main theorem (presumably Theorem 1 or equivalent): the claim that an approximate solution to the non-convex subproblem suffices for the O(sqrt(T)) regret bound is stated without a visible error-propagation analysis or explicit bound on how the approximation error accumulates over batches and affects the overall regret. This is load-bearing for the central guarantee.
Authors: We appreciate the referee's observation that the error-propagation argument requires greater visibility. While the proof of the main regret theorem bounds the per-batch approximation error and shows that the total additive contribution remains O(sqrt(T)) under our parameter choices, we agree that the accumulation across batches is not stated as a standalone lemma. In the revised manuscript we will insert a new lemma immediately preceding the main theorem that explicitly tracks how an additive epsilon-approximation error per subproblem propagates through the batching procedure and is absorbed into the overall O(sqrt(T)) regret bound. revision: yes
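For concreteness, the arithmetic such a lemma needs to close (our sketch of a sufficient condition, not the paper's statement): with batch length $\Delta$ and $N = T/\Delta$ batches, an additive $\epsilon$-approximate subproblem solve per batch contributes at most

$$\sum_{n=1}^{N} \epsilon \;=\; \frac{T}{\Delta}\,\epsilon$$

to the regret, which stays $O(\sqrt{T})$ whenever $\epsilon = O(\Delta/\sqrt{T})$; in particular, $\Delta = \Theta(\sqrt{T})$ tolerates a constant per-batch tolerance.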
- Referee: [Batching Method / Stability Analysis] Batching procedure section: the assertion that batching maintains stability despite changing policies lacks a concrete stability argument (e.g., a uniform bound on the state trajectory or a Lyapunov-style argument across policy switches). Without this, the regret comparison to the affine-controller benchmark cannot be rigorously established.
Authors: Thank you for this important comment. The current stability argument relies on the fact that each batch uses a fixed affine policy and that the linear system is stabilizable, but we acknowledge that a uniform bound across policy switches is not written out explicitly. In the revision we will add a short lemma in the batching section that establishes a uniform bound on the state trajectory norm. The argument uses the exponential stability of the closed-loop affine policies together with a sufficiently large batch length to ensure that transient effects from policy switches do not accumulate and that the state remains bounded by a constant independent of T, thereby justifying the regret comparison to the affine-controller benchmark. revision: yes
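The generic shape of such a bound (a standard switched-systems estimate under an assumed common contraction rate, not the paper's argument): if every closed-loop affine policy in play satisfies $\|x_{t+1}\| \le \rho\|x_t\| + b$ for a shared $\rho < 1$ and $b < \infty$ (with $b$ absorbing the bounded disturbance and the affine offset), then regardless of when policies switch,

$$\|x_t\| \;\le\; \rho^{t}\|x_0\| \;+\; \frac{b}{1-\rho},$$

a constant independent of $T$. When each policy only admits the weaker envelope $\kappa\rho^{t}$ with $\kappa \ge 1$, choosing the batch length $\Delta$ so that $\kappa\rho^{\Delta} \le 1/2$ makes the envelopes contract geometrically across batches, so switch transients do not accumulate.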
Circularity Check
No significant circularity detected
full rationale
The paper's central claim is an O(sqrt(T)) regret bound for an FTPL-style online optimization algorithm augmented with batching, proven to hold against the benchmark of steady-states attainable by affine controllers, with approximate non-convex subproblem solutions shown to suffice. No load-bearing step reduces by construction to a fitted parameter, self-definition, or self-citation chain; the regret analysis is tied directly to the algorithm's construction and stability arguments rather than re-deriving an input quantity. The abstract and high-level description contain no self-citations invoked as uniqueness theorems or ansatzes, and the derivation remains self-contained against external benchmarks without renaming known results or smuggling assumptions.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The plant is a linear time-invariant system.
- domain assumption: Disturbances and cost functions are chosen adversarially but remain bounded.
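In symbols, the standard ONC setup these assumptions point to (notation ours, inferred from the abstract rather than taken from the paper's body):

$$x_{t+1} = A x_t + B u_t + w_t, \qquad \|w_t\| \le W, \qquad u_t = -K x_t + u^{\mathrm{aff}},$$

with the disturbances $w_t$ and cost functions $c_t$ chosen adversarially subject to boundedness, and the benchmark ranging over the steady-states induced by affine controllers $(K, u^{\mathrm{aff}})$.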
Reference graph
Works this paper leans on
- [1] Y. Li, S. Das, and N. Li, "Online optimal control with affine constraints," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35(10), 2021, pp. 8527–8537.
- [2] G. Shi, Y. Lin, S.-J. Chung, Y. Yue, and A. Wierman, "Online Optimization with Memory and Competitive Control," in Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 20636–20647.
- [3] N. Jiang, S. Hutchinson, and M. Alizadeh, "Online Nonstochastic Control with Convex Safety Constraints," Jan. 2025, arXiv:2501.18039 [math]. Available: http://arxiv.org/abs/2501.18039.
- [4] M. Nonhoff, E. Dall'Anese, and M. A. Müller, "Online convex optimization for robust control of constrained dynamical systems," Nov. 2024, arXiv:2401.04487 [eess].
- [5] N. Agarwal, B. Bullins, E. Hazan, S. Kakade, and K. Singh, "Online Control with Adversarial Disturbances," in Proceedings of the 36th International Conference on Machine Learning, PMLR, May 2019, pp. 111–119. Available: https://proceedings.mlr.press/v97/agarwal19c.html.
- [6] E. Hazan and K. Singh, "Introduction to Online Control," Mar. 2025, arXiv:2211.09619 [cs]. Available: http://arxiv.org/abs/2211.09619.
- [7] M. Simchowitz, K. Singh, and E. Hazan, "Improper Learning for Non-Stochastic Control," in Proceedings of the Thirty Third Conference on Learning Theory, PMLR, July 2020, pp. 3320–3436. Available: https://proceedings.mlr.press/v125/simchowitz20a.html.
- [8] P. Gradu, E. Hazan, and E. Minasyan, "Adaptive regret for control of time-varying dynamics," in Learning for Dynamics and Control Conference, PMLR, 2023, pp. 560–572.
- [9] A. B. Cassel, A. Peled-Cohen, and T. Koren, "Rate-Optimal Online Convex Optimization in Adaptive Linear Control," Advances in Neural Information Processing Systems, vol. 35, pp. 7410–7422, Dec. 2022.
- [10] E. Minasyan, P. Gradu, M. Simchowitz, and E. Hazan, "Online Control of Unknown Time-Varying Dynamical Systems," Feb. 2022, arXiv:2202.07890 [cs]. Available: http://arxiv.org/abs/2202.07890.
- [11] A. Cassel and T. Koren, "Bandit linear control," Advances in Neural Information Processing Systems, vol. 33, pp. 8872–8882, 2020.
- [12] N. Agarwal, E. Hazan, and K. Singh, "Logarithmic Regret for Online Control," in Advances in Neural Information Processing Systems, vol. 32, 2019.
- [13] V. Hebbar and C. Langbort, "Revisiting Regret Benchmarks in Online Non-Stochastic Control," Apr. 2025, arXiv:2504.16581 [math]. Available: http://arxiv.org/abs/2504.16581.
- [14] A. Hauswirth, Z. He, S. Bolognani, G. Hug, and F. Dörfler, "Optimization algorithms as robust feedback controllers," Annual Reviews in Control, vol. 57, p. 100941, Jan. 2024. Available: https://www.sciencedirect.com/science/article/pii/S1367578824000105.
- [15] S. Menta, A. Hauswirth, S. Bolognani, G. Hug, and F. Dörfler, "Stability of Dynamic Feedback Optimization with Applications to Power Systems," in 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Oct. 2018, pp. 136–143. Available: https://ieeexplore.ieee.org/document/8635640.
- [16] M. Colombino, E. Dall'Anese, and A. Bernstein, "Online Optimization as a Feedback Controller: Stability and Tracking," May 2018, arXiv:1805.09877 [math]. Available: http://arxiv.org/abs/1805.09877.
- [17] Y. Li, X. Chen, and N. Li, "Online Optimal Control with Linear Dynamics and Predictions: Algorithms and Regret Analysis," in Advances in Neural Information Processing Systems, vol. 32, 2019.
- [18] A. Karapetyan, D. Bolliger, A. Tsiamis, E. C. Balta, and J. Lygeros, "Online Linear Quadratic Tracking with Regret Guarantees," IEEE Control Systems Letters, vol. 7, pp. 3950–3955, 2023, arXiv:2303.10260 [eess]. Available: http://arxiv.org/abs/2303.10260.
- [19] M. Nonhoff and M. A. Müller, "An online convex optimization algorithm for controlling linear systems with state and input constraints," in 2021 American Control Conference (ACC), May 2021, pp. 2523–2528, arXiv:2005.11308 [math].
- [20] H. Zhou and V. Tzoumas, "Safe Non-Stochastic Control of Linear Dynamical Systems," Aug. 2023, arXiv:2308.12395 [eess].
- [21] A. Cohen, A. Hasidim, T. Koren, N. Lazic, Y. Mansour, and K. Talwar, "Online Linear Quadratic Control," in Proceedings of the 35th International Conference on Machine Learning, PMLR, July 2018, pp. 1029–1038.
- [22] A. S. Suggala and P. Netrapalli, "Online non-convex learning: Following the perturbed leader is optimal," in Algorithmic Learning Theory, PMLR, 2020, pp. 845–861.
- [23] N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games. Cambridge: Cambridge University Press, 2006.
- [24] V. Hebbar and C. Langbort, "Responding to Promises: No-regret learning against followers with memory," Dec. 2024, arXiv:2410.07457 [cs]. Available: http://arxiv.org/abs/2410.07457.