Steady-state Based Approach to Online Non-stochastic Control
Pith reviewed 2026-05-10 04:57 UTC · model grok-4.3
The pith
An algorithm achieves O(sqrt(T)) regret in online non-stochastic control against steady states attainable by affine controllers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that a Follow-The-Perturbed-Leader-style online non-convex optimization procedure, combined with batching, yields O(sqrt(T)) regret with respect to the set of steady-states attainable under an affine controller, and that an approximate solution to each non-convex subproblem suffices to preserve both the regret bound and closed-loop stability despite changing policies.
What carries the argument
The batching method combined with Follow-The-Perturbed-Leader updates, which allows periodic policy changes on non-convex problems while ensuring stability of the linear system.
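A minimal sketch of this loop, assuming a scalar LTI plant, quadratic tracking costs, and a finite policy grid standing in for the paper's continuous non-convex search; all plant parameters, the cost form, the perturbation scale, and every identifier below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
T, batch_len = 10_000, 100        # horizon; batch length on the order of sqrt(T)
a_sys, b_sys = 0.9, 1.0           # scalar stable LTI plant: x' = a*x + b*u + w

# Hypothetical finite grid of affine policies u = -k*x + c. The paper
# optimizes over a continuous non-convex set; the grid argmin below merely
# stands in for its approximate subproblem solver.
policies = [(k, c) for k in np.linspace(0.0, 1.5, 16)
                   for c in np.linspace(-1.0, 1.0, 16)]

def rollout_cost(policy, targets, ws, x0=0.0):
    """Cost of one affine policy over a batch of quadratic tracking costs
    c_t(x, u) = (x - tgt)^2 + 0.1 u^2, under the realized disturbances ws."""
    k, c = policy
    x, total = x0, 0.0
    for tgt, w in zip(targets, ws):
        u = -k * x + c
        total += (x - tgt) ** 2 + 0.1 * u ** 2
        x = a_sys * x + b_sys * u + w
    return total

past = np.zeros(len(policies))    # each candidate's cumulative historical cost
eta = np.sqrt(T)                  # FTPL perturbation scale (illustrative choice)
x, played_cost = 0.0, 0.0
for start in range(0, T, batch_len):
    # FTPL step: play the candidate minimizing the perturbed history.
    k, c = policies[int(np.argmin(past - eta * rng.exponential(size=len(past))))]
    # Adversarial stand-ins: per-step cost targets and bounded disturbances.
    targets = rng.uniform(-1, 1, size=batch_len)
    ws = rng.uniform(-0.1, 0.1, size=batch_len)
    # The chosen policy is frozen for the whole batch; switching only at
    # batch boundaries is what lets the closed loop settle and stay stable.
    for tgt, w in zip(targets, ws):
        u = -k * x + c
        played_cost += (x - tgt) ** 2 + 0.1 * u ** 2
        x = a_sys * x + b_sys * u + w
    # Full-information feedback: reveal the batch to every candidate.
    past += np.array([rollout_cost(p, targets, ws) for p in policies])

print(f"total cost over T={T}: {played_cost:.1f}")
```

The grid argmin plays the role of the approximate subproblem solver; an inexact minimizer of the perturbed history would slot in at the same line.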
Load-bearing premise
That an approximate solution to the non-convex subproblem is sufficient to keep the O(sqrt(T)) regret bound and that the batching method maintains stability despite changing policies.
What would settle it
Running the algorithm on a linear system with adversarial disturbances and costs and observing cumulative regret that grows faster than order sqrt(T) would falsify the claimed bound.
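Spelled out, the quantity to track in such an experiment (benchmark definition assumed from the abstract, not quoted from the paper's body):

$$\mathrm{Regret}(T) \;=\; \sum_{t=1}^{T} c_t(x_t, u_t) \;-\; \min_{(\bar{x},\bar{u}) \in \mathcal{S}_{\mathrm{aff}}} \sum_{t=1}^{T} c_t(\bar{x}, \bar{u}),$$

where $\mathcal{S}_{\mathrm{aff}}$ is the set of steady-state pairs attainable under some affine controller; a plot of $\mathrm{Regret}(T)/\sqrt{T}$ that diverges as $T$ grows would refute the bound.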
Original abstract
We study the problem of online non-stochastic control (ONC), which is the control of a linear system under adversarial disturbances and adversarial cost functions, with the aim of minimizing the total cost incurred. A recent line of literature in ONC develops algorithms that enjoy sublinear regret with respect to a benchmark based on the set of steady-states that are attainable by a constant input. In this work, we extend this research direction by giving an algorithm that enjoys $\mathcal{O}(\sqrt{T})$ regret with respect to a richer benchmark set, namely the set of steady-states attainable under an \emph{affine controller}. Since this benchmark substantially broadens the comparison class, it provides significantly stronger performance guarantees. Our proposed algorithm combines a Follow-The-Perturbed-Leader-style online non-convex optimization approach with a batching method that maintains stability despite changing policies. Although our proposed algorithm requires solving non-convex subproblems, we show that an approximate solution to this subproblem is sufficient to ensure $\mathcal{O}(\sqrt{T})$ regret. Furthermore, numerical experiments show that our algorithm enjoys lower total cost and similar computation to existing methods in certain settings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies online non-stochastic control of linear systems under adversarial disturbances and costs. It proposes an FTPL-style algorithm that combines online non-convex optimization with a batching procedure, claiming O(sqrt(T)) regret against the benchmark of steady-states attainable under affine controllers (a richer class than constant inputs). The authors assert that approximate solutions to the non-convex subproblems suffice for the regret bound and that batching preserves stability under policy changes; numerical experiments are included to illustrate practical performance.
Significance. If the central claims hold, the result meaningfully strengthens the benchmark in the ONC literature by moving from constant-input steady-states to affine-controller steady-states, which can yield tighter performance guarantees. The tolerance to approximate subproblem solutions and the batching technique for stability could be practically useful, and the reproducible numerical comparisons provide some empirical grounding.
major comments (2)
- [Abstract / Main Theorem] Abstract and main theorem (presumably Theorem 1 or equivalent): the claim that an approximate solution to the non-convex subproblem suffices for the O(sqrt(T)) regret bound is stated without a visible error-propagation analysis or explicit bound on how the approximation error accumulates over batches and affects the overall regret. This is load-bearing for the central guarantee.
- [Batching Method / Stability Analysis] Batching procedure section: the assertion that batching maintains stability despite changing policies lacks a concrete stability argument (e.g., a uniform bound on the state trajectory or a Lyapunov-style argument across policy switches). Without this, the regret comparison to the affine-controller benchmark cannot be rigorously established.
minor comments (2)
- [Preliminaries] Notation for the affine controller benchmark and the perturbation distribution in the FTPL step should be introduced earlier and used consistently to improve readability.
- [Experiments] The numerical experiments section would benefit from reporting the specific approximation tolerance used in the subproblem solver and its observed effect on total cost.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback on our manuscript. The two major comments identify areas where the presentation of key technical arguments can be strengthened. We address each point below and will revise the manuscript to incorporate the suggested clarifications.
Point-by-point responses
- Referee: [Abstract / Main Theorem] Abstract and main theorem (presumably Theorem 1 or equivalent): the claim that an approximate solution to the non-convex subproblem suffices for the O(sqrt(T)) regret bound is stated without a visible error-propagation analysis or explicit bound on how the approximation error accumulates over batches and affects the overall regret. This is load-bearing for the central guarantee.
Authors: We appreciate the referee's observation that the error-propagation argument requires greater visibility. While the proof of the main regret theorem bounds the per-batch approximation error and shows that the total additive contribution remains O(sqrt(T)) under our parameter choices, we agree that the accumulation across batches is not stated as a standalone lemma. In the revised manuscript we will insert a new lemma immediately preceding the main theorem that explicitly tracks how an additive epsilon-approximation error per subproblem propagates through the batching procedure and is absorbed into the overall O(sqrt(T)) regret bound. revision: yes
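For concreteness, the arithmetic such a lemma needs to close (our sketch of a sufficient condition, not the paper's statement): with batch length $\Delta$ and $N = T/\Delta$ batches, an additive $\epsilon$-approximate subproblem solve per batch contributes at most

$$\sum_{n=1}^{N} \epsilon \;=\; \frac{T}{\Delta}\,\epsilon$$

to the regret, which stays $O(\sqrt{T})$ whenever $\epsilon = O(\Delta/\sqrt{T})$; in particular, $\Delta = \Theta(\sqrt{T})$ tolerates a constant per-batch tolerance.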
- Referee: [Batching Method / Stability Analysis] Batching procedure section: the assertion that batching maintains stability despite changing policies lacks a concrete stability argument (e.g., a uniform bound on the state trajectory or a Lyapunov-style argument across policy switches). Without this, the regret comparison to the affine-controller benchmark cannot be rigorously established.
Authors: Thank you for this important comment. The current stability argument relies on the fact that each batch uses a fixed affine policy and that the linear system is stabilizable, but we acknowledge that a uniform bound across policy switches is not written out explicitly. In the revision we will add a short lemma in the batching section that establishes a uniform bound on the state trajectory norm. The argument uses the exponential stability of the closed-loop affine policies together with a sufficiently large batch length to ensure that transient effects from policy switches do not accumulate and that the state remains bounded by a constant independent of T, thereby justifying the regret comparison to the affine-controller benchmark. revision: yes
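The generic shape of such a bound (a standard switched-systems estimate under an assumed common contraction rate, not the paper's argument): if every closed-loop affine policy in play satisfies $\|x_{t+1}\| \le \rho\|x_t\| + b$ for a shared $\rho < 1$ and $b < \infty$ (with $b$ absorbing the bounded disturbance and the affine offset), then regardless of when policies switch,

$$\|x_t\| \;\le\; \rho^{t}\|x_0\| \;+\; \frac{b}{1-\rho},$$

a constant independent of $T$. When each policy only admits the weaker envelope $\kappa\rho^{t}$ with $\kappa \ge 1$, choosing the batch length $\Delta$ so that $\kappa\rho^{\Delta} \le 1/2$ makes the envelopes contract geometrically across batches, so switch transients do not accumulate.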
Circularity Check
No significant circularity detected
full rationale
The paper's central claim is an O(sqrt(T)) regret bound for an FTPL-style online optimization algorithm augmented with batching, proven to hold against the benchmark of steady-states attainable by affine controllers, with approximate non-convex subproblem solutions shown to suffice. No load-bearing step reduces by construction to a fitted parameter, self-definition, or self-citation chain; the regret analysis is tied directly to the algorithm's construction and stability arguments rather than re-deriving an input quantity. The abstract and high-level description contain no self-citations invoked as uniqueness theorems or ansatzes, and the derivation remains self-contained against external benchmarks without renaming known results or smuggling assumptions.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The plant is a linear time-invariant system.
- domain assumption: Disturbances and cost functions are chosen adversarially but remain bounded.
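In symbols, the standard ONC setup these assumptions point to (notation ours, inferred from the abstract rather than taken from the paper's body):

$$x_{t+1} = A x_t + B u_t + w_t, \qquad \|w_t\| \le W, \qquad u_t = -K x_t + u^{\mathrm{aff}},$$

with the disturbances $w_t$ and cost functions $c_t$ chosen adversarially subject to boundedness, and the benchmark ranging over the steady-states induced by affine controllers $(K, u^{\mathrm{aff}})$.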
Reference graph
Works this paper leans on
- [1] Y. Li, S. Das, and N. Li, "Online optimal control with affine constraints," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35(10), 2021, pp. 8527–8537.
- [2] G. Shi, Y. Lin, S.-J. Chung, Y. Yue, and A. Wierman, "Online Optimization with Memory and Competitive Control," in Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 20636–20647.
- [3] N. Jiang, S. Hutchinson, and M. Alizadeh, "Online Nonstochastic Control with Convex Safety Constraints," Jan. 2025, arXiv:2501.18039 [math]. Available: http://arxiv.org/abs/2501.18039.
- [4] M. Nonhoff, E. Dall'Anese, and M. A. Müller, "Online convex optimization for robust control of constrained dynamical systems," Nov. 2024, arXiv:2401.04487 [eess].
- [5] N. Agarwal, B. Bullins, E. Hazan, S. Kakade, and K. Singh, "Online Control with Adversarial Disturbances," in Proceedings of the 36th International Conference on Machine Learning, PMLR, May 2019, pp. 111–119. Available: https://proceedings.mlr.press/v97/agarwal19c.html.
- [6] E. Hazan and K. Singh, "Introduction to Online Control," Mar. 2025, arXiv:2211.09619 [cs]. Available: http://arxiv.org/abs/2211.09619.
- [7] M. Simchowitz, K. Singh, and E. Hazan, "Improper Learning for Non-Stochastic Control," in Proceedings of the Thirty Third Conference on Learning Theory, PMLR, July 2020, pp. 3320–3436. Available: https://proceedings.mlr.press/v125/simchowitz20a.html.
- [8] P. Gradu, E. Hazan, and E. Minasyan, "Adaptive regret for control of time-varying dynamics," in Learning for Dynamics and Control Conference, PMLR, 2023, pp. 560–572.
- [9] A. B. Cassel, A. Peled-Cohen, and T. Koren, "Rate-Optimal Online Convex Optimization in Adaptive Linear Control," Advances in Neural Information Processing Systems, vol. 35, pp. 7410–7422, Dec. 2022.
- [10] E. Minasyan, P. Gradu, M. Simchowitz, and E. Hazan, "Online Control of Unknown Time-Varying Dynamical Systems," Feb. 2022, arXiv:2202.07890 [cs]. Available: http://arxiv.org/abs/2202.07890.
- [11] A. Cassel and T. Koren, "Bandit linear control," Advances in Neural Information Processing Systems, vol. 33, pp. 8872–8882, 2020.
- [12] N. Agarwal, E. Hazan, and K. Singh, "Logarithmic Regret for Online Control," in Advances in Neural Information Processing Systems, vol. 32, 2019.
- [13] V. Hebbar and C. Langbort, "Revisiting Regret Benchmarks in Online Non-Stochastic Control," Apr. 2025, arXiv:2504.16581 [math]. Available: http://arxiv.org/abs/2504.16581.
- [14] A. Hauswirth, Z. He, S. Bolognani, G. Hug, and F. Dörfler, "Optimization algorithms as robust feedback controllers," Annual Reviews in Control, vol. 57, p. 100941, Jan. 2024. Available: https://www.sciencedirect.com/science/article/pii/S1367578824000105.
- [15] S. Menta, A. Hauswirth, S. Bolognani, G. Hug, and F. Dörfler, "Stability of Dynamic Feedback Optimization with Applications to Power Systems," in 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Oct. 2018, pp. 136–143. Available: https://ieeexplore.ieee.org/document/8635640.
- [16] M. Colombino, E. Dall'Anese, and A. Bernstein, "Online Optimization as a Feedback Controller: Stability and Tracking," May 2018, arXiv:1805.09877 [math]. Available: http://arxiv.org/abs/1805.09877.
- [17] Y. Li, X. Chen, and N. Li, "Online Optimal Control with Linear Dynamics and Predictions: Algorithms and Regret Analysis," in Advances in Neural Information Processing Systems, vol. 32, 2019.
- [18] A. Karapetyan, D. Bolliger, A. Tsiamis, E. C. Balta, and J. Lygeros, "Online Linear Quadratic Tracking with Regret Guarantees," IEEE Control Systems Letters, vol. 7, pp. 3950–3955, 2023, arXiv:2303.10260 [eess]. Available: http://arxiv.org/abs/2303.10260.
- [19] M. Nonhoff and M. A. Müller, "An online convex optimization algorithm for controlling linear systems with state and input constraints," in 2021 American Control Conference (ACC), May 2021, pp. 2523–2528, arXiv:2005.11308 [math].
- [20] H. Zhou and V. Tzoumas, "Safe Non-Stochastic Control of Linear Dynamical Systems," Aug. 2023, arXiv:2308.12395 [eess].
- [21] A. Cohen, A. Hasidim, T. Koren, N. Lazic, Y. Mansour, and K. Talwar, "Online Linear Quadratic Control," in Proceedings of the 35th International Conference on Machine Learning, PMLR, July 2018, pp. 1029–1038.
- [22] A. S. Suggala and P. Netrapalli, "Online non-convex learning: Following the perturbed leader is optimal," in Algorithmic Learning Theory, PMLR, 2020, pp. 845–861.
- [23] N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games. Cambridge: Cambridge University Press, 2006.
- [24] V. Hebbar and C. Langbort, "Responding to Promises: No-regret learning against followers with memory," Dec. 2024, arXiv:2410.07457 [cs]. Available: http://arxiv.org/abs/2410.07457.