Gradient-Variation Regret Bounds for Unconstrained Online Learning
Pith reviewed 2026-05-10 16:19 UTC · model grok-4.3
The pith
Fully adaptive algorithms achieve regret scaling with gradient variation for unconstrained online learning with smooth convex losses.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For L-smooth convex loss functions, the authors give fully adaptive algorithms whose regret is bounded by Õ(‖u‖√V_T(u) + L‖u‖² + G⁴), where V_T(u) is the sum of squared differences between consecutive gradients evaluated at the unknown comparator u; the algorithms require no prior knowledge of ‖u‖, the Lipschitz constant G, or the smoothness constant L.
What carries the argument
The gradient-variation measure V_T(u) = Σ_{t=2}^T ‖∇f_t(u) − ∇f_{t−1}(u)‖², which sums the squared Euclidean norms of differences between consecutive gradients evaluated at the comparator u and replaces worst-case Lipschitz assumptions in the regret analysis.
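The quantity is straightforward to compute once the gradients at a fixed comparator are known; a minimal NumPy sketch (the drifting gradient sequence below is purely illustrative):

```python
import numpy as np

def gradient_variation(grads_at_u):
    """V_T(u) = sum_{t=2}^T ||g_t - g_{t-1}||^2 for gradients g_t = grad f_t(u)
    evaluated at a fixed comparator u."""
    g = np.asarray(grads_at_u, dtype=float)
    diffs = g[1:] - g[:-1]            # consecutive gradient differences
    return float(np.sum(diffs ** 2))  # squared Euclidean norms, summed

# A slowly drifting gradient sequence has small V_T(u)
# even though each individual gradient is large.
grads = [np.array([10.0, -10.0]) + 0.01 * t for t in range(100)]
print(gradient_variation(grads))
```

This is the sense in which the measure is "beyond worst case": large but nearly constant gradients contribute almost nothing to the bound.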
If this is right
- Regret automatically becomes small when consecutive losses have similar gradients at the comparator, without any manual tuning.
- The same algorithms apply directly to dynamic regret settings with changing comparators.
- In the SEA model, substituting the new variation-based bound yields strictly better regret than the prior best result of Wang et al. (2025).
- No separate tuning stage or doubling trick is needed because the updates are closed-form and parameter-free.
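The paper's closed-form parameter-free update is not reproduced on this page, but the mechanism by which regret comes to scale with gradient variation can be sketched with the classical optimistic gradient step (cf. Chiang et al. [9]), which uses the previous gradient as a hint. Everything below is that standard technique under a fixed step size eta, not the authors' algorithm, which additionally tunes its scale online:

```python
import numpy as np

def optimistic_ogd(gradient, x0, eta, T):
    """Euclidean optimistic online gradient descent (a hedged stand-in).
    gradient(t, x) returns the gradient of the round-t loss at x; eta is a
    fixed step size (the paper's method adapts this, which we do not attempt)."""
    x_hat = np.asarray(x0, dtype=float)  # secondary (mirror) sequence
    hint = np.zeros_like(x_hat)          # optimistic guess of the next gradient
    plays = []
    for t in range(T):
        x_t = x_hat - eta * hint         # play the hint-corrected point
        g_t = gradient(t, x_t)           # observe the actual gradient
        x_hat = x_hat - eta * g_t        # descend the secondary sequence
        hint = g_t                       # next hint: the last observed gradient
        plays.append(x_t)
    return np.array(plays)
```

When consecutive gradients barely change, the hint is accurate and the per-round penalty scales with ‖g_t − g_{t−1}‖², which is exactly the quantity summed by V_T(u).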
Where Pith is reading between the lines
- The same variation-based analysis could be applied to non-smooth convex losses by replacing the L term with an appropriate diameter or diameter-like quantity.
- In stochastic settings the gradient variation term may concentrate around its expectation, yielding high-probability bounds that improve with sample size.
- The closed-form update suggests the method could be combined with projection-free or bandit feedback extensions without losing the parameter-free property.
Load-bearing premise
The loss functions must be convex and L-smooth for the stated regret bound to hold.
What would settle it
Run the proposed algorithm on a sequence of L-smooth convex losses where the realized gradient variation at the comparator is small; if the cumulative regret grows faster than the claimed order, the bound is falsified.
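A minimal harness for that experiment, with plain online gradient descent as a hypothetical stand-in for the proposed algorithm (which is not reproduced here); the drift rate, comparator choice, and step-size schedule are all illustrative assumptions, and the actual test would swap the proposed closed-form update into the learner loop:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, L = 2000, 5, 1.0
# Slowly drifting minimizers => small gradient variation at any comparator.
centers = np.cumsum(0.001 * rng.standard_normal((T, d)), axis=0)

def loss(t, x):   # f_t(x) = (L/2)||x - c_t||^2 : convex and L-smooth
    return 0.5 * L * np.sum((x - centers[t]) ** 2)

def grad(t, x):
    return L * (x - centers[t])

x = np.zeros(d)
plays = np.empty((T, d))
for t in range(T):  # stand-in learner: OGD with 1/sqrt(t) step sizes
    plays[t] = x
    x = x - grad(t, x) / (L * np.sqrt(t + 1))

u = centers.mean(axis=0)  # a natural fixed comparator
regret = sum(loss(t, plays[t]) - loss(t, u) for t in range(T))
# grad f_t(u) - grad f_{t-1}(u) = L (c_{t-1} - c_t) for these losses.
V_T = float(np.sum((L * (centers[1:] - centers[:-1])) ** 2))
print(f"regret = {regret:.3f}, ||u||*sqrt(V_T) = {np.linalg.norm(u) * np.sqrt(V_T):.3f}")
```

If, with the proposed algorithm substituted in, the measured regret grew faster than the claimed Õ(‖u‖√V_T(u) + L‖u‖² + G⁴) order as T increases, the bound would be falsified.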
Original abstract
We develop parameter-free algorithms for unconstrained online learning with regret guarantees that scale with the gradient variation $V_T(u) = \sum_{t=2}^T \|\nabla f_t(u)-\nabla f_{t-1}(u)\|^2$. For $L$-smooth convex loss, we provide fully-adaptive algorithms achieving regret of order $\widetilde{O}(\|u\|\sqrt{V_T(u)} + L\|u\|^2+G^4)$ without requiring prior knowledge of comparator norm $\|u\|$, Lipschitz constant $G$, or smoothness $L$. The update in each round can be computed efficiently via a closed-form expression. Our results extend to dynamic regret and find immediate implications to the stochastically-extended adversarial (SEA) model, which significantly improves upon the previous best-known result [Wang et al., 2025].
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops parameter-free algorithms for unconstrained online convex optimization under L-smooth convex losses. It claims fully adaptive regret bounds of order Õ(‖u‖√V_T(u) + L‖u‖² + G⁴) that require no prior knowledge of the comparator norm ‖u‖, Lipschitz constant G, or smoothness L, with per-round updates given in closed form. The results are extended to dynamic regret and yield improved bounds in the stochastically-extended adversarial (SEA) model relative to Wang et al. (2025).
Significance. If the derivations hold, the work would advance adaptive online learning by supplying the first fully parameter-free regret bounds that scale directly with the comparator-dependent gradient variation V_T(u) rather than cruder measures such as total variation or path length. The closed-form update and the SEA-model improvement are concrete strengths; the additive G⁴ term is consistent with known adaptation costs in prior parameter-free analyses.
major comments (2)
- [Abstract and §3] Abstract and §3 (algorithm statement): the claimed regret bound includes an additive G⁴ term whose independence from the unknown G must be verified explicitly; the abstract supplies neither a proof sketch nor an error analysis showing that this term does not implicitly re-introduce knowledge of G, which is load-bearing for the “fully-adaptive” claim.
- [§4] §4 (proof of Theorem 1): the reduction from the unconstrained problem to a constrained one via the closed-form update must be shown to preserve the exact dependence on V_T(u) without hidden factors of ‖u‖ or L; any self-referential dependence would undermine the parameter-free guarantee.
minor comments (2)
- Notation: ensure that the definition of V_T(u) in the main text matches the abstract exactly and that the tilde-O hides only polylog factors independent of the unknown parameters.
- Related work: the comparison to Wang et al. (2025) in the SEA setting should include a brief table or paragraph quantifying the improvement in the dependence on the stochasticity parameter.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and detailed report. We address the two major comments below with clarifications on the parameter-free nature of the algorithm and analysis. We will incorporate revisions to improve the exposition as indicated.
Point-by-point responses
-
Referee: [Abstract and §3] Abstract and §3 (algorithm statement): the claimed regret bound includes an additive G⁴ term whose independence from the unknown G must be verified explicitly; the abstract supplies neither a proof sketch nor an error analysis showing that this term does not implicitly re-introduce knowledge of G, which is load-bearing for the “fully-adaptive” claim.
Authors: The algorithm is fully parameter-free: its closed-form per-round update is computed without any input or estimate of G (or L or ||u||). The G⁴ term is an additive adaptation cost that appears in the regret analysis after the fact and depends only on the realized value of G; it does not affect the algorithm's execution or require prior knowledge of G. This structure is standard in the parameter-free literature. We will revise the abstract to note this distinction briefly and add a short paragraph in §3 that sketches how the G⁴ term is obtained from the online tuning of the learning rate without referencing G in the update rule. revision: partial
-
Referee: [§4] §4 (proof of Theorem 1): the reduction from the unconstrained problem to a constrained one via the closed-form update must be shown to preserve the exact dependence on V_T(u) without hidden factors of ‖u‖ or L; any self-referential dependence would undermine the parameter-free guarantee.
Authors: The reduction in the proof of Theorem 1 uses an adaptively chosen ball whose radius is updated online from observed gradients alone. We explicitly track the error introduced by the projection step and show that it contributes only to the L||u||² term; the coefficient of √V_T(u) remains exactly ||u|| with no additional multiplicative factors involving ||u|| or L. The tuning of the radius is self-contained and does not create circular dependence on the unknown quantities. To make this transparent, we will expand the relevant paragraph in §4 with the intermediate inequalities that isolate the V_T(u) term. revision: yes
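The mechanism the authors describe, clipping an unconstrained iterate to a ball whose radius is computed from observed gradients alone, can be sketched as follows. The radius rule below is a hypothetical stand-in (the paper's exact schedule is not given on this page); the sketch only illustrates why no knowledge of u, G, or L enters the reduction:

```python
import numpy as np

def project_to_ball(x, r):
    """Euclidean projection onto the centered ball of radius r."""
    n = np.linalg.norm(x)
    return x if n <= r else (r / n) * x

def constrained_play(x_unconstrained, grad_norms_seen):
    # Hypothetical radius rule: grows with the cumulative observed gradient
    # mass, so the ball eventually contains any fixed comparator, while the
    # rule itself never references ||u||, G, or L.
    r = 1.0 + np.sqrt(np.sum(np.square(grad_norms_seen)))
    return project_to_ball(x_unconstrained, r)
```

The analytical burden the referee raises is then precisely the one the authors address: showing that the projection error feeds only into the L‖u‖² term and leaves the ‖u‖√V_T(u) coefficient intact.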
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper presents algorithmic constructions for parameter-free online convex optimization under L-smooth losses, with regret bounds expressed in terms of the comparator-dependent gradient variation V_T(u). No equations or steps in the provided abstract reduce the claimed regret or algorithm to a fitted parameter, self-referential definition, or load-bearing self-citation. The result is framed as an existence claim for closed-form updates that adapt without prior knowledge of ||u||, G, or L, consistent with standard techniques in online learning (e.g., potential-based or doubling methods) that do not presuppose the target bound. The extension to dynamic regret and SEA model is presented as an implication rather than a circular re-derivation. No load-bearing step matches any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: loss functions are convex and L-smooth.
Reference graph
Works this paper leans on
-
[1]
Online label shift: Optimal dynamic regret meets practical algorithms
Dheeraj Baby, Saurabh Garg, Tzu-Ching Yen, Sivaraman Balakrishnan, Zachary Chase Lipton, and Yu-Xiang Wang. Online label shift: Optimal dynamic regret meets practical algorithms. In Advances in Neural Information Processing Systems 36 (NeurIPS), pages 65703--65742, 2023
2023
-
[2]
Adapting to online label shift with provable guarantees
Yong Bai, Yu-Jie Zhang, Peng Zhao, Masashi Sugiyama, and Zhi-Hua Zhou. Adapting to online label shift with provable guarantees. In Advances in Neural Information Processing Systems 35 (NeurIPS), pages 29960--29974, 2022
2022
-
[3]
Online learning with imperfect hints
Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, and Manish Purohit. Online learning with imperfect hints. In Proceedings of the 37th International Conference on Machine Learning (ICML), pages 822--831, 2020a
2020
-
[4]
Online linear optimization with many hints
Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, and Manish Purohit. Online linear optimization with many hints. In Advances in Neural Information Processing Systems 33 (NeurIPS), pages 9530--9539, 2020b
2020
-
[5]
Logarithmic regret from sublinear hints
Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, and Manish Purohit. Logarithmic regret from sublinear hints. In Advances in Neural Information Processing Systems 34 (NeurIPS), pages 10148--10201, 2021
2021
-
[6]
Prediction, Learning, and Games
Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006
2006
-
[7]
A parameter-free hedging algorithm
Kamalika Chaudhuri, Yoav Freund, and Daniel J. Hsu. A parameter-free hedging algorithm. In Advances in Neural Information Processing Systems 22 (NIPS), pages 297--305, 2009
2009
-
[8]
Optimistic online mirror descent for bridging stochastic and adversarial online convex optimization
Sijia Chen, Yu-Jie Zhang, Wei-Wei Tu, Peng Zhao, and Lijun Zhang. Optimistic online mirror descent for bridging stochastic and adversarial online convex optimization. Journal of Machine Learning Research, 25(178):1--62, 2024
2024
-
[9]
Online optimization with gradual variations
Chao-Kai Chiang, Tianbao Yang, Chia-Jung Lee, Mehrdad Mahdavi, Chi-Jen Lu, Rong Jin, and Shenghuo Zhu. Online optimization with gradual variations. In Proceedings of the 25th Conference on Learning Theory (COLT), pages 6.1--6.20, 2012
2012
-
[10]
Anytime online-to-batch, optimism and acceleration
Ashok Cutkosky. Anytime online-to-batch, optimism and acceleration. In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 1446--1454, 2019a
2019
-
[11]
Artificial constraints and hints for unbounded online learning
Ashok Cutkosky. Artificial constraints and hints for unbounded online learning. In Proceedings of the 32nd Conference on Learning Theory (COLT), pages 874--894, 2019b
2019
-
[12]
Combining online learning guarantees
Ashok Cutkosky. Combining online learning guarantees. In Proceedings of the 32nd Conference on Learning Theory (COLT), pages 895--913, 2019c
2019
-
[13]
Parameter-free, dynamic, and strongly-adaptive online learning
Ashok Cutkosky. Parameter-free, dynamic, and strongly-adaptive online learning. In Proceedings of the 37th International Conference on Machine Learning (ICML), pages 2250--2259, 2020
2020
-
[14]
Fully unconstrained online learning
Ashok Cutkosky and Zakaria Mhammedi. Fully unconstrained online learning. In Advances in Neural Information Processing Systems 37 (NeurIPS), pages 10148--10201, 2024
2024
-
[15]
Black-box reductions for parameter-free online learning in banach spaces
Ashok Cutkosky and Francesco Orabona. Black-box reductions for parameter-free online learning in banach spaces. In Proceedings of the 31st Conference on Learning Theory (COLT), pages 1493--1529, 2018
2018
-
[16]
Follow the leader if you can, hedge if you must
Steven de Rooij, Tim van Erven, Peter D. Grünwald, and Wouter M. Koolen. Follow the leader if you can, hedge if you must. Journal of Machine Learning Research, 15(1):1281--1316, 2014
2014
-
[17]
Adaptive online learning
Dylan J. Foster, Alexander Rakhlin, and Karthik Sridharan. Adaptive online learning. In Advances in Neural Information Processing Systems 28 (NIPS), pages 3375--3383, 2015
2015
-
[18]
Introduction to Online Convex Optimization
Elad Hazan. Introduction to Online Convex Optimization. MIT Press, 2nd edition, 2022
2022
-
[19]
Parameter-free mirror descent
Andrew Jacobsen and Ashok Cutkosky. Parameter-free mirror descent. In Proceedings of the 35th Conference on Learning Theory (COLT), pages 4160--4211, 2022
2022
-
[20]
Unconstrained online learning with unbounded losses
Andrew Jacobsen and Ashok Cutkosky. Unconstrained online learning with unbounded losses. In Proceedings of the 40th International Conference on Machine Learning (ICML), pages 14590--14630, 2023
2023
-
[21]
A modular analysis of adaptive (non-)convex optimization: Optimism, composite objectives, variance reduction, and variational bounds
Pooria Joulani, András György, and Csaba Szepesvári. A modular analysis of adaptive (non-)convex optimization: Optimism, composite objectives, variance reduction, and variational bounds. Theoretical Computer Science, 808:108--138, 2020
2020
-
[22]
Online to offline conversions, universality and adaptive minibatch sizes
Kfir Levy. Online to offline conversions, universality and adaptive minibatch sizes. In Advances in Neural Information Processing Systems 30 (NIPS), pages 1613--1622, 2017
2017
-
[23]
On the dynamic regret of following the regularized leader: Optimism with history pruning
Naram Mhaisen and George Iosifidis. On the dynamic regret of following the regularized leader: Optimism with history pruning. In Proceedings of the 42nd International Conference on Machine Learning (ICML), pages 43990--44016, 2025
2025
-
[24]
Lipschitz and comparator-norm adaptivity in online learning
Zakaria Mhammedi and Wouter M. Koolen. Lipschitz and comparator-norm adaptivity in online learning. In Proceedings of the 33rd Conference on Learning Theory (COLT), pages 2858--2887, 2020
2020
-
[25]
Lectures on Convex Optimization
Yurii Nesterov. Lectures on Convex Optimization, volume 137. Springer, 2018
2018
-
[26]
Dimension-free exponentiated gradient
Francesco Orabona. Dimension-free exponentiated gradient. In C.J. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 26. Curran Associates, Inc., 2013
2013
-
[27]
A Modern Introduction to Online Learning
Francesco Orabona. A Modern Introduction to Online Learning. ArXiv preprint, arxiv:1912.13213, 2019
2019
-
[28]
Handling new class in online label shift
Yu-Yang Qian, Yong Bai, Zhen-Yu Zhang, Peng Zhao, and Zhi-Hua Zhou. Handling new class in online label shift. In Proceedings of the 23rd IEEE International Conference on Data Mining (ICDM), pages 1283--1288, 2023
2023
-
[29]
Beyond the Worst-Case Analysis of Algorithms
Tim Roughgarden. Beyond the Worst-Case Analysis of Algorithms. Cambridge University Press, 2021
2021
-
[30]
Between stochastic and adversarial online convex optimization: Improved regret bounds via smoothness
Sarah Sachs, Hedi Hadiji, Tim van Erven, and Cristóbal A. Guzmán. Between stochastic and adversarial online convex optimization: Improved regret bounds via smoothness. In Advances in Neural Information Processing Systems 35 (NeurIPS), pages 691--702, 2022
2022
-
[31]
Online Learning and Online Convex Optimization
Shai Shalev-Shwartz. Online Learning and Online Convex Optimization. Foundations and Trends in Machine Learning, 4(2):107--194, 2012
2012
-
[32]
Fast convergence of regularized learning in games
Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, and Robert E. Schapire. Fast convergence of regularized learning in games. In Advances in Neural Information Processing Systems 28 (NIPS), pages 2989--2997, 2015
2015
-
[33]
Parameter-free algorithms for the stochastically extended adversarial model
Shuche Wang, Adarsh Barik, Peng Zhao, and Vincent Y. F. Tan. Parameter-free algorithms for the stochastically extended adversarial model. In Advances in Neural Information Processing Systems 38 (NeurIPS), page to appear, 2025
2025
-
[34]
Gradient-variation online learning under generalized smoothness
Yan-Feng Xie, Peng Zhao, and Zhi-Hua Zhou. Gradient-variation online learning under generalized smoothness. In Advances in Neural Information Processing Systems 37 (NeurIPS), pages 37865--37899, 2024
2024
-
[35]
A simple and optimal approach for universal online learning with gradient variations
Yu-Hu Yan, Peng Zhao, and Zhi-Hua Zhou. A simple and optimal approach for universal online learning with gradient variations. In Advances in Neural Information Processing Systems 37 (NeurIPS), pages 11132--11163, 2024
2024
-
[36]
Regret bounded by gradual variation for online convex optimization
Tianbao Yang, Mehrdad Mahdavi, Rong Jin, and Shenghuo Zhu. Regret bounded by gradual variation for online convex optimization. Machine Learning, 95(2):183--223, 2014
2014
-
[37]
Improved dimension dependence for bandit convex optimization with gradient variations
Hang Yu, Yu-Hu Yan, and Peng Zhao. Improved dimension dependence for bandit convex optimization with gradient variations. ArXiv preprint, arxiv:2602.04761, 2026
-
[38]
Adaptive online learning in dynamic environments
Lijun Zhang, Shiyin Lu, and Zhi-Hua Zhou. Adaptive online learning in dynamic environments. In Advances in Neural Information Processing Systems 31 (NeurIPS), pages 1330--1340, 2018
2018
-
[39]
No-regret learning in time-varying zero-sum games
Mengxiao Zhang, Peng Zhao, Haipeng Luo, and Zhi-Hua Zhou. No-regret learning in time-varying zero-sum games. In Proceedings of the 39th International Conference on Machine Learning (ICML), pages 26772--26808, 2022
2022
-
[40]
Dynamic regret of convex and smooth functions
Peng Zhao, Yu-Jie Zhang, Lijun Zhang, and Zhi-Hua Zhou. Dynamic regret of convex and smooth functions. In Advances in Neural Information Processing Systems 33 (NeurIPS), pages 12510--12520, 2020
2020
-
[41]
Adaptivity and non-stationarity: Problem-dependent dynamic regret for online convex optimization
Peng Zhao, Yu-Jie Zhang, Lijun Zhang, and Zhi-Hua Zhou. Adaptivity and non-stationarity: Problem-dependent dynamic regret for online convex optimization. Journal of Machine Learning Research, 25(98):1--52, 2024
2024
-
[42]
Efficient methods for non-stationary online learning
Peng Zhao, Yan-Feng Xie, Lijun Zhang, and Zhi-Hua Zhou. Efficient methods for non-stationary online learning. Journal of Machine Learning Research, 26(208):1--66, 2025a
2025
-
[43]
Adaptivity and universality: Problem-dependent universal regret for online convex optimization
Peng Zhao, Yu-Hu Yan, Hang Yu, and Zhi-Hua Zhou. Adaptivity and universality: Problem-dependent universal regret for online convex optimization. ArXiv preprint, arxiv:2511.19937, 2025b
-
[44]
Gradient-variation online adaptivity for accelerated optimization with Hölder smoothness
Yuheng Zhao, Yu-Hu Yan, Kfir Yehuda Levy, and Peng Zhao. Gradient-variation online adaptivity for accelerated optimization with Hölder smoothness. In Advances in Neural Information Processing Systems 38 (NeurIPS), page to appear, 2025c
2025
-
[45]
Online convex programming and generalized infinitesimal gradient ascent
Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning (ICML), pages 928--936, 2003
2003