A Modern Introduction to Online Learning
Abstract
In this book, I introduce the basic concepts of Online Learning through the modern view of Online Convex Optimization. Here, online learning refers to the framework of regret minimization under worst-case assumptions. I present first-order and second-order algorithms for online learning with convex losses, in Euclidean and non-Euclidean settings. All the algorithms are clearly presented as instantiations of Online Mirror Descent or Follow-The-Regularized-Leader and their variants. Particular attention is given to the issues of tuning the parameters of the algorithms and of learning in unbounded domains, through adaptive and parameter-free online learning algorithms. Non-convex losses are addressed through convex surrogate losses and randomization. The bandit setting is also briefly discussed, touching on the problems of adversarial and stochastic multi-armed bandits. Finally, I also cover advanced topics, including black-box reductions, saddle-point optimization, sequential investment, and non-stationary forms of regret analysis. The book concludes with a selection of applications of online learning to distant domains, such as generalization theory and concentration inequalities. I tried to maintain an informal, but mathematically serious, tone throughout the book. No prior knowledge of convex analysis is required. Moreover, all the included proofs have been carefully chosen to be as simple and as short as possible. This also means that I have sometimes added one or two extra assumptions, just to simplify the proofs.
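To make the regret-minimization framework of the abstract concrete: at each round t the algorithm plays a point x_t, a convex loss f_t is revealed, and performance is measured by the regret against a fixed comparator u, defined as \sum_{t=1}^T f_t(x_t) - \sum_{t=1}^T f_t(u). Below is a minimal sketch, not from the book, of online subgradient descent, the Euclidean instance of Online Mirror Descent, run on a toy quadratic-loss sequence; the loss sequence, step size, and all names are illustrative assumptions.

```python
# Minimal sketch (not from the book): online subgradient descent, i.e.
# Online Mirror Descent with the Euclidean regularizer, on convex losses.
import numpy as np

def online_gradient_descent(grads, x0, eta):
    """Play x_t, observe g_t = grad f_t(x_t), update x_{t+1} = x_t - eta * g_t.
    Returns the iterates played (unconstrained domain, for simplicity)."""
    x = np.asarray(x0, dtype=float)
    played = []
    for grad in grads:
        played.append(x.copy())
        x = x - eta * grad(x)
    return played

# Toy adversary: quadratic losses f_t(x) = 0.5 * ||x - z_t||^2, random targets.
rng = np.random.default_rng(0)
targets = [rng.normal(size=2) for _ in range(100)]
grads = [lambda x, z=z: x - z for z in targets]  # grad f_t(x) = x - z_t
xs = online_gradient_descent(grads, x0=np.zeros(2), eta=0.1)

# Regret_T(u) = sum_t f_t(x_t) - sum_t f_t(u) against the best fixed point
# in hindsight, which for these losses is the mean of the targets.
u = np.mean(targets, axis=0)
f = lambda x, z: 0.5 * np.dot(x - z, x - z)
regret = sum(f(x, z) - f(u, z) for x, z in zip(xs, targets))
print(f"Regret vs. best fixed comparator: {regret:.3f}")
```

With a step size of order 1/sqrt(T) and bounded gradients, this scheme attains the classic O(sqrt(T)) regret guarantee for convex losses, the baseline that many of the citing papers below refine or extend.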
Forward citations
Cited by 19 Pith papers
- Online Learning-to-Defer with Varying Experts
  Presents the first online learning-to-defer algorithm with regret bounds O((n + n_e) T^{2/3}) generally and O((n + n_e) sqrt(T)) under low noise for multiclass classification with varying experts.
- Toward Optimal Regret in Robust Pricing: Decoupling Corruption and Time
  A robust variant of binary search achieves regret O(C + log T) for dynamic pricing with known corruption C and O(C + log² T) when unknown.
- Online Resource Allocation With General Constraints
  An algorithm for online resource allocation with budget and general constraints achieves O(sqrt(T)) regret in the stochastic regime and alpha-regret in the adversarial regime, with bounded constraint violations.
- Constrained Contextual Bandits with Adversarial Contexts
  A modular reduction from budget-constrained contextual bandits with adversarial contexts to unconstrained bandits via surrogate rewards, yielding improved guarantees and an efficient algorithm based on SquareCB.
- Online Localized Conformal Prediction
  OLCP combines online adaptation with covariate-based localization to achieve valid long-run coverage and narrower intervals than global baselines in non-exchangeable settings.
- Online Nonstochastic Prediction: Logarithmic Regret via Predictive Online Least Squares
  Predictive hints from any stabilizing Luenberger observer make hint residuals uniformly bounded in online least squares, yielding logarithmic regret for nonstochastic prediction despite unbounded trajectories in margi...
- Single-Period Portfolio Selection via Information Projection
  CRRA portfolio selection is equivalent to a Rényi information-projection problem whose order equals the investor's relative risk aversion coefficient, yielding a Blahut-Arimoto-style alternating optimization algorithm.
- Concave Statistical Utility Maximization Bandits via Influence-Function Gradients
  A framework for concave distributional utility maximization in stochastic bandits via influence-function stochastic gradients and entropic mirror ascent on the simplex, with regret bounds.
- FedSEA: Achieving Benefit of Parallelization in Federated Online Learning
  FedSEA achieves O(sqrt(T)) regret for smooth convex losses and O(log T) for smooth strongly convex losses in federated online learning under a stochastic adversary, with parallelization benefits when temporal heterogene...
- Gradient-Variation Regret Bounds for Unconstrained Online Learning
  Parameter-free algorithms for unconstrained online learning achieve regret bounds of order O(||u|| sqrt(V_T(u)) + L||u||^2 + G^4) for L-smooth convex losses without prior knowledge of ||u||, L or G, with extensions to...
- Distributed Online Convex Optimization with Compressed Communication: Optimal Regret and Applications
  Optimal regret bounds O(δ^{-1/2}√T) for convex and O(δ^{-1} log T) for strongly convex losses are achieved in distributed online convex optimization under compressed communication.
- Optimistic Dual Averaging Unifies Modern Optimizers
  SODA unifies several modern optimizers under optimistic dual averaging and supplies a 1/k decay wrapper that improves performance without weight decay tuning.
- Online Sharp-Calibrated Bayesian Optimization
  OSCBO adaptively balances Gaussian process sharpness and calibration in Bayesian optimization by casting hyperparameter selection as constrained online learning, while preserving sublinear regret bounds.
- Online Localized Conformal Prediction
  OLCP and OLCP-Hedge achieve long-run valid coverage in non-exchangeable online settings with narrower prediction sets by localizing conformal prediction to covariates and selecting bandwidth via online convex optimization.
- StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models
  StoSignSGD resolves SignSGD divergence on non-smooth objectives via structural stochasticity, matching optimal convex rates and improving non-convex bounds while delivering 1.44-2.14x speedups in FP8 LLM pretraining.
- A Note on How to Remove the $\ln\ln T$ Term from the Squint Bound
  Shifted KT potentials are equivalent to a change of prior in KT, which removes the ln ln T factor from Squint's data-independent bound.
- Revisiting Active Sequential Prediction-Powered Mean Estimation
  Non-asymptotic analysis of prediction-powered mean estimation shows that no-regret learning for query probabilities converges to the maximum allowed constant value, independent of covariates.
- The Bayesian Reflex: Online Learning as the Autonomic Nervous System of Modern and Future AI
  The Bayesian reflex unifies online Bayesian learning through belief maintenance, Bayes' theorem updates, and exploration-exploitation balancing, with extensions to climate modeling, time series, prime number discovery...