Regret-Oracle Complexity Tradeoffs in Agnostic Online Learning
Pith reviewed 2026-05-11 01:17 UTC · model grok-4.3
The pith
Dynamic pruning of non-realizable label sequences bounds oracle queries to O(T^{d_VC + 1}) while preserving near-optimal regret in agnostic online learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that an adaptive agnostic-to-realizable reduction, which dynamically prunes non-realizable label sequences on the fly via a weak-consistency oracle and bounds the number of surviving paths by the VC dimension, achieves the same near-optimal expected regret as earlier methods while lowering total oracle queries from doubly exponential in the Littlestone dimension to O(T^{d_VC + 1}).
What carries the argument
The adaptive and dynamic agnostic-to-realizable reduction that actively prunes non-realizable label sequences on the fly, with the count of maintained paths bounded by the VC dimension.
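The round structure of such a reduction can be sketched concretely. The following is an illustrative reconstruction, not the paper's pseudocode: `weak_consistency_oracle` is simulated by brute force over a small finite class, and prediction is collapsed to a toy majority vote over surviving paths.

```python
def weak_consistency_oracle(hypotheses, labeled_data):
    """Decide whether some hypothesis in the class is consistent with
    every (x, y) pair -- the only access the learner gets to the class."""
    return any(all(h(x) == y for x, y in labeled_data) for h in hypotheses)

def prune_and_predict(hypotheses, xs):
    """Illustrative agnostic-to-realizable reduction: maintain the
    realizable label sequences ('active paths') on the observed xs,
    pruning non-realizable extensions with the oracle each round."""
    paths = [()]                 # surviving label sequences so far
    queries = 0
    predictions = []
    for t, x in enumerate(xs):
        new_paths = []
        for path in paths:
            for y in (0, 1):     # try both candidate labels
                data = list(zip(xs[:t + 1], path + (y,)))
                queries += 1
                if weak_consistency_oracle(hypotheses, data):
                    new_paths.append(path + (y,))   # realizable: keep
        paths = new_paths
        # toy prediction rule: majority label among surviving paths
        ones = sum(p[-1] for p in paths)
        predictions.append(1 if 2 * ones >= len(paths) else 0)
    return predictions, queries, paths

# tiny class of thresholds on {0,...,3} (VC dimension 1): the surviving
# paths are exactly the monotone label sequences, 5 of them, matching
# the Sauer-Shelah count Phi_1(4) = 5
thresholds = [lambda x, c=c: int(x >= c) for c in range(5)]
preds, queries, paths = prune_and_predict(thresholds, [0, 1, 2, 3])
```

On this toy class the path count grows as 1, 2, 3, 4, 5 across rounds, so the oracle is called 2 + 4 + 6 + 8 = 20 times, linear in each round's path count as the query bound requires.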
Load-bearing premise
The adaptive pruning of non-realizable label sequences preserves the regret guarantee of the underlying realizable learner without introducing extra error that scales with the time horizon T.
What would settle it
Running the algorithm on a hypothesis class of known VC dimension d and observing that its expected regret exceeds the claimed near-optimal rate by more than a constant factor when the total number of consistency-oracle calls is restricted to O(T^{d+1}) would disprove the claim.
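For concreteness, the query budget such a falsification experiment would enforce can be computed directly; `query_budget` is a hypothetical helper based on the Sauer-Shelah path count, not code from the paper.

```python
from math import comb

def query_budget(T: int, d_vc: int) -> int:
    """Upper bound on weak-consistency oracle calls over T rounds:
    at round t there are at most Phi_d(t) = sum_{i <= d_vc} C(t, i)
    active realizable paths (Sauer-Shelah), and each is extended by
    two candidate labels, one oracle call per extension."""
    def phi(t: int) -> int:
        return sum(comb(t, i) for i in range(d_vc + 1))
    return sum(2 * phi(t) for t in range(1, T + 1))

# e.g. d_vc = 1, T = 100: Phi_1(t) = t + 1, so the budget is
# 2 * (5050 + 100) = 10300, i.e. Theta(T^2) as O(T^{d+1}) predicts
```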
Original abstract
Agnostic online learning is classically solved via a reduction to the realizable setting, utilizing Littlestone's Standard Optimal Algorithm (SOA) as a base learner. However, the SOA is computationally intractable to execute even for a single round. To overcome this barrier, recent work in oracle-efficient online learning replaces the SOA with a realizable base learner that accesses the concept class exclusively through an offline empirical risk minimization (ERM) oracle. While such agnostic learners achieve near-optimal expected regret, they suffer from a doubly-exponential oracle complexity of $O\big(T^{2^{O(d_\mathrm{LD})}}\big)$, where $d_\mathrm{LD}$ is the Littlestone dimension and $T$ is the number of rounds. In this work, we significantly improve this oracle complexity while relying on an even weaker primitive: a weak-consistency oracle, which merely decides whether a given labeled dataset is realizable. At the core of our approach is an adaptive and dynamic agnostic-to-realizable reduction that actively prunes non-realizable label sequences on the fly. By using the VC dimension ($d_\mathrm{VC}$) to bound the number of dynamically maintained active paths, our algorithm reduces the total query complexity down to $O(T^{d_\mathrm{VC}+1})$ while perfectly preserving near-optimal expected regret. Crucially, this dynamic pruning also yields a memory reduction over the standard reduction. Furthermore, we formally quantify the regret--oracle complexity tradeoff, providing upper bounds that smoothly interpolate between restricted query budgets and attainable expected regret. We complement these with lower bounds proving that any learner restricted to $Q = o(\sqrt{T})$ queries must suffer an expected regret of $\Omega(T/Q)$.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops an adaptive reduction from agnostic online learning to the realizable setting that employs a weak-consistency oracle for on-the-fly pruning of non-realizable label sequences. By invoking the Sauer-Shelah lemma to bound the number of dynamically maintained active paths by the VC-dimension, the algorithm attains O(T^{d_VC+1}) total oracle queries while claiming to preserve near-optimal expected regret; it further supplies smooth upper bounds interpolating between query budget and regret together with a matching lower bound of Omega(T/Q) regret for any learner limited to Q = o(sqrt(T)) queries.
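The Sauer-Shelah accounting implicit in this summary can be made explicit; the following is one plausible reconstruction consistent with the abstract, not the paper's own derivation.

```latex
% Realizable label sequences on t points (Sauer--Shelah), then two
% oracle calls per active path per round:
\begin{align*}
  \Phi_{d_\mathrm{VC}}(t) &= \sum_{i=0}^{d_\mathrm{VC}} \binom{t}{i}
    = O\!\left(t^{d_\mathrm{VC}}\right),\\
  \text{total queries} &\le \sum_{t=1}^{T} 2\,\Phi_{d_\mathrm{VC}}(t)
    \le 2T\,\Phi_{d_\mathrm{VC}}(T)
    = O\!\left(T^{d_\mathrm{VC}+1}\right).
\end{align*}
```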
Significance. If the regret-preservation argument holds without hidden additive terms, the work materially advances oracle-efficient agnostic online learning by replacing the doubly-exponential dependence on Littlestone dimension with a polynomial dependence on VC dimension and by introducing a memory-efficient dynamic pruning technique. The explicit regret-oracle tradeoff and the complementary lower bound are valuable contributions that quantify the computational price of agnosticity.
major comments (1)
- [Main algorithm and regret analysis (around the dynamic pruning step and the proof of the upper bound)] The central claim, that dynamic pruning via the weak-consistency oracle 'perfectly preserves' near-optimal expected regret, must be shown to introduce no extra bias or variance that scales with the number of active paths. The analysis needs to verify that conditioning on the surviving realizable paths leaves the minimax regret decomposition identical to the full (unpruned) realizable case; any unaccounted O(T^{d_VC}) accumulation would undermine the stated O(T^{d_VC+1}) query bound.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive report. The concern regarding potential bias or variance introduced by dynamic pruning is well-taken, and we address it directly below. We believe the existing analysis already establishes preservation of the regret bound, but we will add an explicit clarification paragraph for the revised manuscript.
Point-by-point responses
Referee: The central claim, that dynamic pruning via the weak-consistency oracle 'perfectly preserves' near-optimal expected regret, must be shown to introduce no extra bias or variance that scales with the number of active paths. The analysis needs to verify that conditioning on the surviving realizable paths leaves the minimax regret decomposition identical to the full (unpruned) realizable case; any unaccounted O(T^{d_VC}) accumulation would undermine the stated O(T^{d_VC+1}) query bound.
Authors: We appreciate this observation on the core of the regret analysis. The dynamic pruning step invokes the weak-consistency oracle precisely to discard label sequences that admit no consistent hypothesis from the class. Because the oracle returns a correct yes/no answer on realizability, the surviving active paths at any round are exactly the sequences realizable by at least one concept consistent with the observed history. Consequently, the conditional distribution over these paths is identical to the distribution that would arise in the standard (unpruned) realizable online-learning setting restricted to the consistent subclass. The minimax regret decomposition therefore carries over unchanged: the value is still the worst-case expected loss over the remaining consistent hypotheses, with no additive bias term. Variance is controlled by the same concentration inequalities used in the realizable analysis; the polynomial bound on the number of paths (via the Sauer-Shelah lemma) appears only in the query-complexity accounting and does not propagate into the regret bound itself. In particular, no O(T^{d_VC}) accumulation arises in the regret because the overall expectation is taken with respect to the adversary's choice of a single realizable path, and the per-round contribution is bounded independently of the path count. We will insert a short clarifying paragraph immediately after the statement of the regret theorem to make this equivalence explicit.
Revision: partial
Circularity Check
No circularity: derivation uses external VC-dimension bounds and standard oracle reductions
Full rationale
The paper constructs an adaptive pruning algorithm that maintains active label sequences and bounds their number via the Sauer-Shelah lemma applied to the VC dimension of the concept class. The resulting query complexity O(T^{d_VC+1}) and the claim of preserved near-optimal regret follow directly from the definition of the weak-consistency oracle and the standard realizable-to-agnostic reduction; neither quantity is defined in terms of the other, nor is any prediction obtained by fitting a parameter to a subset of the target data. No self-citation is invoked as the sole justification for the core pruning invariant or the regret decomposition. The lower bound on regret for sublinear queries is likewise an independent minimax argument. The derivation is therefore self-contained, resting only on external combinatorial facts.
Axiom & Free-Parameter Ledger
axioms (2)
- [standard math] Finite VC dimension controls the number of consistent label sequences
- [domain assumption] The weak-consistency oracle correctly identifies realizable datasets