pith. sign in

arxiv: 2605.03823 · v2 · submitted 2026-05-05 · 💻 cs.LG · cs.IT· math.IT· math.ST· stat.TH

Realizable Bayes-Consistency for General Metric Losses

Pith reviewed 2026-05-15 06:55 UTC · model grok-4.3

classification 💻 cs.LG cs.ITmath.ITmath.STstat.TH
keywords realizable Bayes-consistencymetric lossesLittlestone treeshypothesis classesdistribution-free learningcombinatorial characterization
0
0 comments X

The pith

A hypothesis class permits strong realizable Bayes-consistency for any metric loss if and only if it contains no infinite non-decreasing (gamma_k)-Littlestone tree with gamma_k tending to infinity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper determines the exact conditions on a hypothesis class H that guarantee the existence of a distribution-free learning rule whose risk converges almost surely to the optimal risk of zero whenever the data is realizable by some member of H. This holds for general metric losses on the label space, which may be unbounded. A sympathetic reader cares because the result supplies a sharp combinatorial test that separates classes where universal consistency is possible from those where no such rule exists, without any distributional assumptions on the data. The characterization extends earlier results for 0-1 classification and real-valued regression by replacing the classical Littlestone dimension with a scaled, non-decreasing tree obstruction.

Core claim

For any hypothesis class H, there exists a distribution-free learning rule whose risk converges almost surely to the best-in-class risk of zero for every realizable data-generating distribution if and only if H does not contain an infinite non-decreasing (gamma_k)-Littlestone tree with gamma_k tending to infinity.

What carries the argument

The infinite non-decreasing (gamma_k)-Littlestone tree, a combinatorial structure that forces the loss to grow unboundedly along some sequence of instances and thereby blocks any consistent learning rule.

If this is right

  • Consistency is possible precisely when the hypothesis class avoids all infinite non-decreasing (gamma_k)-Littlestone trees.
  • The result applies to any metric loss, including unbounded ones, on arbitrary label spaces.
  • The same learning rule works uniformly across all realizable distributions without needing to know the distribution in advance.
  • The characterization is both necessary and sufficient, resolving the realizable case of the open problem stated in prior work.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The tree condition supplies a concrete test that can be applied to specific families of hypothesis classes to decide consistency in advance.
  • The same obstruction may appear in related settings such as online learning or agnostic consistency when losses are metric.
  • Extending the result to non-metric losses would likely require identifying different combinatorial structures that play the role of the scaled Littlestone tree.

Load-bearing premise

The loss must form a metric on the label space, and the only obstruction to consistency must be exactly the existence of an infinite non-decreasing (gamma_k)-Littlestone tree with gamma_k going to infinity.

What would settle it

Exhibit a concrete hypothesis class that contains an infinite non-decreasing (gamma_k)-Littlestone tree yet still admits a distribution-free rule achieving almost-sure convergence to zero risk on every realizable distribution, or exhibit one without such a tree that fails to do so.

read the original abstract

We study strong universal Bayes-consistency in the realizable setting for learning with general metric losses, extending classical characterizations beyond $0$-$1$ classification (Bousquet et al., 2020; Hanneke et al., 2021) and real-valued regression (Attias et al., 2024). Given an instance space $(X,\rho)$, a label space $(Y,\ell)$ with possibly unbounded loss, and a hypothesis class $H \subseteq Y^{X}$, we resolve the realizable case of an open problem presented in Tsir Cohen and Kontorovich (2022). Specifically, we find the necessary and sufficient conditions on the hypothesis class $H$ under which there exists a distribution-free learning rule whose risk converges almost surely to the best-in-class risk (which is zero) for every realizable data-generating distribution. Our main contribution is this sharp characterization in terms of a combinatorial obstruction: Similarly to Attias et al. (2024), we introduce the notion of an infinite non-decreasing $(\gamma_k)$-Littlestone tree, where $\gamma_k \to \infty$. This extends the Littlestone tree structure used in Bousquet et al. (2020) to the metric loss setting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper resolves the realizable case of an open problem on strong universal Bayes-consistency for general metric losses. It gives necessary and sufficient conditions on a hypothesis class H ⊆ Y^X such that there exists a distribution-free learning rule whose risk converges almost surely to the best-in-class risk (zero) for every realizable data-generating distribution. The characterization is expressed via the absence of an infinite non-decreasing (γ_k)-Littlestone tree with γ_k → ∞, extending the combinatorial tools from 0-1 classification and real-valued regression to the metric-loss setting.

Significance. If the central characterization holds, the result is a significant advance: it supplies the first sharp, distribution-free necessary-and-sufficient condition for strong realizable consistency under general (possibly unbounded) metric losses. The introduction of the non-decreasing (γ_k)-Littlestone tree is a clean combinatorial generalization that unifies prior work and directly answers the question posed in Tsir Cohen & Kontorovich (2022). The paper thereby provides both a theoretical closure and a concrete obstruction that can be checked for concrete hypothesis classes.

major comments (2)
  1. [§4, Theorem 4.1] §4, Theorem 4.1 (sufficiency): the construction of the distribution-free rule is only sketched at a high level; an explicit description (or pseudocode) of how the rule uses the tree-free assumption to achieve a.s. convergence would make the argument self-contained and easier to verify.
  2. [§3.2, Definition 3.3] §3.2, Definition 3.3: the necessity direction claims that any infinite non-decreasing (γ_k)-Littlestone tree with γ_k → ∞ blocks every distribution-free rule; the proof relies on the metric property of ℓ, but it is not shown whether the triangle inequality is essential or whether the same obstruction holds for arbitrary losses.
minor comments (2)
  1. [§2] The notation for the instance metric ρ and label metric ℓ is introduced in §2 but reused without re-statement in later sections; a short reminder of the domains would improve readability.
  2. [Figure 1] Figure 1 (illustration of the (γ_k)-tree) does not annotate the successive γ_k values on the branches; adding these labels would clarify the non-decreasing condition.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment, the recommendation of minor revision, and the constructive comments. We address each major comment below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [§4, Theorem 4.1] §4, Theorem 4.1 (sufficiency): the construction of the distribution-free rule is only sketched at a high level; an explicit description (or pseudocode) of how the rule uses the tree-free assumption to achieve a.s. convergence would make the argument self-contained and easier to verify.

    Authors: We agree that the sufficiency argument in Theorem 4.1 would benefit from greater explicitness. In the revised version we will supply a detailed algorithmic description together with pseudocode that shows precisely how the absence of an infinite non-decreasing (γ_k)-Littlestone tree is used to construct the learning rule and to guarantee almost-sure convergence of the risk to the best-in-class value of zero. revision: yes

  2. Referee: [§3.2, Definition 3.3] §3.2, Definition 3.3: the necessity direction claims that any infinite non-decreasing (γ_k)-Littlestone tree with γ_k → ∞ blocks every distribution-free rule; the proof relies on the metric property of ℓ, but it is not shown whether the triangle inequality is essential or whether the same obstruction holds for arbitrary losses.

    Authors: The necessity proof is developed specifically for metric losses and makes essential use of the triangle inequality when showing that an infinite non-decreasing (γ_k)-Littlestone tree prevents every distribution-free rule from achieving strong consistency. We will add a clarifying remark that highlights this dependence on the metric property and notes that the same combinatorial obstruction need not apply to non-metric losses; the result is stated and proved for the metric setting posed in the open problem. revision: yes

Circularity Check

0 steps flagged

No significant circularity; new combinatorial characterization is self-contained

full rationale

The paper defines a novel obstruction—an infinite non-decreasing (γ_k)-Littlestone tree with γ_k → ∞—and proves it is necessary and sufficient for the non-existence of a distribution-free rule achieving a.s. convergence to zero realizable risk under metric losses. Both directions of the characterization are derived directly from this definition via standard combinatorial arguments that extend (but do not presuppose) the 0-1 and regression cases. No load-bearing step reduces to a fitted parameter, self-citation chain, or renaming of a prior result; the 2022 open-problem citation merely states the question being solved, not the solution itself. The derivation is therefore independent of its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The central claim rests on the new combinatorial definition of the (γ_k)-Littlestone tree and standard background from probability and metric spaces; no free parameters or invented physical entities are introduced.

axioms (2)
  • standard math Standard axioms of probability spaces and almost-sure convergence
    Invoked when stating that risk converges almost surely to zero for every realizable distribution.
  • domain assumption The label space (Y, ℓ) forms a metric space, possibly with unbounded metric
    Required to extend the loss beyond 0-1 and squared error.
invented entities (1)
  • infinite non-decreasing (γ_k)-Littlestone tree no independent evidence
    purpose: Combinatorial obstruction that exactly characterizes when Bayes-consistency fails
    New structure introduced to capture the metric-loss case; no independent evidence outside the paper is provided.

pith-pipeline@v0.9.0 · 5532 in / 1443 out tokens · 36732 ms · 2026-05-15T06:55:04.735507+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages

  1. [1]

    Optimal learners for realizable regression: Pac learning and online learning, 2024 a

    Attias, I., Hanneke, S., Kalavasis, A., Karbasi, A., and Velegkas, G. Optimal learners for realizable regression: Pac learning and online learning, 2024 a . URL https://arxiv.org/abs/2307.03848

  2. [2]

    Universal Rates for Regression : Separations between Cut - Off and Absolute Loss

    Attias, I., Hanneke, S., Kalavasis, A., Karbasi, A., and Velegkas, G. Universal Rates for Regression : Separations between Cut - Off and Absolute Loss . In Proceedings of Thirty Seventh Conference on Learning Theory , pp.\ 359--405. PMLR, June 2024 b . URL https://proceedings.mlr.press/v247/attias24a.html. ISSN: 2640-3498

  3. [3]

    A theory of universal learning, 2020

    Bousquet, O., Hanneke, S., Moran, S., van Handel, R., and Yehudayoff, A. A theory of universal learning, 2020. URL https://arxiv.org/abs/2011.04483

  4. [4]

    A Characterization of Multiclass Learnability , March 2022

    Brukhim, N., Carmon, D., Dinur, I., Moran, S., and Yehudayoff, A. A Characterization of Multiclass Learnability , March 2022. URL http://arxiv.org/abs/2203.01550. arXiv:2203.01550 [cs]

  5. [6]

    and Krauthgamer, R

    Gottlieb, L.-A. and Krauthgamer, R. Proximity algorithms for nearly doubling spaces. SIAM J. Discrete Math., 27 0 (4): 0 1759--1769, 2013

  6. [7]

    Universal bayes consistency in metric spaces, 2021

    Hanneke, S., Kontorovich, A., Sabato, S., and Weiss, R. Universal bayes consistency in metric spaces, 2021. URL https://arxiv.org/abs/1906.09855

  7. [8]

    Universal Rates for Multiclass Learning , July 2023

    Hanneke, S., Moran, S., and Zhang, Q. Universal Rates for Multiclass Learning , July 2023. URL http://arxiv.org/abs/2307.02066. arXiv:2307.02066 [cs]

  8. [9]

    Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm

    Littlestone, N. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2 0 (4): 0 285--318, 1988. doi:10.1007/BF00116827

  9. [10]

    and Kontorovich, A

    Tsir Cohen , D. and Kontorovich, A. Learning with metric losses. In Loh, P.-L. and Raginsky, M. (eds.), Proceedings of Thirty Fifth Conference on Learning Theory, volume 178 of Proceedings of Machine Learning Research, pp.\ 662--700. PMLR, 02--05 Jul 2022. URL https://proceedings.mlr.press/v178/cohen22a.html

  10. [11]

    2020 , eprint=

    A Theory of Universal Learning , author=. 2020 , eprint=

  11. [12]

    Universal Rates for Multiclass Learning , July 2023

    Hanneke, Steve and Moran, Shay and Zhang, Qian , month = jul, year =. Universal. doi:10.48550/arXiv.2307.02066 , abstract =

  12. [13]

    Universal

    Attias, Idan and Hanneke, Steve and Kalavasis, Alkis and Karbasi, Amin and Velegkas, Grigoris , month = jun, year =. Universal. Proceedings of

  13. [14]

    Cohen, Alon and Erez, Liad and Hanneke, Steve and Koren, Tomer and Mansour, Yishay and Moran, Shay and Zhang, Qian , month = nov, year =. Sample. doi:10.48550/arXiv.2511.12659 , abstract =

  14. [15]

    Brukhim, Nataly and Carmon, Daniel and Dinur, Irit and Moran, Shay and Yehudayoff, Amir , month = mar, year =. A. doi:10.48550/arXiv.2203.01550 , abstract =

  15. [16]

    Proceedings of Thirty Fifth Conference on Learning Theory , pages =

    Learning with metric losses , author =. Proceedings of Thirty Fifth Conference on Learning Theory , pages =. 2022 , editor =

  16. [17]

    2021 , eprint=

    Universal Bayes consistency in metric spaces , author=. 2021 , eprint=

  17. [18]

    2024 , eprint=

    Optimal Learners for Realizable Regression: PAC Learning and Online Learning , author=. 2024 , eprint=

  18. [19]

    Lee-Ad Gottlieb and Robert Krauthgamer , title =. SIAM J. Discrete Math. , volume =. 2013 , pages =

  19. [20]

    Gottlieb, A

    Lee. Efficient Classification for Metric Data (extended abstract. 2014 , volume =. doi:10.1109/TIT.2014.2339840 , timestamp =

  20. [21]

    Machine Learning , volume =

    Learning Quickly When Irrelevant Attributes Abound: A New Linear-threshold Algorithm , author =. Machine Learning , volume =. 1988 , publisher =