Online Quantile Regression for Nonparametric Additive Models

Haoran Zhan

arXiv: 2604.08969 · v1 · submitted 2026-04-10 · 📊 stat.ML · cs.LG · math.ST · stat.TH

Pith reviewed 2026-05-10 17:48 UTC · model grok-4.3

keywords: quantile regression, online learning, nonparametric additive models, functional gradient descent, minimax optimal rates, pinball loss, streaming data

The pith

Projected functional gradient descent enables online additive quantile regression at the minimax optimal rate.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This work presents P-FGD, an online algorithm for nonparametric additive quantile regression that extends functional stochastic gradient descent to the pinball loss. The algorithm processes data sequentially without storing historical observations and keeps per-step computation at O(J_t log J_t) while prediction costs only O(J_t). A novel Hilbert space projection identity is used to show that the estimator attains the minimax rate O(t^{-2s/(2s+1)}), which is the best possible for functions of smoothness s. This matters because quantile regression captures uncertainty better than mean regression in many applications and online methods are essential for large or streaming datasets where batch methods fail due to memory or speed.

Core claim

The paper establishes that the proposed P-FGD estimator for the quantile function in nonparametric additive models achieves the minimax optimal consistency rate O(t^{-2s/(2s+1)}) by means of a new projection identity in Hilbert space, while operating online with low computational overhead and without retaining data.

What carries the argument

The P-FGD algorithm, which applies projected functional gradient steps to the pinball loss on an additive basis expansion, with the Hilbert space projection identity providing the bridge to optimal rates.
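To make the ingredients concrete, here is a minimal sketch of an online subgradient update for the pinball loss on an additive basis expansion. It is not the paper's P-FGD: the projection step and the exact update rule are not given in the abstract, so plain stochastic subgradient descent on the basis coefficients is swapped in. The class name `OnlineAdditiveQuantile`, the cosine basis, the step size `step0 / sqrt(t)`, the schedule J_t = ceil(t^{1/(2s+1)}), and the cap `J_max` are all illustrative assumptions.

```python
import numpy as np

def cosine_features(x, J):
    """Return a (d, J) array with entries sqrt(2) * cos(pi * j * x_k), j = 1..J.

    The cosine basis on [0, 1] is an assumed choice, not taken from the paper.
    """
    j = np.arange(1, J + 1)
    return np.sqrt(2.0) * np.cos(np.pi * np.outer(x, j))

class OnlineAdditiveQuantile:
    """Hypothetical online estimator of an additive tau-quantile function."""

    def __init__(self, d, tau, s=1.0, J_max=64, step0=1.0):
        self.tau, self.s, self.step0, self.J_max = tau, s, step0, J_max
        self.theta = np.zeros((d, J_max))  # one coefficient row per covariate
        self.intercept = 0.0
        self.t = 0                         # number of observations seen so far

    def _J(self):
        # Assumed schedule: J_t grows like t^{1/(2s+1)}, capped at J_max.
        return min(self.J_max, max(1, int(np.ceil(self.t ** (1.0 / (2.0 * self.s + 1.0))))))

    def predict(self, x):
        J = self._J()
        return self.intercept + np.sum(self.theta[:, :J] * cosine_features(x, J))

    def update(self, x, y):
        """Process one observation; the observation itself is never stored."""
        self.t += 1
        J = self._J()
        phi = cosine_features(x, J)
        pred = self.intercept + np.sum(self.theta[:, :J] * phi)
        # Negative subgradient of the pinball loss with respect to the prediction.
        g = self.tau - float(y < pred)
        gamma = self.step0 / np.sqrt(self.t)  # assumed step-size schedule
        self.intercept += gamma * g
        self.theta[:, :J] += gamma * g * phi
```

Each call to `update` touches only the first J_t coefficients per covariate, so memory does not grow with the number of observations. The sketch makes no attempt to reproduce the per-step O(J_t ln J_t) cost or the rate guarantee claimed for P-FGD.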

If this is right

Additive quantile models can be learned efficiently from data streams.
The statistical performance matches that of the best offline estimators.
Mini-batch variants inherit the same convergence properties.
Prediction at any time remains computationally cheap.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The projection technique may generalize to other loss functions or model classes in online nonparametric learning.
Basis function selection might be adapted dynamically without losing the rate guarantees.
The method could be combined with other online optimization techniques to broaden its applicability.

Load-bearing premise

The target quantile function has a known smoothness parameter s and belongs to a space where the Hilbert space projection identity applies to the chosen additive basis functions.

What would settle it

A simulation study with quantile functions of known smoothness s, measuring the empirical convergence rate of the estimator as more data arrives, would directly test whether the claimed rate holds.
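One way to read off an empirical rate from such a study is a log-log regression of the estimation error against the sample size. The sketch below assumes the errors have already been measured at several sample sizes; the helper names `minimax_exponent` and `empirical_rate` are hypothetical, and the synthetic errors are generated from the target rate purely to illustrate the calculation, not to report any result.

```python
import numpy as np

def minimax_exponent(s):
    """Exponent of t in the claimed rate t^{-2s/(2s+1)}."""
    return -2.0 * s / (2.0 * s + 1.0)

def empirical_rate(sample_sizes, errors):
    """Least-squares slope of log(error) against log(t)."""
    log_t = np.log(np.asarray(sample_sizes, dtype=float))
    log_e = np.log(np.asarray(errors, dtype=float))
    slope, _intercept = np.polyfit(log_t, log_e, 1)
    return slope

# Synthetic illustration only: errors that follow the target rate exactly for s = 1.
ts = np.array([1_000, 2_000, 4_000, 8_000, 16_000])
errs = 3.0 * ts ** minimax_exponent(1.0)
fitted_slope = empirical_rate(ts, errs)
```

In a real study, `errs` would be replaced by Monte Carlo averages of the squared estimation error at each sample size, and a fitted slope close to `minimax_exponent(s)` would be consistent with the claimed rate.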

Original abstract

This paper introduces a projected functional gradient descent algorithm (P-FGD) for training nonparametric additive quantile regression models in online settings. This algorithm extends the functional stochastic gradient descent framework to the pinball loss. An advantage of P-FGD is that it does not need to store historical data while maintaining $O(J_t\ln J_t)$ computational complexity per step where $J_t$ denotes the number of basis functions. Besides, we only need $O(J_t)$ computational time for quantile function prediction at time $t$. These properties show that P-FGD is much better than the commonly used RKHS in online learning. By leveraging a novel Hilbert space projection identity, we also prove that the proposed online quantile function estimator (P-FGD) achieves the minimax optimal consistency rate $O(t^{-\frac{2s}{2s+1}})$ where $t$ is the current time and $s$ denotes the smoothness degree of the quantile function. Extensions to mini-batch learning are also established.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper gives a practical online algorithm for additive quantile regression that skips data storage and claims the optimal rate via a new Hilbert-space projection identity, but the rate claim needs the identity verified in detail.

The key points are that this work gives a practical online algorithm for nonparametric additive quantile regression that avoids storing data and runs efficiently, plus a claim that it hits the optimal rate through a new projection identity in Hilbert space.

The algorithm extends functional stochastic gradient descent to the pinball loss for additive models. It uses a projection to stay within the span of the basis functions. This leads to the stated per-step complexity and prediction time, which are real advantages over storing everything or using full RKHS methods. The new element is the P-FGD procedure itself and the projection identity that lets them transfer the batch minimax rate to the online setting. If the identity works as described, that is a clean way to get the result without extra logarithmic factors or worse.

The soft spot is in the rate proof. It relies on the identity applying exactly to the additive basis under the pinball loss, with J_t growing appropriately. The abstract does not include the derivation or the precise conditions, so it is not possible to confirm yet whether residual terms appear or if extra assumptions on the basis are needed beyond smoothness s. The known-smoothness assumption is standard but does restrict how plug-and-play the method is.

This paper is for specialists in online nonparametric statistics and streaming machine learning. A reader focused on quantile regression or additive models in data streams would get a usable algorithm and a rate to benchmark. It deserves peer review because the algorithmic design is concrete and the rate target is relevant to the subfield; referees can verify the identity and any gaps in the assumptions.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a projected functional gradient descent (P-FGD) algorithm for online nonparametric additive quantile regression with the pinball loss. The method avoids storing historical data, with per-step complexity O(J_t ln J_t) and prediction time O(J_t), where J_t is the number of basis functions. The central claim is that a novel Hilbert space projection identity allows the online estimator to achieve the minimax optimal rate O(t^{-2s/(2s+1)}) for a quantile function of smoothness s; extensions to mini-batch learning are also presented.

Significance. If the novel projection identity applies without residual terms that degrade the rate, the work would be significant for providing a storage-efficient online algorithm for additive quantile models that matches the optimal statistical rate while outperforming RKHS approaches in computation. The explicit complexity bounds and mini-batch extension are practical strengths. The result would be of interest in streaming nonparametric regression if the identity is rigorously established under the paper's assumptions.

major comments (2)

[Proof of the optimal rate (main theorem)] The proof that P-FGD attains the minimax rate O(t^{-2s/(2s+1)}) (abstract and main theorem) rests on the novel Hilbert space projection identity equating the online projected gradient step under pinball loss to an equivalent batch projection onto the additive span. The manuscript does not list or verify the precise conditions (e.g., basis orthogonality, boundedness properties beyond smoothness s, or control of approximation error as J_t grows) under which this identity holds without introducing rate-degrading residuals; this is load-bearing for the central consistency claim.
[Assumptions and model setup] The assumption that the quantile function lies in a space where the Hilbert space projection identity applies directly to the chosen additive basis functions (with J_t increasing) is stated but not shown to be satisfied by standard Sobolev or Hölder balls of smoothness s alone. If extra conditions are required, the reduction from online to batch rate fails and the O(t^{-2s/(2s+1)}) claim does not follow from the standard minimax argument.

minor comments (2)

[Abstract] The abstract and introduction could more explicitly state the form of the additive model and the growth schedule for J_t to make the complexity claims immediately verifiable.
[Preliminaries] Notation for the pinball loss and the projection operator should be introduced with a short equation reference in the main text for readers unfamiliar with functional gradient descent.
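
For readers who want the notation these minor comments refer to, the standard textbook forms are collected below. They are the usual definitions of the pinball (check) loss and of an additive quantile model; the symbols \rho_\tau, q_\tau, \mu, f_k and d are assumed here and may differ from the paper's own notation.

```latex
% Pinball (check) loss at quantile level \tau \in (0,1), with residual u = y - f(x)
\rho_\tau(u) = u\,\bigl(\tau - \mathbf{1}\{u < 0\}\bigr)
             = \max\{\tau u,\ (\tau - 1)\,u\}

% One subgradient with respect to the prediction f(x)
\mathbf{1}\{y < f(x)\} - \tau \;\in\; \partial_{f}\,\rho_\tau\bigl(y - f(x)\bigr)

% Standard additive form for the conditional \tau-quantile of Y given X = (X_1, \dots, X_d)
q_\tau(x) = \mu + \sum_{k=1}^{d} f_k(x_k)
```

The subgradient takes only the two values -\tau and 1 - \tau, which is consistent with the bounded-subgradient condition mentioned in the simulated rebuttal.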

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments focus on the rigor of the proof for the optimal rate and the verification of assumptions. We address each point below and will revise the manuscript accordingly to make the conditions explicit.

Referee: [Proof of the optimal rate (main theorem)] The proof that P-FGD attains the minimax rate O(t^{-2s/(2s+1)}) (abstract and main theorem) rests on the novel Hilbert space projection identity equating the online projected gradient step under pinball loss to an equivalent batch projection onto the additive span. The manuscript does not list or verify the precise conditions (e.g., basis orthogonality, boundedness properties beyond smoothness s, or control of approximation error as J_t grows) under which this identity holds without introducing rate-degrading residuals; this is load-bearing for the central consistency claim.

Authors: We appreciate the referee identifying the need for greater explicitness. The projection identity (Lemma 3.1) is derived assuming orthonormal additive basis functions in the relevant Hilbert space, bounded subgradients of the pinball loss, and J_t chosen to grow slower than the rate that would dominate the estimation error (specifically J_t = o(t^{1/(2s+1)})). The proof in Appendix A controls the residual terms from the online-to-batch equivalence and shows they are o(t^{-2s/(2s+1)}). We will revise by adding an explicit list of these conditions immediately before the main theorem statement, together with a short verification that they hold under the paper's smoothness and boundedness assumptions on the data.
revision: yes
Referee: [Assumptions and model setup] The assumption that the quantile function lies in a space where the Hilbert space projection identity applies directly to the chosen additive basis functions (with J_t increasing) is stated but not shown to be satisfied by standard Sobolev or Hölder balls of smoothness s alone. If extra conditions are required, the reduction from online to batch rate fails and the O(t^{-2s/(2s+1)}) claim does not follow from the standard minimax argument.

Authors: The quantile function is assumed to lie in an additive Sobolev space of order s (Section 2), which is the standard setting for minimax results on additive models. The projection identity applies directly because the additive structure permits an orthogonal decomposition of the Hilbert space into univariate components, and the chosen bases (e.g., orthogonal polynomials or splines) are orthonormal by construction. We will insert a new proposition (Proposition 2.1) that confirms standard Sobolev and Hölder balls of smoothness s satisfy the required approximation and boundedness properties for the selected bases, so that no additional assumptions beyond those already stated are needed for the rate to hold.
revision: yes

Circularity Check

0 steps flagged

No significant circularity; rate follows from novel identity plus standard minimax arguments

The derivation introduces P-FGD for online additive quantile regression under pinball loss, states computational advantages, and invokes a novel Hilbert space projection identity to obtain the rate O(t^{-2s/(2s+1)}). No quoted step reduces the claimed rate to a fitted quantity, self-defined object, or load-bearing self-citation by construction. The identity is presented as a paper-internal lemma whose applicability to the chosen additive basis is part of the stated assumptions; the central consistency result therefore retains independent content beyond its inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

Based solely on the abstract, the central claim rests on standard nonparametric smoothness assumptions and a novel projection identity whose validity is not independently verified here.

free parameters (1)

J_t (number of basis functions)
Grows with time t and must be chosen to achieve the stated rate; no explicit selection rule given in abstract.

axioms (2)

domain assumption: Quantile function has smoothness degree s in an appropriate function space.
Required to obtain the specific rate O(t^{-2s/(2s+1)}).
ad hoc to paper: Hilbert space projection identity holds for the additive basis.
Described as novel and central to the proof.
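
The link between the free parameter J_t and the smoothness assumption can be illustrated with the textbook sieve heuristic for a single univariate component. This is a standard bias-variance balance under squared-error-type risk, stated as an assumption for orientation only; it is not taken from the paper and ignores constants, the number of additive components, and any conditions specific to the pinball loss.

```latex
% Heuristic risk with J basis functions and t observations
\underbrace{J^{-2s}}_{\text{squared approximation error}}
\;+\;
\underbrace{\frac{J}{t}}_{\text{estimation variance}}

% Balancing the two terms
J^{-2s} \asymp \frac{J}{t}
\;\Longrightarrow\;
J \asymp t^{\frac{1}{2s+1}}
\;\Longrightarrow\;
J^{-2s} \asymp \frac{J}{t} \asymp t^{-\frac{2s}{2s+1}}

% Worked example: s = 2 gives J \asymp t^{1/5} and risk of order t^{-4/5}
```

Under this heuristic, choosing J_t requires knowing s, which is why the known-smoothness premise is load-bearing.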

Reference graph

Works this paper leans on

6 extracted references · 6 canonical work pages

[1] Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.
    Györfi, L., Kohler, M., Krzyżak, A., and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer New York.

[2] Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models. Chapman and Hall/CRC.

[3] Koenker, R. and Bassett Jr, G. (1978). Regression quantiles. Econometrica, 46(1):33–50.

[4] Shen, Y., Xia, D., and Zhou, W.-X. (2025). Online quantile regression. Journal of Machine Learning Research, 26(231):1–55.

[5] Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation. Springer Series in Statistics.

[6] Zhang, T. and Simon, N. (2022). A sieve stochastic gradient descent estimator for online nonparametric regression in Sobolev ellipsoids. The Annals of Statistics, 50(5):2848–2871.
