Pith · machine review for the scientific record

arXiv: 2605.05432 · v1 · submitted 2026-05-06 · 🧮 math.ST · cs.LG · stat.ML · stat.TH

Recognition: unknown

Direct Estimation of Schrödinger Bridge Time-Series Drifts: Finite-Sample, Asymptotic, and Adaptive Guarantees

Authors on Pith · no claims yet

Pith reviewed 2026-05-08 15:19 UTC · model grok-4.3

classification 🧮 math.ST · cs.LG · stat.ML · stat.TH
keywords Schrödinger bridge · drift estimation · nonparametric statistics · kernel estimation · adaptive methods · minimax rates · time series

The pith

A direct kernel plug-in estimator for Schrödinger bridge time-series drifts achieves uniform non-asymptotic bounds and minimax-optimal adaptive rates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a nonparametric estimator for the drifts in Schrödinger bridge time-series models by plugging kernel estimates directly into the conditional-ratio form of the drift. It proves a uniform non-asymptotic error bound for appropriate bandwidth choices, a pointwise central limit theorem, and an adaptive bandwidth rule that nearly attains the best possible rate. These results isolate the statistical error from other sources such as optimization, approximation, and discretization. If the claims hold, this gives a practical way to estimate such drifts from data with explicit accuracy controls, useful for modeling processes constrained to match given start and end distributions. The approach avoids the iterative solvers used in other analyses.
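The shape of such a ratio plug-in can be sketched in a few lines. This is an illustrative one-dimensional stand-in, not the paper's estimator: the Gaussian heat-kernel weight linking the current state to the terminal samples, and all function and variable names, are assumptions made here for concreteness.

```python
import numpy as np

def gauss_kernel(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def nw_drift(t, x, xi, X0, X1, h, g):
    """Nadaraya-Watson ratio plug-in for a bridge-type drift (sketch).

    Given i.i.d. pairs (X0_i, X1_i) on the unit time interval, the
    drift at time t < 1, state x, conditioning point xi, is estimated
    as a kernelized numerator over a kernelized denominator; h and g
    are the numerator/denominator bandwidths of the kind the paper's
    uniform bounds range over.  The assumed heat-kernel weight
    exp(-|y - x|^2 / (2(1 - t))) plays the role of the bridge kernel.
    """
    heat = np.exp(-0.5 * (X1 - x) ** 2 / (1.0 - t))
    num = np.mean(gauss_kernel((X0 - xi) / h) / h * heat * (X1 - x) / (1.0 - t))
    den = np.mean(gauss_kernel((X0 - xi) / g) / g * heat)
    # the marginal-density floor assumption is what keeps `den` away from 0
    return num / den
```

The ratio structure makes the denominator the fragile part: wherever the marginal density is small, `den` is small and the estimate is unstable, which is exactly the role of the density-floor assumption discussed below.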

Core claim

Starting from the conditional-ratio form of the Schrödinger bridge time-series drift, a direct Nadaraya-Watson plug-in estimator admits a uniform non-asymptotic bound under Hölder regularity with a marginal-density floor and bounded support, a pointwise central limit theorem under undersmoothing, and an adaptive bandwidth selector that satisfies an oracle inequality and attains the minimax rate up to logarithmic factors, as confirmed by a matching lower bound.

What carries the argument

The direct Nadaraya-Watson kernel plug-in estimator applied to the conditional-ratio expression for the SBTS drift, which isolates the statistical estimation error.

If this is right

  • Uniform non-asymptotic error bounds are available for admissible bandwidth pairs.
  • A pointwise central limit theorem holds under genuine undersmoothing.
  • An adaptive bandwidth selector satisfies an oracle inequality.
  • The selector is minimax-rate optimal up to logarithmic factors.
  • A global minimax lower bound follows from a pivot-local one under compatibility conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The direct drift estimation could be applied to other problems involving estimation of drifts in diffusion models with endpoint constraints.
  • Relaxing the bounded support via suitable transformations might extend the method to processes on unbounded domains.
  • The oracle inequality for adaptivity could inspire similar selectors in related kernel estimation tasks for stochastic processes.
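The last bullet can be made concrete with a generic Goldenshluger-Lepski-style selector. The sketch below is for plain kernel density estimation, not the paper's drift setting; the penalty shape and the constant `kappa` are illustrative assumptions.

```python
import numpy as np

def kde(X, grid, h):
    """Gaussian kernel density estimate evaluated on a grid."""
    U = (grid[:, None] - X[None, :]) / h
    return np.exp(-0.5 * U ** 2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))

def gl_select(X, grid, bandwidths, kappa=1.0):
    """Goldenshluger-Lepski-type bandwidth selection (sketch).

    For each h, the bias proxy A(h) compares the estimate at h against
    the estimates at all smaller bandwidths, discounted by a
    variance-type penalty; the selector minimizes A(h) + pen(h).
    """
    n = len(X)
    est = {h: kde(X, grid, h) for h in bandwidths}
    pen = {h: kappa * np.sqrt(np.log(n) / (n * h)) for h in bandwidths}
    crit = {}
    for h in bandwidths:
        gaps = [np.max(np.abs(est[hp] - est[h])) - pen[hp]
                for hp in bandwidths if hp <= h]
        crit[h] = max(max(gaps), 0.0) + pen[h]
    return min(crit, key=crit.get)
```

An oracle inequality for such a selector states that its risk is within a constant and logarithmic factor of the best bandwidth in the grid, which is the structure Theorem 3 reportedly establishes for the drift estimator.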

Load-bearing premise

The marginal density is bounded away from zero and the support is bounded, preventing instability in the conditional-ratio estimator's denominator.

What would settle it

Check if the uniform bounds or adaptive rates hold in simulations where the marginal density gets arbitrarily close to zero at some points or the support is unbounded.
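That check can be prototyped cheaply. The script below uses a plain Nadaraya-Watson regression as a stand-in for the ratio estimator and thins the design density near the evaluation region, so the keep-probability `floor` mimics a vanishing marginal-density floor; all constants and names are illustrative choices, not taken from the paper.

```python
import numpy as np

def nw_regress(x, X, Y, h):
    """Plain Nadaraya-Watson regression estimate at a point x."""
    w = np.exp(-0.5 * ((X - x) / h) ** 2)
    return np.sum(w * Y) / np.sum(w)

rng = np.random.default_rng(1)
n, h = 5000, 0.1
errs = []
for floor in (0.5, 0.05, 0.005):
    X = rng.uniform(-1.0, 1.0, size=n)
    # thin the sample on |x| <= 0.3 so the design density there is
    # proportional to `floor` -- the analogue of a shrinking density floor
    keep = (np.abs(X) > 0.3) | (rng.uniform(size=n) < floor)
    X = X[keep]
    Y = np.sin(np.pi * X) + 0.3 * rng.normal(size=len(X))
    grid = np.linspace(-0.25, 0.25, 21)  # sup-grid inside the thinned region
    est = np.array([nw_regress(x, X, Y, h) for x in grid])
    errs.append(np.max(np.abs(est - np.sin(np.pi * grid))))
# expectation: the sup-grid error degrades as the floor shrinks
```

If the uniform bound's dependence on the floor is sharp, the degradation should track the constant's stated dependence on the floor; a flat error curve would instead suggest the assumption is an artifact of the proof technique.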

Figures

Figures reproduced from arXiv: 2605.05432 by Huyên Pham and Othmane Mazhar.

Figure 1. Log–log sup-grid error curves for the GG and MM testbeds; the oracle slopes are compared.
Figure 2. Pointwise Gaussian approximation, adaptivity, and terminal-edge behavior. Left: QQ-plots.
Figure 3. Secondary rate plots based on the integrated squared error.
Original abstract

We study nonparametric estimation of Schrödinger bridge (SB) drifts from i.i.d. data observed on a single time interval. Starting from the conditional-ratio form of the Schrödinger bridge time-series (SBTS) drift formula, we analyze a direct Nadaraya-Watson plug-in estimator built from kernelized numerator and denominator terms. Unlike recent SB analyses based on entropic-OT potentials, Sinkhorn iterations, or iterative bridge solvers, our approach works directly at the drift level and isolates statistical error from optimization, approximation, and discretization error. Under Hölder regularity, a marginal-density floor, and bounded support, we prove a uniform non-asymptotic bound for admissible bandwidth pairs, a pointwise CLT under genuine undersmoothing, and an adaptive bandwidth selector satisfying an oracle inequality. We also prove a pivot-local minimax lower bound which, through an explicit uniform pivot, yields a global minimax lower bound under transparent compatibility conditions; hence the adaptive selector is minimax-rate optimal up to logarithmic factors. Synthetic experiments provide theorem-targeted diagnostics for finite-sample scaling, Gaussian approximation, and adaptive behavior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper develops a direct Nadaraya-Watson kernel plug-in estimator for the drift function of Schrödinger bridge time-series processes, obtained from the conditional-ratio representation of the SB drift. Under Hölder regularity of the drift and densities, a strict positive lower bound on the marginal density, and compact support, the authors establish a uniform non-asymptotic error bound for admissible bandwidth pairs, a pointwise central limit theorem under undersmoothing, an adaptive bandwidth selector obeying an oracle inequality, and a global minimax lower bound (via an explicit pivot-local construction) showing that the adaptive estimator attains the minimax rate up to logarithmic factors. Synthetic experiments are used to illustrate finite-sample scaling, Gaussian approximation quality, and adaptive behavior.

Significance. If the stated results hold, the work supplies a statistically rigorous, non-iterative estimation procedure for SB drifts that cleanly isolates statistical error from optimization and discretization artifacts present in entropic OT or Sinkhorn-based alternatives. The combination of explicit finite-sample uniform bounds, a pointwise CLT, an oracle inequality for the adaptive selector, and a matching minimax lower bound constitutes a strong contribution to nonparametric estimation for stochastic processes. The explicit pivot construction used to obtain the global lower bound is a methodological strength that could be useful in related drift-estimation settings.

major comments (2)
  1. [§3, Assumption A2, Theorem 1] §3 (Main results), Assumption A2 (marginal-density floor) and Theorem 1 (uniform non-asymptotic bound): the uniform control of the denominator in the Nadaraya-Watson ratio estimator is obtained by invoking inf_x p(x) ≥ c > 0 together with compact support; the proof sketch indicates that the variance term in the decomposition ceases to be uniformly integrable once this floor is removed, so the stated bound and the subsequent oracle inequality for the adaptive selector are not expected to hold globally without it. The manuscript does not provide a local or truncated version of the result that would remain valid when the floor is violated on a set of small measure.
  2. [§4.2, Theorem 3] §4.2 (Adaptive bandwidth selector), Theorem 3 (oracle inequality): the proof of the oracle inequality relies on the same uniform lower bound on the marginal density to control the supremum of the empirical process over the bandwidth grid; without an explicit statement of how the constant in the oracle inequality scales with the density floor c, it is difficult to assess the practical range of applicability when c is small but positive.
minor comments (3)
  1. [§3.3] The introduction and abstract refer to 'genuine undersmoothing' for the CLT, but the precise relation between the bandwidth pair (h_n, g_n) and the sample size n that guarantees the bias term is o_p of the stochastic term is stated only in the technical appendix; a short inline remark in §3.3 would improve readability.
  2. [Figure 2] Figure 2 (adaptive bandwidth diagnostics) plots the selected bandwidth against the oracle bandwidth but does not report the constant factor in the logarithmic term of the oracle inequality; adding this comparison would make the empirical verification of the theoretical rate more direct.
  3. [§2] Notation: the symbols for the kernel bandwidths (h for the numerator, g for the denominator) are introduced in §2 but occasionally interchanged in the proof of the uniform bound; a single consistent definition table would eliminate ambiguity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and for identifying the central role of Assumption A2 in our uniform results. We address each major comment below and have incorporated clarifications into the revised manuscript.

Point-by-point responses
  1. Referee: [§3, Assumption A2, Theorem 1] §3 (Main results), Assumption A2 (marginal-density floor) and Theorem 1 (uniform non-asymptotic bound): the uniform control of the denominator in the Nadaraya-Watson ratio estimator is obtained by invoking inf_x p(x) ≥ c > 0 together with compact support; the proof sketch indicates that the variance term in the decomposition ceases to be uniformly integrable once this floor is removed, so the stated bound and the subsequent oracle inequality for the adaptive selector are not expected to hold globally without it. The manuscript does not provide a local or truncated version of the result that would remain valid when the floor is violated on a set of small measure.

    Authors: We agree that the uniform lower bound on the marginal density is essential for the uniform non-asymptotic bound in Theorem 1. The proof controls the denominator of the Nadaraya-Watson estimator uniformly away from zero; without inf p(x) ≥ c > 0 the stochastic term fails to be uniformly integrable over the entire support, as noted. This assumption is standard for obtaining sup-norm guarantees in nonparametric estimation and is stated explicitly. In the revision we have added a remark in Section 3.1 clarifying its necessity and noting that the results remain valid locally on compact subsets where the density is bounded below by a positive constant. A fully global result without any density floor would require localization or truncation arguments, which we view as a natural direction for future work but outside the present scope. revision: yes

  2. Referee: [§4.2, Theorem 3] §4.2 (Adaptive bandwidth selector), Theorem 3 (oracle inequality): the proof of the oracle inequality relies on the same uniform lower bound on the marginal density to control the supremum of the empirical process over the bandwidth grid; without an explicit statement of how the constant in the oracle inequality scales with the density floor c, it is difficult to assess the practical range of applicability when c is small but positive.

    Authors: The oracle inequality in Theorem 3 is proved under the same assumptions, including the density floor c. The constant indeed depends on c through the variance bounds on the empirical process (scaling as O(1/c^2) in the leading term). We have revised the statement of Theorem 3 and the accompanying proof sketch to make this dependence explicit, thereby clarifying the practical range of applicability when c is small but positive. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation applies standard nonparametric theory to a plug-in estimator defined from the given conditional-ratio formula.

full rationale

The paper begins with the conditional-ratio form of the SBTS drift, constructs a direct Nadaraya-Watson kernel plug-in estimator, and derives uniform non-asymptotic bounds, pointwise CLT, and adaptive oracle inequality under Hölder regularity plus a marginal-density floor and bounded support. These steps rely on classical bias-variance decompositions and concentration inequalities for kernel estimators; they do not reduce any claimed result to a fitted parameter, self-referential definition, or self-citation chain. The assumptions are explicit regularity conditions required for uniform control of the denominator, not hidden tautologies. The minimax lower bound is obtained via an explicit pivot construction that is independent of the upper-bound proofs. No load-bearing self-citations appear in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The central claims rest on three standard nonparametric assumptions plus the existence of the conditional-ratio representation of the SB drift. No free parameters are introduced beyond the bandwidth (which is adapted). No new entities are postulated.

axioms (3)
  • domain assumption Hölder regularity of the drift and density functions
    Invoked to obtain the rates in the uniform non-asymptotic bound and CLT.
  • domain assumption Marginal density possesses a positive lower bound (floor)
    Required to control the denominator of the conditional-ratio estimator and obtain uniform bounds.
  • domain assumption Data have bounded support
    Used to simplify kernel estimation and uniform convergence arguments.

pith-pipeline@v0.9.0 · 5519 in / 1515 out tokens · 39700 ms · 2026-05-08T15:19:36.877347+00:00 · methodology

discussion (0)



    Moreover, if P denotes the law of(X s, Xu), thenσ h1 := suph∈Hh1 |h|L2(P) ≤C F p Cf |K|2/hd/2 1 , because |ht,x,ξ|2 L2(P) = ZZ F(t, ξ, x, y) 2 Kh1(z−ξ) 2 p(z, y)dz dy ≤C 2 F Z Kh1(z−ξ) 2 f(z)dz≤C 2 F Cf |Kh1 |2 L2(λ). Write F(t, ξ, x, y) =χ(t, x, y)G(ξ, y) , where χ(t, x, y) :=e − |y−x|2 2∆(t) and G(ξ, y) :=e |y−ξ|2 2∆ . On [s, u−η]×R d ×B , the function ...