pith. machine review for the scientific record.

math.ST

Statistics Theory

Applied, computational and theoretical statistics: e.g. statistical inference, regression, time series, multivariate analysis, data analysis, Markov chain Monte Carlo, design of experiments, case studies

math.ST 2026-05-13 Recognition

Sampler matches smooth-case rate for composite log-concave densities

A proximal gradient algorithm for composite log-concave sampling

The proximal gradient method uses a restricted Gaussian oracle and reaches epsilon total variation error in O(kappa sqrt(d) log^4(1/eps)) iterations.

abstract
We propose an algorithm to sample from composite log-concave distributions over $\mathbb{R}^d$, i.e., densities of the form $\pi\propto e^{-f-g}$, assuming access to gradient evaluations of $f$ and a restricted Gaussian oracle (RGO) for $g$. The latter requirement means that we can easily sample from the density $\text{RGO}_{g,h,y}(x) \propto \exp(-g(x) -\frac{1}{2h}||y-x||^2)$, which is the sampling analogue of the proximal operator for $g$. If $f + g$ is $\alpha$-strongly convex and $f$ is $\beta$-smooth, our sampler achieves $\varepsilon$ error in total variation distance in $\widetilde{\mathcal O}(\kappa \sqrt d \log^4(1/\varepsilon))$ iterations where $\kappa := \beta/\alpha$, which matches prior state-of-the-art results for the case $g=0$. We further extend our results to cases where (1) $\pi$ is non-log-concave but satisfies a Poincar\'e or log-Sobolev inequality, and (2) $f$ is non-smooth but Lipschitz.
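To make the oracle interface concrete, here is a minimal runnable sketch in one dimension, assuming f(x) = x^2/2 and g(x) = lam*|x|: for this g the RGO density is a two-piece Gaussian and can be sampled exactly. The gradient-plus-RGO update below is a generic proximal-style Langevin step for illustration, not the paper's exact algorithm.

```python
# Sketch of a proximal-gradient-type sampler for pi ∝ exp(-f - g) in 1D,
# with f(x) = x^2/2 (smooth, accessed via its gradient) and g(x) = lam*|x|
# (nonsmooth, accessed only through the restricted Gaussian oracle).
import numpy as np
from scipy import stats

def grad_f(x):
    return x  # f(x) = x^2 / 2

def rgo_l1(y, lam, h, rng):
    """Exact draw from RGO_{g,h,y}(x) ∝ exp(-lam*|x| - (x - y)^2 / (2h)):
    for g = lam*|.| this is a two-piece Gaussian split at 0."""
    s = np.sqrt(h)
    mu_pos, mu_neg = y - lam * h, y + lam * h
    # log-masses of the two pieces (shared constants cancel)
    logw_pos = -lam * y + stats.norm.logsf(0.0, loc=mu_pos, scale=s)
    logw_neg = lam * y + stats.norm.logcdf(0.0, loc=mu_neg, scale=s)
    p_pos = 1.0 / (1.0 + np.exp(logw_neg - logw_pos))
    if rng.random() < p_pos:   # N(mu_pos, h) truncated to [0, inf)
        return stats.truncnorm.rvs((0.0 - mu_pos) / s, np.inf,
                                   loc=mu_pos, scale=s, random_state=rng)
    return stats.truncnorm.rvs(-np.inf, (0.0 - mu_neg) / s,  # (-inf, 0]
                               loc=mu_neg, scale=s, random_state=rng)

rng = np.random.default_rng(0)
lam, h, x = 1.0, 0.1, 0.0
samples = []
for _ in range(20000):
    # forward step on the smooth part, then an exact RGO step on g
    y = x - h * grad_f(x) + np.sqrt(2.0 * h) * rng.standard_normal()
    x = rgo_l1(y, lam, h, rng)
    samples.append(x)
print("sample mean (0 by symmetry):", np.mean(samples))
```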
math.ST 2026-05-13 Recognition

Pattern tests for independence get explicit null limits

Efficiency of pattern-based independence test

Complete limiting distributions and asymptotic efficiencies provided for length-four pattern sets

abstract
Tests of independence are an important tool in applications, specifically in connection with the detection of a relationship between variables; they also have initiated many developments in statistical theory. In the present paper we build upon and extend a recently established link to Discrete Mathematics and Theoretical Computer Science, exemplified by the appearance of copulas in connection with limits of permutation sequences, and by the connection between quasi-randomness and consistency of pattern-based tests of independence. The latter include classical procedures, such as Kendall's tau, which uses patterns of length two. Longer patterns lead to tests that are consistent against large classes of alternatives, as first shown by Hoeffding (1948) with patterns of length five, and by Yanagimoto (1970) and Bergsma and Dassios (2014) for patterns of length four. More recently Chan et al.\ (2020) characterized quasi-randomness for sets of patterns of length four, which leads to several new consistent pattern-based tests for independence. We give a detailed and complete description of the respective limiting null distributions. In connection with the power performance of the tests, which is of interest for practical purposes, we provide results on their (local) asymptotic relative efficiencies. We also include a small simulation study that supports our theoretical findings.
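For concreteness, a hedged sketch of what a length-four pattern statistic looks like: tally the 24 possible y-rank patterns over all quadruples (a naive O(n^4) scan, so n is kept small) and measure the deviation from the uniform 1/24 law that independence implies, with Kendall's tau (length two) computed alongside. The chi-square-style deviation is illustrative only and is not one of the calibrated tests studied in the paper.

```python
# Sketch: empirical length-4 pattern frequencies for a paired sample.
# Under independence (continuous margins), the y-rank pattern of a
# quadruple sorted by x is uniform over the 24 permutations.
import itertools
import numpy as np
from scipy.stats import kendalltau, rankdata

rng = np.random.default_rng(1)
n = 40                                   # naive O(n^4) scan, so keep n small
x = rng.standard_normal(n)
y = 0.8 * x + 0.6 * rng.standard_normal(n)   # a dependent alternative

counts = {}
for quad in itertools.combinations(range(n), 4):
    idx = sorted(quad, key=lambda i: x[i])           # order the quadruple by x
    pattern = tuple(rankdata([y[i] for i in idx]))   # induced y-rank pattern
    counts[pattern] = counts.get(pattern, 0) + 1

total = sum(counts.values())
freqs = np.array([counts.get(p, 0) / total
                  for p in itertools.permutations((1.0, 2.0, 3.0, 4.0))])
# chi-square-style deviation from the uniform 1/24 law (uncalibrated)
dev = total * np.sum((freqs - 1 / 24) ** 2) / (1 / 24)
tau, _ = kendalltau(x, y)                # the length-2 classic for comparison
print(f"Kendall tau: {tau:.3f}, length-4 pattern deviation: {dev:.1f}")
```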
math.ST 2026-05-13 Recognition

Weaker likelihood ratio shapes still give stochastic orders

Stochastic Ordering under Weaker Likelihood-Ratio Shape Conditions

Unimodality or two sign changes in the ratio minus one preserve endpoint criteria for hazard-rate and usual orders.

abstract
We show that the shape hypothesis on a likelihood ratio can be weakened while retaining endpoint criteria for the hazard-rate and usual stochastic orders. The endpoint reduction persists under unimodality of the likelihood ratio and under a sign-pattern condition on the likelihood ratio minus one, with at most two sign changes and a negative right tail. It also follows from a direct superlevel-set criterion involving the same expression, which is useful in particular for discontinuous likelihood ratios.
math.ST 2026-05-13 Recognition

Augmented KRR separates linear and nonlinear parts

Adaptive Kernel Ridge Regression with Linear Structure: Sharp Oracle Inequalities and Minimax Optimality

The estimator achieves sharp oracle bounds and minimax rates by adding an explicit linear term with no extra tuning or cost.

abstract
Kernel ridge regression (KRR) is a widely used nonparametric method due to its strong theoretical guarantees and computational convenience. However, standard KRR does not distinguish between linear and nonlinear components in the signal, instead applying a single functional regularization to the entire function. This may lead to unnecessary shrinkage of linear structure and consequently suboptimal prediction performance. In this paper, we propose a modified regression procedure that augments KRR with an explicit linear component. The proposed method has the same computational complexity as standard KRR and introduces no additional tuning parameters. Theoretically, we establish a sharp oracle inequality for the proposed estimator and show that it adaptively captures both linear and nonlinear structure, achieving minimax optimal prediction risk under general kernels. Compared with standard KRR, the proposed method improves both the bias and approximation error at the expense of only an additional parametric variance term, which is negligible in low- and moderate-dimensional settings. In high-dimensional regimes, incorporating ridge regularization for the linear component yields a procedure that performs uniformly no worse than KRR. Extensive simulation studies support the theoretical findings.
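One simple way to realize "KRR plus an explicit linear term" is to leave the linear coefficients unpenalized and profile them out, which leaves an n x n kernel system of the same size and tuning burden as standard KRR. The sketch below does this with a Gaussian kernel on simulated data; it follows the generic partialling-out algebra and is not necessarily the paper's exact estimator.

```python
# Sketch: KRR augmented with an explicit, unpenalized linear term.
# Fit y ≈ X @ theta + K @ alpha by profiling out theta, which leaves an
# n x n kernel system -- the same size and tuning burden as plain KRR.
import numpy as np

def gauss_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(2)
n, lam = 200, 1e-2
X = rng.uniform(-2, 2, size=(n, 1))
y = 3.0 * X[:, 0] + np.sin(3 * X[:, 0]) + 0.3 * rng.standard_normal(n)

K = gauss_kernel(X, X)
P = X @ np.linalg.solve(X.T @ X, X.T)      # projector onto the linear span
M = np.eye(n) - P                          # residual maker
alpha = np.linalg.solve(M @ K + lam * np.eye(n), M @ y)  # kernel part
theta = np.linalg.solve(X.T @ X, X.T @ (y - K @ alpha))  # linear part

fit = X @ theta + K @ alpha
print("linear coefficient (data slope 3):", theta)
print("training RMSE:", np.sqrt(np.mean((y - fit) ** 2)))
```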
math.ST 2026-05-13 Recognition

Bayesian bootstrap recovers Efron method as special case

Bayesian and Empirical Bayesian Bootstrapping

Dirichlet process simulation shows classic resampling approximates the posterior for smooth parameters as samples grow large.

abstract
Let $X_1,\ldots,X_n$ be a random sample from an unknown probability distribution $P$ on the sample space ${\cal X}$, and let $\theta=\theta(P)$ be a parameter of interest. The present paper proposes a nonparametric `Bayesian bootstrap' method of obtaining Bayes estimates and Bayesian confidence limits for $\theta$. It uses a simple simulation technique to numerically approximate the exact posterior distribution of $\theta$ using a (non-degenerate) Dirichlet process prior for $P$. Asymptotic arguments are given which justify the use of the Bayesian bootstrap for any smooth functional $\theta(P)$. When the prior is fixed and the sample size grows, five approaches become first-order equivalent: the exact Bayesian, the Bayesian bootstrap, Rubin's degenerate-prior bootstrap, Efron's bootstrap, and the classical one using delta methods. The Bayesian bootstrap method is also extended to the semiparametric regression case. A separate section treats similar ideas for censored data and for more general hazard rate models, where a connection is made to a `weird bootstrap' proposed by Gill. Finally, empirical Bayesian versions of the procedure are discussed, where suitable parameters of the Dirichlet process prior are inferred from data. Our results lend Bayesian support to the classic Efron bootstrap. It is the Bayesian bootstrap under a noninformative reference prior; it is a limit of natural approximations to good Bayes solutions; it is an approximation to a natural empirical Bayesian strategy; and the formally incorrect reading of a bootstrap histogram as a posterior distribution for the parameter isn't so incorrect after all.
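The degenerate-prior (Rubin) version mentioned in the abstract is a few lines of code: draw Dirichlet(1,...,1) weights over the observed points and recompute theta under the weighted empirical distribution; Efron's bootstrap is the multinomial counterpart. A minimal sketch for theta = mean:

```python
# Sketch: Bayesian bootstrap (Dirichlet(1,...,1) weights, Rubin's
# degenerate-prior case) vs Efron's multinomial bootstrap for theta = mean.
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=100)   # the observed sample
B = 5000

# Bayesian bootstrap: theta(P_w) under random weights w ~ Dirichlet(1,...,1)
w = rng.dirichlet(np.ones(len(x)), size=B)
bb = w @ x

# Efron bootstrap: resample the data with replacement
eb = np.array([rng.choice(x, size=len(x), replace=True).mean()
               for _ in range(B)])

for name, draws in [("Bayesian bootstrap", bb), ("Efron bootstrap", eb)]:
    lo, hi = np.percentile(draws, [2.5, 97.5])
    print(f"{name}: center {draws.mean():.3f}, 95% interval ({lo:.3f}, {hi:.3f})")
```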
math.ST 2026-05-13 3 theorems

Gaussian limits for spectral statistics survive fourth-moment corrections

The Geometry of Spectral Fluctuations: On Near-Optimal Conditions for Universal Gaussian CLTs, with Statistical Applications

Covariance decomposition isolates a universal Gaussian term plus explicit fourth-order adjustments for linear statistics of high-dimensional sample covariance matrices.

abstract
We study linear spectral statistics of high dimensional sample covariance matrices in a regime where the empirical spectral distribution remains governed by the classical sample covariance law but the fluctuation theory is nonclassical. Our starting point is a decomposition of the covariance of centered quadratic forms into a universal Gaussian part and a model dependent fourth order correction. This leads to an abstract framework, termed GHOST, for universal Gaussian central limit theorems under structured fourth order effects. Under this framework, we prove a Gaussian central limit theorem for linear spectral statistics, with explicit mean and covariance corrections determined by a bilinear fourth order kernel. Boundary examples show that the conditions are close to necessary for a broad universal Gaussian closure. We then develop a blockwise mixed radial model that verifies the abstract assumptions and makes the correction explicit. The correction splits into an entrywise fourth moment component and a blockwise energy fluctuation component. The latter may change the fluctuation scale, leading to a phase transition at the level of fluctuations. As an application, we study sphericity testing. Under the spherical null, the general correction collapses to a single scalar parameter, yielding a feasible data driven correction of John's test.
math.ST 2026-05-12 2 theorems

Deterministic residual update removes stochastic variance in ensemble filters

A Data-Consistent Approach to Ensemble Filtering

QPCA-EnDCF whitens forecast-observation residuals and restricts corrections to stable low-rank subspaces, improving calibration on Lorenz-96

abstract
Ensemble filtering of chaotic, partially observed systems is often performed with ensembles far smaller than the state dimension, resulting in empirical covariances that are low rank. As a result, stochastic observation perturbations can degrade both accuracy and probabilistic calibration. We develop a data-consistent perspective on ensemble filtering and introduce the Quantity-of-Interest Principal Component Analysis Ensemble Data Consistent Filter (QPCA-EnDCF), which is a deterministic method that replaces perturbed observations with a spectrally regularized update in observation space. The method whitens forecast--observation residuals, computes an empirical eigendecomposition of the residual covariance, and restricts the correction to a rank-$\kappa$ subspace before mapping the increment back to state space through an empirical gain. We establish a theoretical framework that separates population and finite-ensemble objects and yields a bias--variance decomposition for the analysis mean. The analysis shows that stochastic EnKF variants incur an irreducible $\mathcal{O}(1/N)$ variance contribution from observation perturbations, whereas QPCA-EnDCF replaces this term with projector-estimation variability that is also $\mathcal{O}(1/N)$ but depends on the retained rank and the cutoff gap through eigenspace stability. Numerical experiments on the Lorenz--96 system in strongly undersampled regimes demonstrate that QPCA-EnDCF substantially improves spread--skill behavior, temporal tracking between spread and error, and rank-histogram reliability relative to sequential and four-dimensional stochastic EnKF. Under the baseline configuration, these calibration gains are accompanied by lower RMSE.
math.ST 2026-05-12 Recognition

Polynomial-time SDP relaxation achieves near-optimal robust signal detection

Efficient Robust Constrained Signal Detection via Kolmogorov Width Approximations

SDP and ellipsoid approximations enable efficient testing for contaminated structured signals without exact geometry computation.

abstract
Robust statistical inference often faces a severe computational-statistical gap when dealing with complex parameter spaces. We investigate minimax signal detection in the Gaussian sequence model under strong $\epsilon$-contamination, where the signal belongs to a general prior constraint $K$. Existing optimal tests require computing the exact Kolmogorov $k$-width of $K$, a computationally intractable task for general non-trivial sets. We bridge this gap by proposing a polynomial-time testing framework that universally applies to balanced, type-2, and exactly 2-convex constraints. By leveraging a semidefinite programming relaxation and a modified ellipsoid method equipped with an approximate subgradient oracle, we efficiently approximate the Kolmogorov widths. Remarkably, our unconditionally efficient algorithm achieves a robust detection boundary that matches existing upper bounds up to a mere polylogarithmic factor. This establishes a computationally tractable testing solution for a broad class of structured signals without requiring prior knowledge of their exact geometric complexity.
math.ST 2026-05-12 2 theorems

Bahadur representation yields high quantile homogeneity test

A Generative High Quantile Homogeneity Test Using Bahadur Representation for Heteroskedastic High Quantile Regression of Tail Dependent Time Series

The result converts nonlinear tail quantile problems to linear ones with explicit error bounds for heteroskedastic dependent series

abstract
We consider a high quantile homogeneity test to determine whether a certain set of explanatory variables has homogeneous effects on different high quantiles of the response variable in the tail. To accommodate situations under both the null and the alternative, the auxiliary process in this case may no longer be treated as stationary, and the problem requires a joint analysis of both homoscedastic and heteroskedastic high quantiles. For this, we develop a novel Bahadur representation result in the high quantile setting for a general class of tail dependent time series under potential heteroskedasticity, which may be of independent interest. In particular, the Bahadur representation provides a foundation for reducing problems regarding nonlinear high quantile regression estimators to those regarding suitably constructed linear forms with an explicit error bound, and can be transformative and useful in many statistical problems. In the current article we apply it to guide the development of a generative high quantile homogeneity test, which is then illustrated through applications to both synthetic and real data.
math.ST 2026-05-12 Recognition

Finite VC dimension enables finite-sample tests for distribution trade-offs

When Are Trade-Off Functions Testable from Finite Samples?

When optimal rejection regions lie in the class, a test controls errors nonasymptotically and yields simultaneous confidence bands for the whole trade-off curve.

abstract
We study finite-sample inference for the trade-off function of two unknown probability distributions, the function that traces the optimal type I/type II error frontier in binary testing. Given samples from distributions $P$ and $Q$, we consider the problem of testing whether their trade-off function lies above a benchmark curve $f_0$ or falls below a weaker benchmark $f_1$. Without structural restrictions, this problem is impossible uniformly over nonparametric classes. We identify a sharp condition under which it becomes possible. The key structural assumption is that the Neyman--Pearson rejection regions for $(P,Q)$ are attainable, up to null sets, by a prescribed class $S$ of measurable sets. Within this exact attainability framework, finite Vapnik--Chervonenkis dimension of $S$ is both sufficient and necessary for nontrivial finite-sample testing. We construct a test with nonasymptotic error guarantees: type I error control is valid without assuming attainability, while power holds uniformly over attainable alternatives satisfying an explicit separation condition. By inverting the test, we also obtain simultaneous confidence bands for the whole trade-off curve. Finally, we study the sharpness and robustness of the procedure. In the monotone likelihood-ratio model, we derive local separation rates and prove matching lower bounds up to logarithmic factors. We also allow approximate, rather than exact, attainability; this extension yields finite-sample guarantees for univariate log-concave distributions by approximating their rejection regions with unions of intervals.
math.ST 2026-05-12 2 theorems

New measure tracks tail dependence in heavy-tailed linear processes

Measuring Tail Dependence in Linear Processes: Theory and Empirics

It handles identical and non-identical regularly varying distributions, as tested on crypto data and simulations.

abstract
The quantitative analysis of financial time series often reveals two distinct features that standard Gaussian frameworks fail to capture: heavy-tailed marginal distributions and the phenomenon of extreme co-movements. While extreme value theory characterizes marginal behavior, copulas provide a functional bridge to describe the dependence structure independently of the marginals. We propose a different way of looking at joint extremes on the basis of a dependence measure. The proposed measure covers both non-identical and identical regularly varying distributions. Informed by the analysis of some high-frequency cryptocurrency datasets, the effect of the persistence property has been thoroughly studied under these setups. A detailed simulation study confirms our intuition and findings.
math.ST 2026-05-12 1 theorem

Gaussian priors hit minimax rates for point-process intensity

Increasing domain asymptotics for covariate-based nonparametric Bayesian intensity estimation with Gaussian and Besov-Laplace priors

Wide class of priors with link functions achieves optimal posterior contraction when covariates are ergodic over expanding domains.

abstract
We study the problem of estimating the intensity function of a covariate-driven point process based on observations of the points and covariates over a large window. We consider the nonparametric Bayesian approach, and show that a wide class of Gaussian priors, combined with flexible link functions, achieves minimax-optimal posterior contraction rates in the increasing domain asymptotics and under the assumption that the covariates be ergodic. We also employ Besov-Laplace priors, which are popular in imaging and inverse problems due to their edge-preserving and sparsity-promoting properties. We prove that these yield optimal estimation of spatially inhomogeneous intensities belonging to Besov spaces with low integrability index. These results are based on a general concentration theorem that extends recent findings from the literature. To corroborate the theory, we provide extensive numerical simulations, implementing the considered procedures via suitable posterior sampling schemes. Further, we present two real data analyses motivated by applications in forestry and the environmental sciences.
math.ST 2026-05-12 2 theorems

GAN method estimates full causal distributions with minimax optimality

Extended Wasserstein-GAN Approach to Causal Distribution Learning: Density-Free Estimation and Minimax Optimality

Minimizes averaged Wasserstein risk for conditional interventional outcomes without density ratios and proves optimality over Besov spaces.

abstract
Distributional causal inference requires estimating not only average treatment effects but also interventional outcome distributions, including quantiles, tail risks, and policy-dependent uncertainty. As a method for distributional causal inference, generative adversarial network (GAN)-based counterfactual methods are flexible tools for this task. However, these methods have several limitations. First, the objectives of certain techniques do not coincide with the statistical risk of the identifiable causal target, and therefore provide limited theoretical guarantees regarding estimable counterfactual distributions or optimality. Second, they tend to rely on unstable density-based methods, such as density ratio estimation. In this paper, we propose GANICE (GAN for Interventional Conditional Estimation) with several advantages: it (i) clarifies the conditional interventional distribution for each treatment--covariate state as the causal estimation target; (ii) estimates the conditional distribution such that its averaged Wasserstein risk is minimized; (iii) establishes minimax optimality. GANICE achieves these advantages through the introduction of the extended Wasserstein distance, the incorporation of a cellwise critic in its dual, and an optimality proof based on Besov space theory. Our experiments demonstrate that GANICE consistently outperforms existing methods.
math.ST 2026-05-11 Recognition

Tests for exposure mapping models cannot beat random rejection

On the Impossibility of Specification Testing of Interference Models Based on Exposure Mappings

Worst-case Type I and Type II errors sum to one for any test required to have power against larger mappings

abstract
In order to estimate causal effects in a randomized experiment where spillovers are suspected to occur, analysts must posit a model of interference. The most popular class of interference models are those based on exposure mappings. In practice, it is rarely clear which interference model accurately captures the true nature of spillovers in the experiment. In response, researchers have developed specification tests which seek to determine whether a given interference model is correctly specified. In this context, Type I error is the rejection rate when the interference model is actually correct and Type II error is the acceptance rate when the interference model is incorrectly specified. While existing tests have been explicitly constructed to control Type I error, their Type II error remains less well understood. In this paper, we provide a strong impossibility result: any specification test for an exposure mapping model which aims to have power against a larger exposure mapping model has worst-case Type I and Type II errors that sum to one. This means that no specification test can provide uniformly better performance than the naive test which discards all data and rejects the null at random. Our negative result holds for all sample sizes, for uniformly bounded outcomes, and for alternatives which are maximally separated from the null. Informative specification tests must therefore further restrict the alternative model against which they seek to attain power. To this end, we provide a uniformly consistent test for differentiating no-interference from a network-linear-in-means model.
math.ST 2026-05-11 Recognition

Regularization scheme extends to conditional density estimation

The general regularisation scheme applied to conditional density estimation

New estimator with proven rates uses Landweber iteration and competes with Nadaraya-Watson on time series data.

abstract
The general regularisation scheme, a versatile approach for nonparametric estimation, has been successfully applied to regression, density ratio, and score estimation. In this paper, we introduce a unified framework encompassing these settings and extend it to conditional density estimation, deriving a new estimator with rigorously established convergence rates. We implement the Landweber regularisation, which is computationally more tractable than Tikhonov regularisation in this context. Numerical experiments demonstrate that our estimator matches or outperforms the Nadaraya-Watson estimator in various scenarios, including time series models.
math.ST 2026-05-11 2 theorems

Exact signal thresholds derived for submatrix detection

Minimax optimal submatrix detection: Sharp non-asymptotic rates

Non-asymptotic upper and lower bounds on the critical strength match for every matrix size and submatrix dimension.

abstract
We consider the problem of detecting a hidden submatrix of size $s_1 \times s_2$ in a high-dimensional Gaussian matrix of size $d_1 \times d_2$. Under the null hypothesis, the observed matrix has i.i.d.\ entries with distribution $N(0,1)$. Under the alternative hypothesis, there exists an unknown submatrix of size $s_1 \times s_2$ with i.i.d.\ entries with distribution $N(\mu, 1)$ for some $\mu>0$, while all other entries outside the submatrix are i.i.d.\ $N(0,1)$. Specifically, we provide non-asymptotic upper and lower bounds on the smallest signal strength $\mu^*$ that is both necessary and sufficient to ensure the existence of a test with small enough Type I and Type II errors. We also derive novel minimax-optimal tests achieving these fundamental limits, and describe extensions of these tests that are adaptive to unknown sparsity levels $s_1$ and $s_2$. Our proposed detection procedure is a careful combination of novel test statistics which may be of independent interest. In contrast with previous work, which required restrictive assumptions on $d_1, d_2, s_1$ and $s_2$, our non-asymptotic upper and lower bounds match for any configuration of these parameters.
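A quick simulation conveys the detection problem: the sketch below contrasts the global sum statistic with a greedy scan proxy (the sum of the s2 largest entries in each of the s1 best rows) under the null and the alternative. The statistics and Monte Carlo thresholds are illustrative stand-ins, not the paper's minimax-optimal tests.

```python
# Sketch: detecting a hidden mean-shifted s1 x s2 submatrix in a d1 x d2
# Gaussian matrix, comparing a global sum statistic with a greedy scan
# proxy; thresholds are Monte Carlo at the 5% level.
import numpy as np

rng = np.random.default_rng(4)
d1, d2, s1, s2, mu = 60, 60, 6, 6, 0.9

def sample(signal):
    Y = rng.standard_normal((d1, d2))
    if signal:
        rows = rng.choice(d1, s1, replace=False)
        cols = rng.choice(d2, s2, replace=False)
        Y[np.ix_(rows, cols)] += mu        # plant the elevated submatrix
    return Y

def stat_pair(Y):
    total = Y.sum() / np.sqrt(d1 * d2)                  # global sum statistic
    row_best = np.sort(Y, axis=1)[:, -s2:].sum(axis=1)  # best s2 entries per row
    scan = np.sort(row_best)[-s1:].sum()                # best s1 of those rows
    return total, scan

null = np.array([stat_pair(sample(False)) for _ in range(500)])
alt = np.array([stat_pair(sample(True)) for _ in range(500)])
crit = np.percentile(null, 95, axis=0)
power = (alt > crit).mean(axis=0)
print(f"power at 5% level: sum test {power[0]:.2f}, greedy scan {power[1]:.2f}")
```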
math.ST 2026-05-11 Recognition

Multi-source transfer carries adaptation cost past phase transition

The Statistical Cost of Adaptation in Multi-Source Transfer Learning

The intrinsic cost exceeds one for some bias configurations even with two sources and grows with more sources.

abstract
Multi-source transfer learning can improve target-domain estimation by leveraging related source data, but its benefits depend on unknown source-to-target biases. This raises a fundamental question: can a bias-agnostic estimator perform as well as an oracle that knows the true bias configuration? To study this, we introduce the intrinsic cost of adaptation, defined as the smallest worst-case ratio between the risk of any bias-agnostic estimator and the oracle risk. An intrinsic cost of one means oracle performance is achievable without knowing the biases, whereas a larger cost quantifies the unavoidable price of adaptation. Focusing on parametric estimation, we show that multi-source transfer behaves fundamentally differently from the single-source setting: adaptation is not always possible, even with only two sources. For a fixed number of sources, we characterize the intrinsic cost of adaptation and identify a phase transition separating regimes where oracle performance is achievable from those where it is not. As the number of sources grows, we further show that the adaptation cost increases. When adaptation over the full bias configuration space is impossible, additional structure can substantially reduce the cost. We study settings with ordered biases, clustered source parameters, and sufficiently separated non-informative sources, and propose estimators tailored to each regime, with supporting theoretical and empirical results. Overall, our results delineate the statistical limits of multi-source transfer, clarifying when oracle performance is attainable, when structural assumptions help, and when adaptation is fundamentally impossible.
math.ST 2026-05-11 Recognition

Smoothed Wasserstein costs converge at moment-based rates

Two-Sample Inference for Gaussian-Smoothed Wasserstein Costs with Finite Moments

Two-sample plug-in estimators achieve probability bounds set by polynomial moments above the transport order p.

abstract
We study the two-sample plug-in estimator of the Gaussian-smoothed Wasserstein cost \(T_p^{(\sigma)}(\mu,\nu)=W_p(\mu*\gamma_\sigma,\nu*\gamma_\sigma)^p\) on \(\mathbb{R}^d\). For fixed smoothing and finite polynomial moments \(M_{q_\mu}(\mu)<\infty\), \(M_{q_\nu}(\nu)<\infty\), with \(q_\mu,q_\nu>p\), we establish upper bounds in probability of order \(\rho_{q_\mu,p,d}(m)+\rho_{q_\nu,p,d}(n)\). Here \(\rho_{q,p,d}(N)=N^{-(q-p)/(q+d)}\) for \(p<q<d+2p\), \(N^{-1/2}\log N\) at \(q=d+2p\), and \(N^{-1/2}\) for \(q>d+2p\). This order also holds in expectation under \(q_\mu,q_\nu\ge2p\). When the smoothed population distance is positive, the cost bound yields this rate for the distance itself. For \(p>1\) and \(q_\mu,q_\nu>d+2p\), we also derive a first-order expansion, a separated two-sample central limit theorem, and a sample-splitting variance estimator.
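In one dimension the plug-in estimator is short: smooth each sample with N(0, sigma^2) noise (a one-draw Monte Carlo stand-in for exactly convolving the empirical measures) and evaluate the empirical W_p^p by matching order statistics. A hedged sketch with p = 2, on a pair of Gaussians where the population smoothed cost equals 1:

```python
# Sketch: two-sample plug-in for the Gaussian-smoothed Wasserstein cost
# T_p = W_p(mu * N(0, sigma^2), nu * N(0, sigma^2))^p in 1D, where the
# empirical W_p^p is an order-statistics match (equal sample sizes).
import numpy as np

def smoothed_wp_p(x, y, sigma, p, rng):
    xs = x + sigma * rng.standard_normal(x.shape)   # one draw of the smoothing
    ys = y + sigma * rng.standard_normal(y.shape)
    return np.mean(np.abs(np.sort(xs) - np.sort(ys)) ** p)

rng = np.random.default_rng(5)
p, sigma = 2, 0.5
for n in [100, 1000, 10000]:
    x = rng.standard_normal(n)            # mu = N(0, 1)
    y = rng.standard_normal(n) + 1.0      # nu = N(1, 1); smoothed W_2^2 = 1
    est = np.mean([smoothed_wp_p(x, y, sigma, p, rng) for _ in range(20)])
    print(f"n = {n:5d}: estimated T_2 ~ {est:.3f} (population value 1.0)")
```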
math.ST 2026-05-11 Recognition

Estimators achieve minimax optimal rates for unbalanced transport-growth pairs

Minimax Optimal Estimation of Transport-Growth Pairs in Unbalanced Optimal Transport

A value-based stability reduction yields matching upper and lower bounds on the estimation error.

abstract
Unbalanced optimal transport (UOT) extends classical optimal transport to measures with different total masses, but statistical guarantees for Monge-type estimation remain limited. We study unbalanced transport with quadratic cost and Kullback-Leibler marginal penalties and argue that the natural population target is not a map alone, but a transport-growth pair. Consequently, we develop two estimators for the transport-growth pairs under several setups: an optimal transport plan-based estimator for a general case, and a kernel-based estimator for a case with smooth densities. We also show that the error of the estimator achieves the minimax optimal rate by deriving a matching lower bound of the minimax risk. Our main technical contribution is a value-based stability reduction that converts perturbations of the UOT objective into transport and growth risks through a UOT gap condition. These results provide a statistical foundation for Monge-type estimation in unbalanced optimal transport.
math.ST 2026-05-11 Recognition

Framework links algorithm outputs to MLE for latent space networks

Bridging Theory and Practice: Statistical Inference for Latent Space Models of Networks

Adaptive criteria eliminate dependence on unknown parameters in inference guarantees

abstract
Latent space models have been widely adopted in modeling network data. Developing statistical inference for estimated model parameters enables quantifying associated uncertainty and is pivotal for downstream tasks. Despite recent progress on statistical inference of maximum likelihood estimation, crucial gaps remain between asymptotic theoretical guarantees and practical use. Specifically, how are the oracle maximum likelihood estimators related to the solutions produced by algorithms in practice? Can rigorous guarantees be established for existing algorithms without unnecessary restrictions? To address these fundamental questions, we develop a unified analytical framework that bridges theory and practice of statistical inference for latent space models. First, for the maximum likelihood estimation, we relax the spectral-multiplicity constraint in the existing asymptotic theory to broaden the applicability. Second, we overcome the dependence on unknown true parameters in prior algorithmic analyses by developing novel adaptive criteria and theoretical tools. For the widely used algorithm based on the projected gradient descent and the singular value thresholding, we explicitly connect their outputs to the maximum likelihood estimator without relying on unknown information. Our results provide a solid foundation for practically useful and statistically principled inference in network analysis.
math.ST 2026-05-11 1 theorem

Bayesian PINNs contract to PDE solutions at near-minimax rates

Posterior Concentration of Bayesian Physics-Informed Neural Networks for Elliptic PDEs

Rate-adaptive priors enable optimal performance without knowing solution smoothness from noisy interior and boundary data.

abstract
We study the posterior contraction rate of Bayesian Physics-Informed Neural Networks (PINNs) for solving a general class of elliptic partial differential equations (PDEs). We focus on learning the solution of the elliptic equation with a non-homogeneous Dirichlet boundary condition from independent and noisy measurements collected both inside the domain and on the boundary. Assuming that the PDE admits a strong solution in a H\"older space and using a suitably constructed prior on the neural network weights, we prove that the posterior distribution concentrates around the exact solution at a near-minimax rate. Furthermore, the chosen prior is rate-adaptive: the posterior contracts at an (almost) optimal rate without prior knowledge of the smoothness level of the exact solution. Our results provide statistical guarantees for uncertainty quantification of PDEs via Bayesian PINNs.
math.ST 2026-05-11 2 theorems

LRT asymptotics remain the sup of a bar-chi-squared process despite unidentifiable nuisance parameters

Asymptotics for likelihood ratio tests of boundary points with singular information and unidentifiable nuisance parameters

The limiting distribution under the null remains the same as in the regular case when singularity is produced by the nuisance parameter.

abstract
We establish the asymptotic distribution of likelihood ratio tests (LRTs) in settings where some of the nuisance parameters are unidentifiable under the null hypothesis, parameters of interest lie on the boundary of the parameter space, and the information matrix of the identifiable parameters may be singular. Our work is motivated by mixture models and genetic linkage analysis, which exhibit all three features simultaneously, but it is applicable more broadly to other problems such as change-point detection. Under suitable regularity conditions, the asymptotic distribution of the LRT statistic under the null hypothesis is the supremum of a $\bar{\chi}^2$-process, that is, a stochastic process whose marginal distributions are mixtures of $\chi^2$-distributions with weights depending on the nuisance parameter. Under local alternatives, the asymptotic distribution of the LRT statistic is the supremum of a noncentral $\bar{\chi}^2$-process, whose marginal distributions are mixtures of truncated, noncentral $\chi^2$-distributions. In contrast to prior work on singular information, where singularity stems from the parameter of interest and changes the form of the limit distribution, here singularity is determined by the nuisance parameter and the limit has the same form as in the nonsingular case. Existing results for boundary inference with nonsingular information or without nuisance parameters are obtained as special cases, and several existing application-specific results for mixture models and genetic linkage analysis are recovered and extended.
math.ST 2026-05-11 Recognition

Log d time recovers latent Hawkes networks

On Observation Time for Recovering Latent Hawkes Networks

Sparse weak interactions let exact inference of the interaction graph succeed from event sequences observed for only logarithmic duration.

abstract
Dynamics of interacting systems in engineering, society, and nature often evolve over latent networks that govern which entities can interact. We study the problem of inferring these networks from event-based observations, which arise naturally in finance, seismology, and neuroscience. While there is substantial algorithmic work addressing this important problem, theoretical results are scarce. In this paper we ask the following fundamental question: what is the minimum time that one must observe the dynamics in order to exactly recover the underlying network, as a function of the number $d$ of interacting entities? For a class of stationary Hawkes processes with sparse, weak interactions, we prove that an observation time of order $\log d$ is sufficient and necessary. For the upper bound we construct a two-stage estimator that uses clipped and binned event data for screening, followed by a least-squares refinement, and apply concentration bounds derived from the Poisson cluster representation. For the lower bound we combine Fano's inequality with Jacod's Girsanov formula for point processes on a suitable subclass of networks.
math.ST 2026-05-11 2 theorems

Joint location-scale minimization degenerates on product manifolds

Scale selection for geometric medians on product manifolds

The scale is driven to the boundary, collapsing the median to a marginal solution that discards one factor.

abstract
Geometric medians on product manifolds are sensitive to the relative scaling of factor metrics because the median objective couples the factors rather than separating them. We study this scale-selection problem and first prove that naive joint minimization over location and scale is degenerate: the scale is driven to the boundary and the problem collapses to a marginal median, effectively discarding one factor. Thus relative scale is not identifiable from the raw median loss alone. We develop three alternatives to mitigate this issue. The first treats scale as indexing a sensitivity path and establishes uniform consistency, a functional central limit theorem, and a derivative-based sensitivity measure. The second constructs a robust scale-calibrated median using marginal radial median scales, yielding unit invariance, consistency, a two-step central limit theorem, and bounded influence. The third introduces a bounded balance equation for direct scale estimation, with monotonicity, uniqueness, joint asymptotic normality, and bounded influence. Simulations illustrate boundary collapse, sensitivity, unit invariance, and balanced estimation in Euclidean and Bures-Wasserstein settings.
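The boundary collapse is visible even in the Euclidean product R x R: scale the second factor by s, compute the geometric median by Weiszfeld iteration, and the profiled objective decreases monotonically as s shrinks, so naive joint minimization discards the second factor. A minimal sketch under these assumptions (the toy data and solver settings are illustrative):

```python
# Sketch: boundary collapse of joint location-scale minimization for a
# geometric median on the product R x R with metric
# d((a,b),(x,y))^2 = (a - x)^2 + s^2 (b - y)^2.  The profiled objective is
# monotone in s, so naive joint minimization drives s to the boundary and
# the median collapses to the marginal median of the first factor.
import numpy as np

rng = np.random.default_rng(6)
Z = rng.standard_normal((200, 2)) @ np.diag([1.0, 3.0]) + np.array([1.0, -2.0])

def weiszfeld(P, iters=200):
    """Geometric median via Weiszfeld's fixed-point iteration."""
    m = P.mean(axis=0)
    for _ in range(iters):
        d = np.linalg.norm(P - m, axis=1)
        w = 1.0 / np.maximum(d, 1e-12)     # guard against a coincident point
        m = (w[:, None] * P).sum(axis=0) / w.sum()
    return m

for s in [2.0, 1.0, 0.5, 0.1, 0.01]:
    Ps = Z * np.array([1.0, s])            # rescale the second factor by s
    m = weiszfeld(Ps)
    obj = np.linalg.norm(Ps - m, axis=1).sum()
    print(f"s = {s:5.2f}: objective {obj:8.1f}, "
          f"median = ({m[0]:+.2f}, {m[1] / s:+.2f})")
```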
math.ST 2026-05-11 2 theorems

Susceptibility estimators consistent in singular models

Linear Response Estimators for Singular Statistical Models

Responses of observables to data perturbations yield asymptotically unbiased statistics from finite samples.

abstract
We define susceptibilities as a measure of the response of an observable quantity of a parameterized statistical model to a perturbation of the data for a general class of observables. We define estimators for these susceptibilities as statistics in a sequence of n data-points and prove that these estimators are consistent and asymptotically unbiased in the large n regime.
math.ST 2026-05-11 Recognition

Sinc kernel beats others for moderate-sample density estimates

Density Estimation Using the Sinc Kernel

It also shows better asymptotics for non-smooth densities and simplifies bandwidth choice.

abstract
This paper deals with the kernel density estimator based on the so-called sinc (or Fourier integral) kernel $K(x)=(\pi x)^{-1}\sin x$. We study in detail both asymptotic and finite sample properties of this estimator. It is shown that, contrary to widespread opinion, the sinc estimator is superior to other estimators in many respects: it is more accurate for quite moderate values of the sample size, has better asymptotics in the non-smooth case (when the density to be estimated has only a first derivative), is more convenient for bandwidth selection, etc.
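A sinc-kernel density estimate is easy to write down; note that the kernel is not a probability density (it takes negative values), so the estimate can dip below zero. A minimal sketch on simulated Gaussian data, with an arbitrary bandwidth:

```python
# Sketch: kernel density estimate with the sinc kernel K(x) = sin(x)/(pi x).
# The kernel is not a density (it takes negative values), so the estimate
# can dip below zero and is often clipped for display.
import numpy as np

def sinc_kde(grid, data, h):
    u = (grid[:, None] - data[None, :]) / h
    Kvals = np.sinc(u / np.pi) / np.pi    # np.sinc(t) = sin(pi t)/(pi t)
    return Kvals.mean(axis=1) / h

rng = np.random.default_rng(7)
data = rng.standard_normal(500)
grid = np.linspace(-4, 4, 201)
est = sinc_kde(grid, data, h=0.4)         # bandwidth chosen ad hoc here
true = np.exp(-grid ** 2 / 2) / np.sqrt(2 * np.pi)
print("max abs error vs N(0,1):", np.abs(est - true).max())
print("min of estimate (may be negative):", est.min())
```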
math.ST 2026-05-11 Recognition

Runge-Kutta Langevin method hits O(d^{3/2}h^{3/2}) rate without log-concavity

Accelerating Langevin Monte Carlo via Efficient Stochastic Runge--Kutta Methods beyond Log-Concavity

Two-gradient scheme matches prior convergence guarantees for targets that are merely log-smooth rather than strongly convex.

abstract
Sampling from a high-dimensional probability distribution is a fundamental algorithmic task arising in wide-ranging applications across multiple disciplines, including scientific computing, computational statistics and machine learning. Langevin Monte Carlo (LMC) algorithms are among the most widely used sampling methods in high-dimensional settings. This paper introduces a novel higher-order and Hessian-free LMC sampling algorithm based on an efficient stochastic Runge--Kutta method of strong order $1.5$ for the overdamped Langevin dynamics. In contrast to the existing Runge--Kutta type LMC (Li et al., 2019) involved with three gradient evaluations, the newly proposed algorithm is computationally cheaper and requires only two gradient evaluations for one iteration. Under certain log-smooth conditions, non-asymptotic error bounds of the proposed algorithms are analyzed in $\mathcal{W}_2$-distance. In particular, a uniform-in-time convergence rate of order $O(d ^{\frac32} h^{\frac32})$ is derived in a non-log-concave setting, matching the convergence rate proved in the aforementioned work but under the log-concavity condition. Numerical experiments are finally presented to demonstrate the effectiveness of the new sampling algorithm.
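For intuition about "two gradient evaluations per iteration", here is a hedged sketch of a Heun-type (predictor-corrector) two-gradient discretization of overdamped Langevin dynamics next to plain one-gradient ULA; on a Gaussian target the stationary second moment exposes each scheme's discretization bias. This illustrates the idea only and is not the paper's strong order-1.5 scheme.

```python
# Sketch: one-gradient ULA vs a two-gradient Heun-type step for overdamped
# Langevin dX = -grad_psi(X) dt + sqrt(2) dW targeting exp(-psi).
import numpy as np

def grad_psi(x):
    return x            # psi(x) = |x|^2 / 2, so the target is N(0, I)

rng = np.random.default_rng(8)
d, h, n_iter = 10, 0.1, 50000
x_ula = np.zeros(d)
x_heun = np.zeros(d)
m2_ula, m2_heun = [], []
for _ in range(n_iter):
    noise = np.sqrt(2 * h) * rng.standard_normal(d)
    x_ula = x_ula - h * grad_psi(x_ula) + noise              # one gradient
    pred = x_heun - h * grad_psi(x_heun) + noise             # predictor
    x_heun = x_heun - 0.5 * h * (grad_psi(x_heun) + grad_psi(pred)) + noise
    m2_ula.append((x_ula ** 2).mean())
    m2_heun.append((x_heun ** 2).mean())
burn = 5000
print("E|X|^2 / d should be 1.0:")
print("  ULA :", round(float(np.mean(m2_ula[burn:])), 4))
print("  Heun:", round(float(np.mean(m2_heun[burn:])), 4))
```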
math.ST 2026-05-11 Recognition

Belief functions enable inference from scarce data without probabilities

Statistical inference with belief functions: A survey

Survey compiles the main techniques for learning uncertainty measures directly from limited statistical samples.

abstract
Belief functions are a powerful and popular framework for the mathematical characterisation of uncertainty, in particular in situations in which lack of data renders learning a probability distribution for the problem impractical. The first step in a reasoning chain based on belief functions is inference: how to learn a belief measure from the available data. In this survey we focus, in particular, on making inference from statistical data, and review the most significant contributions in the area.
math.ST 2026-05-11 2 theorems

FHDMs match minimax rates for spherical data

Statistical Convergence of Spherical First Hitting Diffusion Models

First hitting diffusion models reach optimal total variation convergence up to logs for Sobolev distributions on spheres.

abstract
Denoising diffusion models have evolved into a state-of-the-art method for tasks in various fields, such as denoising and generation of images, text generation, or generation of synthetic data for training of other machine learning models. First hitting diffusion models (FHDM) are a particular class of denoising diffusion models with \textit{random} adaptive generation time tailored to generate data on a known manifold. Building on the conditioning framework of Doob's $h$-transform these models leverage the given information on the target data manifold to demonstrate strong performance across tasks while offering distinct features such as time-homogeneous dynamics of the generating process and a reduced average simulation time. Even though the theoretical investigation of standard forward-backward diffusion models has attracted much attention in the recent past, the statistical convergence properties of FHDMs are not yet understood. In this work, we show that, up to logarithmic factors, FHDMs achieve the minimax optimal convergence rate in total variation for spherically supported Sobolev smooth data distributions. In particular, this is the first statistical optimality result for denoising diffusion modelling with random generation time.
math.ST 2026-05-11 2 theorems

Self-normalized CUSUM tests remove bandwidth choices from forecast comparisons

Self-normalized tests for multistep conditional predictive ability

By normalizing loss differentials with their own cumulative sums, the tests achieve correct size in simulations without kernel or lag tuning

abstract
This paper proposes self-normalized tests for multistep conditional predictive ability in forecast comparison. By normalizing the sample mean of the transformed loss differential using functionals of its cumulative sum (CUSUM) process, specifically an adjusted-range normalizer for scalars and a matrix normalizer for vectors, our approach avoids direct estimation of the long-run covariance matrix. Consequently, it eliminates the need for the ad hoc bandwidth, kernel, and lag-truncation choices required by traditional methods. We establish the asymptotic theory for these statistics, deriving pivotal null limiting distributions and proving test consistency. Monte Carlo simulations show that the proposed tests effectively mitigate the finite-sample size distortions associated with traditional heteroskedasticity and autocorrelation consistent (HAC) methods, while retaining strong empirical power against conditional predictability alternatives.
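The self-normalization idea fits in a few lines for a scalar loss differential: divide the squared sample mean by a functional of its own CUSUM process instead of a HAC long-run variance estimate, so no bandwidth, kernel, or lag truncation is needed. The sketch below uses the standard Shao-type normalizer and simulates the pivotal null limit by Monte Carlo; the paper's adjusted-range normalizer differs.

```python
# Sketch: self-normalized test of H0: E[d_t] = 0 for a loss differential.
# T = n * mean(d)^2 / V with V = n^-2 * sum_t (S_t - (t/n) S_n)^2 built
# from the CUSUM process S_t; no kernel, bandwidth, or lag truncation.
import numpy as np

def sn_stat(d):
    n = len(d)
    S = np.cumsum(d)
    bridge = S - (np.arange(1, n + 1) / n) * S[-1]
    V = (bridge ** 2).sum() / n ** 2
    return n * d.mean() ** 2 / V

rng = np.random.default_rng(9)

# The null limit is pivotal; approximate its 95% quantile by Monte Carlo.
crit = np.percentile(
    [sn_stat(rng.standard_normal(2000)) for _ in range(4000)], 95)

def rejection_rate(delta, reps=1000, n=300, phi=0.5):
    """AR(1) loss differential with mean delta; returns rejection frequency."""
    rej = 0
    for _ in range(reps):
        e = rng.standard_normal(n)
        d = np.empty(n)
        d[0] = e[0]
        for t in range(1, n):
            d[t] = phi * d[t - 1] + e[t]
        rej += sn_stat(d + delta) > crit
    return rej / reps

print(f"critical value ~ {crit:.1f}")
print(f"size (delta=0): {rejection_rate(0.0):.3f}, "
      f"power (delta=0.2): {rejection_rate(0.2):.3f}")
```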
math.ST 2026-05-08 2 theorems

Mixing measures contract in infinite location-scale mixtures

Convergence Rates for Latent Mixing Measures in Infinite Homoscedastic Location-Scale Mixture Models

Lower bounds connect density L1 distances to Wasserstein and operator-norm discrepancies, giving first rates for Dirichlet process models.

abstract
We study posterior contraction rates for mixing measures in homoscedastic location-scale mixture models with infinitely many components. While posterior convergence at the level of densities is well understood, ensuring convergence of the latent mixing measure is more challenging and has remained an open problem in settings where both location and scale parameters are unknown. We address this by deriving novel lower bounds that connect the $L^1$ distance between mixture densities to discrepancies, based on the Wasserstein distances and the operator norm, between the underlying mixing measures and scale matrices. Our approach combines the dual formulation of the $W_1$ distance with functional-analytic approximation techniques. This leads to general inequalities, whose strength is determined (i) by the smoothness of the mixture kernel via the rate of decay of its characteristic function, and (ii) by a key lower bound on the $L^1$ metric involving the operator norm discrepancy between scale parameters. Moreover, a novel PDE inversion condition yields a sharper inequality for important ordinary-smooth cases. We specialize these bounds to popular mixtures based on multivariate Gaussian, Cauchy, and Laplace kernels. As a consequence, we obtain first-of-their-kind contraction rates in the context of Dirichlet process mixtures with an unknown scale parameter shared across components. As a byproduct of our inequalities, we can distinguish the convergence behavior of the location mixing measure from that of the scale parameter across a range of kernel choices, leading to nuanced insights into their respective rates.
math.ST 2026-05-08

Reward early stops to make anytime tests time-sensitive

Time-sensitive anytime-valid testing

Maximizing expected reward under alternatives yields optimal e-processes for deadlines and a stationary EDO criterion.

abstract
Anytime-valid tests allow evidence to be checked during data collection: one can either continue testing or stop and reject the null while still controlling type-I error. Yet, in many applications rejection is useful only if it comes soon enough. We introduce a time-sensitive testing-by-betting framework that favours early rejection by assigning rewards to rejection times and maximising their expected value under a given alternative. This encompasses hard deadlines and softer time preferences. The resulting optimal control problem admits a Bellman representation in terms only of time and evidence against the null, rather than the full history. For hard deadlines, the simple-vs-simple case reduces to a finite-horizon Neyman--Pearson problem, and we identify the corresponding optimal e-process. Furthermore, we show that exponentially decaying rewards admit a stationary approximation, yielding the exponential-decay-optimal (EDO) criterion: a finite-time-scale counterpart to the classical growth-rate-optimal (GRO) viewpoint in anytime-valid statistics, with the GRO criterion recovered in the large-time-scale limit.
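In the hard-deadline, simple-vs-simple case, the baseline object is likelihood-ratio betting: track the running product of likelihood ratios (an e-process under the null) and reject the first time it crosses 1/alpha before the deadline, with Ville's inequality giving the type-I guarantee. The sketch below uses this plain GRO-style martingale for N(0,1) vs N(0.3,1); the paper's deadline-aware optimal bets differ.

```python
# Sketch: anytime-valid testing by betting with a hard deadline.
# The e-process is the running likelihood ratio of N(mu1, 1) vs N(0, 1);
# reject the first time it reaches 1/alpha (Ville: type-I error <= alpha),
# but only rejections before the deadline count.
import numpy as np

rng = np.random.default_rng(10)
alpha, mu1, deadline = 0.05, 0.3, 500

def run(mu_true):
    log_e = 0.0
    for t in range(1, deadline + 1):
        obs = mu_true + rng.standard_normal()
        log_e += mu1 * obs - mu1 ** 2 / 2      # log likelihood ratio increment
        if log_e >= np.log(1 / alpha):
            return t                            # rejection time
    return None                                 # deadline passed

null_rej = np.mean([run(0.0) is not None for _ in range(2000)])
times = [run(mu1) for _ in range(2000)]
hits = [t for t in times if t is not None]
print(f"type-I error within deadline: {null_rej:.3f} (guaranteed <= {alpha})")
print(f"power by deadline: {len(hits) / 2000:.2f}, "
      f"mean rejection time: {np.mean(hits):.0f}")
```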
math.ST 2026-05-08

Eigenfunction rates split into sampling and grid terms

Minimax estimation of Functional Principal Components from noisy discretized functional data: the case of smooth processes

Lower bounds of order δ_ℓ/n + p^{-2α} are matched by a wavelet estimator when both smoothness and local eigengaps are controlled.

abstract
We study the minimax estimation of covariance eigenfunctions and eigenvalues in functional principal component analysis when $n$ trajectories are observed at $p$ common grid points with additive noise. We consider covariance kernels with arbitrary H\"older smoothness and no prescribed parametric decay of the eigenvalues. In this setting, kernel smoothness and local spectral separation play distinct roles: a minimax inconsistency result over the smoothness-only class shows that kernel regularity alone is not sufficient for minimax-consistent eigenfunction estimation. To capture this interplay, we introduce a class of processes that jointly controls the H\"older smoothness of the covariance kernel and a local relative inverse eigengap quantity at the target index $\ell$. Over this class, we derive non-asymptotic minimax lower bounds for eigenfunction estimation that disentangle sampling variability, discretization and spectral effects, revealing rates of order $\delta_\ell n^{-1}+p^{-2\alpha}$, where $\delta_\ell$ quantifies the spectral difficulty. We also obtain non-asymptotic lower bounds for eigenvalue estimation under a relative squared-error loss. We then construct a computable wavelet projection estimator based on Coiflet scaling functions and a quadrature scheme designed to accommodate arbitrary H\"older smoothness. For eigenfunction estimation, this estimator matches the minimax dependence on the sample size and grid resolution, up to the natural spectral factor, for any H\"older index $\alpha>0$. Finally, we show that the proposed framework covers several classical Gaussian processes and Karhunen--Lo\`eve constructions. In particular, a Karhunen--Lo\`eve based criterion links spectral decay, eigenfunction regularity and covariance-kernel smoothness, and yields controlled simulation settings illustrating the predicted phase transitions and least-favourable discretization effects.
math.ST 2026-05-08

Time-position preconditioner unifies mode coverage and local exploration

Time-Inhomogeneous Preconditioned Langevin Dynamics

Langevin dynamics now converge in Wasserstein-2 under time-space diffusion and only locally Lipschitz drifts.

abstract
Langevin sampling from distributions of the form $p(x) \propto \exp(-\Psi(x))$ faces two major challenges: (global) mode coverage and (local) mode exploration. The first challenge is particularly relevant for multi-modal distributions with disjoint modes, whereas the second arises when the potential $\Psi$ exhibits diverse and ill-conditioned local mode geometry. To address these challenges, a common approach is to precondition Langevin dynamics with problem-specific information, such as the sample covariance or the local curvature of $\Psi$. However, existing preconditioner choices inherently involve a trade-off between global mode coverage and local mode exploration, and no prior method resolves both simultaneously. To overcome this limitation, we propose the TIPreL, which introduces a time- and position-dependent preconditioner. This design effectively addresses both challenges mentioned above within a single framework. We establish convergence of the resulting dynamics in the Wasserstein-2 distance both in continuous time and for a tamed Euler discretization. In particular, our analysis extends the existing state of the art by proving convergence under time- and space-dependent diffusion coefficients, and only locally Lipschitz drifts, which has not been covered by prior work. Finally, we experimentally compare TIPreL with competing preconditioning schemes on a two-dimensional, severely ill-posed example and on a Bayesian logistic regression task in higher dimensions, confirming the efficiency of the proposed method.
math.ST 2026-05-08

Kernel gradient flows match minimax uniform rates

Optimal Confidence Band for Kernel Gradient Flow Estimator

Simultaneous confidence bands shrink at rates arbitrarily close to the theoretical minimum under standard kernel assumptions.

abstract
In this paper, we investigate the supremum-norm generalization error and the uniform inference for a specific class of kernel regression methods, namely the kernel gradient flows. Under the widely adopted capacity-source condition framework in the kernel regression literature, we first establish convergence rates for the supremum norm generalization error of both continuous and discrete kernel gradient flows under the source condition $s>\alpha_0$, where $\alpha_0\in(0,1)$ denotes the embedding index of the kernel function. Moreover, we show that these rates match the minimax optimal rates. Building on this result, we then construct simultaneous confidence bands for both continuous and discrete kernel gradient flows. Notably, the widths of the proposed confidence bands are also optimal, in the sense that their shrinkage rates are greater than, but can be arbitrarily close to, the minimax optimal rates.
math.ST 2026-05-07

Direct estimator gives finite-sample bounds for Schrödinger bridge drifts

Direct Estimation of Schr\"odinger Bridge Time-Series Drifts: Finite-Sample, Asymptotic, and Adaptive Guarantees

A Nadaraya-Watson plug-in on the conditional-ratio form yields uniform bounds, a CLT, and near-optimal adaptive rates.

abstract
We study nonparametric estimation of Schr\"odinger bridge (SB) drifts from i.i.d.\ data observed on a single time interval. Starting from the conditional-ratio form of the Schr\"odinger bridge time-series (SBTS) drift formula, we analyze a direct Nadaraya--Watson plug-in estimator built from kernelized numerator and denominator terms. Unlike recent SB analyses based on entropic-OT potentials, Sinkhorn iterations, or iterative bridge solvers, our approach works directly at the drift level and isolates \emph{statistical error} from optimization, approximation, and discretization error. Under H\"older regularity, a marginal-density floor, and bounded support, we prove a uniform non-asymptotic bound for admissible bandwidth pairs, a pointwise CLT under genuine undersmoothing, and an adaptive bandwidth selector satisfying an oracle inequality. We also prove a pivot-local minimax lower bound which, through an explicit uniform pivot, yields a global minimax lower bound under transparent compatibility conditions; hence the adaptive selector is minimax-rate optimal up to logarithmic factors. Synthetic experiments provide theorem-targeted diagnostics for finite-sample scaling, Gaussian approximation, and adaptive behavior.
0
0
math.ST 2026-05-07

Symmetrization keeps Spearman's rho but erases copula asymmetry

Concordance, symmetrization and non-exchangeability for bivariate copulas

A sharp bound shows that any asymmetric bivariate copula must exhibit positive dependence measured by the Schweizer-Wolff functional.

abstract click to expand
We study the relationship between measures of non-exchangeability $\mu_p$ ($p\in[1,+\infty]$), in the sense of Durante et al. (2010), and classical dependence functionals for bivariate copulas. We show that the symmetrization $C\mapsto(C+C^t)/2$ preserves Spearman's $\rho$ while annihilating $\mu_p$, and that Blomqvist's $\beta$ carries no information about the degree of non-exchangeability. We also establish the sharp lower bound $\sigma(C)\ge 6\,\mu_1(C)$, where $\sigma$ is the Schweizer-Wolff dependence measure, showing that asymmetry implies dependence. Closed-form expressions for $\tau$, $\rho$, and the tail-dependence coefficients of the maximally non-exchangeable family $\{M_\theta\}$ are derived as illustrations.
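The symmetrization claim is easy to check numerically. The sketch below uses the asymmetric Marshall-Olkin copula as a test case (an assumption for illustration) and grid quadrature for Spearman's rho via the identity rho = 12 times the double integral of C minus 3; since transposition does not change the integral, the symmetrized copula has the same rho while its asymmetry vanishes.

    import numpy as np

    a, b = 0.9, 0.2                                   # a != b: non-exchangeable
    u = np.linspace(0, 1, 801)
    U, V = np.meshgrid(u, u, indexing="ij")
    C = np.minimum(U**(1 - a) * V, U * V**(1 - b))    # Marshall-Olkin copula
    Ct = C.T                                          # transposed copula C(v, u)
    S = (C + Ct) / 2                                  # symmetrization

    rho = lambda A: 12 * A.mean() - 3                 # Spearman's rho by quadrature
    print(rho(C), rho(S))                             # identical up to grid error
    print(np.abs(C - Ct).max(), np.abs(S - S.T).max())  # asymmetry: positive vs 0.0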
0
0
math.ST 2026-05-07

High-dimensional statistics connects to optimization and random matrices

High-Dimensional Statistics: Reflections on Progress and Open Problems

Review of two decades shows progress on complex data but flags open problems in dependency and heterogeneity.

Figure from the paper full image
abstract click to expand
Over the past two decades, the field of high-dimensional statistics has experienced substantial progress, driven largely by technological advances that have dramatically reduced the cost and effort for data collection and storage across a broad range of domains, including biology, medicine, astronomy, and the social and environmental sciences. Modern datasets are increasingly complex, often exhibiting rich dependency, heterogeneity, and other features that challenge traditional statistical methods. In response, high-dimensional statistics has evolved to address more sophisticated estimation and inference problems. This evolution has, in turn, fostered deep connections with and contributions to a wide range of research areas, including optimization, concentration of measure, random matrix theory, information theory, and theoretical computer science. Given the rapid pace of recent developments in high-dimensional statistics, our goal is to synthesize representative advances, highlight common themes and open problems, and point to important works that offer entry points into the field.
0
0
math.ST 2026-05-07

Generic kernels place models transversely to degeneracy loci

Transversality and Geometric Regularisation in Distributional Statistical Models

Rank conditions on the joint feature map Jacobian verify avoidance of non-identifiability and singular information for location families, log-normal models, Stein discrepancies, and graphical models.

abstract click to expand
The distributional statistical framework replaces classical probability densities by distribution-kernel pairs $(T, \varphi)$, where $T$ is a tempered distribution and $\varphi$ is a rapidly decaying kernel. We develop the thesis that the kernel acts as a geometric regulariser, placing parametric statistical models in generic (transversal) position relative to degeneracy loci encoding non-identifiability, singular information, moment indeterminacy, and representation failure. Using the transversality theorems of Whitney, Thom, and Mather, we prove a finite-dimensional weak transversality theorem: for a generic kernel in any sufficiently rich family, the kernel-induced feature map avoids degeneracy strata of sufficiently high codimension. We establish verifiable conditions -- formulated as rank conditions on the Jacobian of the joint feature map -- under which the transversality hypothesis can be checked, and verify them for location families, the log-normal, Stein discrepancies, and graphical models. The present results apply to parametric models; extensions to semiparametric and nonparametric settings are discussed. The degeneracy classification includes representation degeneracy (Type 0) for models without closed-form densities and higher-order instabilities (Type IV) in non-chordal graphical models. Identifiability, robustness, moment determinacy, Fisher information regularity, Stein discrepancy, inferential separation, and the Behrens-Fisher problem all admit a unified geometric interpretation as transversality conditions on the feature map. This paper serves as a geometric companion to a series of papers developing the distributional framework.
1 0
0
math.ST 2026-05-06

Threshold breakdown point finds smallest contamination to alter estimators

The Threshold Breakdown Point

The measure and m-sensitivity quantify how few bad observations push M-estimators and tests past chosen deviation levels, with a multiplier bootstrap for uncertainty quantification.

Figure from the paper full image
abstract click to expand
We introduce a novel approach to finite sample robustness that avoids the pessimism of traditional breakdown analyses. We define the threshold breakdown point, the smallest contamination fraction needed to induce a prescribed deviation, and the finite sample m-sensitivity, the worst-case deviation that an estimator can incur after m observations are contaminated. We derive these measures for commonly used M-estimators, their standard errors and related test statistics. This allows us to extend the decision breakdown point of Zhang (1996) to obtain general breakdown characterizations for hypothesis testing, and show how these notions correspond to finite sample counterparts of the power and level breakdown functions of He, Simpson and Portnoy (1990). We complement our work with an inferential framework for the threshold breakdown and m-sensitivity that yields consistency and asymptotic normality results, as well as a valid multiplier bootstrap for uncertainty quantification. We illustrate the practical utility of our methods in various numerical examples and an application to a two sample testing problem for a blood pressure dataset.
0
0
math.ST 2026-05-06

Thinned quantile share is unconditionally feasible

Thinned Quantile Shares are Universally Feasible

For some universal constant c, the c-thinned e^{-c}-quantile share works in every instance, removing both the reliance on the rainbow matching conjecture and the prior factor-two loss.

abstract click to expand
Quantile shares, introduced by Babichenko, Feldman, Holzman, and Narayan [STOC 2024], offer an ordinal, self-maximizing, and interpretable benchmark for fair division of indivisible goods, but their universal feasibility is known only conditional on the rainbow Erd\H{o}s matching conjecture (EMC). Specifically, Babichenko et al. showed that assuming the rainbow EMC in the near-perfect matching regime, the $(1/2e)$-quantile share is universally feasible. In contrast, a simple argument shows that the $q$-quantile share can be infeasible for any $q > 1/e$. We introduce a one-parameter refinement of quantile shares, the $c$-thinned quantile share, obtained by thinning the inclusion probability in the random benchmark bundle by a factor of $c$ for a fixed constant $c\in(0,1]$. Our main result is that there exists a universal constant $c >0$ for which the $c$-thinned $e^{-c}$-quantile share is unconditionally universally feasible; this is best possible in the sense that for any $c \in (0,1]$, the $c$-thinned $q$-quantile share can be infeasible for any $q > e^{-c}$. Prior to this work, the only nontrivial share known to be universally feasible was Feige's residual maximin share. The thinning viewpoint also lets us remove the factor-two loss in the conditional result for the original quantile share: assuming the rainbow EMC, the $(1/e)$-quantile share is universally feasible.
0
0
math.ST 2026-05-06

Bernstein intervals deliver safe coverage and minimax widths for kernel smoothers

Empirical Bernstein Confidence Intervals for Kernel Smoothers: A Safe and Sharp Way to Exhaust Assumed Smoothness

Direct tail bounds keep bias on its natural scale, so the intervals hit nominal coverage uniformly over the smoothness class while shrinking at the minimax rate.

abstract click to expand
Using normal approximation (NA) to construct confidence intervals for kernel smoothers faces a fundamental challenge: the normalization that produces a limiting distribution also magnifies smoothing bias, so that a small estimation bias may become a non-negligible inferential bias. Robust bias correction (RBC) and bias-aware inference (BA) address this difficulty through different bias-control strategies. This paper takes a different route by replacing the normal-approximation calibration engine with empirical Bernstein tail control. The resulting confidence intervals control stochastic fluctuations on the original estimation scale, so that deterministic smoothing bias enters the radius as an estimation-scale approximation error rather than as a normalized inferential bias. We develop this idea for pointwise inference on univariate density and regression functions. The proposed empirical Bernstein confidence intervals (EBCIs) combine empirical Bernstein calibration with bias-aware fixed-length radius construction under a local Taylor-remainder class. Uniformly over functions with \(S\)-th order local smoothness, both one-sided and two-sided intervals attain the nominal coverage level up to a remainder of order $n^{-\frac{2S}{2S+1}}$, or an exponential remainder in bounded or sub-Gaussian settings. Their widths shrink at the minimax rate $n^{-\frac{S}{2S+1}}$. Thus, EBCI safely converts correctly specified smoothness into both coverage accuracy and interval-length efficiency. The contribution is not a new bias-control philosophy, but a new calibration engine that can inherit existing ideas such as BA and RBC while avoiding the usual normalization-induced amplification of smoothing bias.
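The calibration engine in its simplest form is an empirical Bernstein deviation bound for bounded variables, in the style of Maurer and Pontil. The sketch below gives only the stochastic-fluctuation radius for a plain sample mean on synthetic data; the paper's construction adds a bias-aware term on this same estimation scale, which is not reproduced here.

    import numpy as np

    def emp_bernstein_radius(x, delta):
        # with prob >= 1 - delta: mean(x) - E[X] <= radius, for X in [0, 1]
        n = len(x)
        v = x.var(ddof=1)
        return np.sqrt(2 * v * np.log(2 / delta) / n) + 7 * np.log(2 / delta) / (3 * (n - 1))

    rng = np.random.default_rng(3)
    x = rng.beta(2, 5, size=500)           # bounded synthetic data, true mean 2/7
    r = emp_bernstein_radius(x, delta=0.05)
    print(x.mean() - r, x.mean() + r)      # stochastic radius only; a smoothing-bias
                                           # term would be added on this same scale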
0
0
math.ST 2026-05-06

Ordering constraint improves estimates of scale powers in exponentials

Improved estimation of positive powers of scale parameters of exponential distributions under a prior information

Dominating estimators for positive powers of scales are derived under convex loss when one parameter is known to be smaller than the other.

Figure from the paper full image
abstract click to expand
Estimating unknown parameters subject to prior constraints is important in statistical inference, particularly in fields such as reliability analysis, survival studies, and engineering, where prior structural information about the parameters is often available. Incorporating such prior information makes the analysis more realistic and usually yields better estimates than methods that ignore such information. In this article, we consider the problem of estimating the positive power of the scale parameter of two shifted exponential populations under a prior ordering constraint on the scale parameters. We derive sufficient conditions under which equivariant estimators are shown to dominate others under scale-invariant strictly convex loss functions. In addition, we derive various estimators that dominate the best affine equivariant estimators (BAEE). Moreover, we derive a smooth estimator which dominates the BAEE using an integrated approach, and we further show that it is a generalized Bayes estimator under a non-informative prior. We also provide an improved estimator based on the Pitman closeness criterion. An extensive simulation study is carried out to compare the estimators numerically. Finally, we provide real data examples to illustrate the results.
0
0
math.ST 2026-05-06

Spike rate parameters estimated efficiently in large neuron networks

LAN property for the parameter of the jump rate in mean field interacting systems of neurons

Local asymptotic normality of the likelihood makes the maximum likelihood estimator optimal as the number of neurons grows without bound.

abstract click to expand
In the context of a large system of $N$ neurons interacting through spike events in a mean-field regime as $N\rightarrow \infty$, we characterize the estimation of a multidimensional parameter in the spiking rate, when the neural states are observed over a fixed time horizon. We first prove the local asymptotic normality (LAN) property and leverage classical theory to establish the asymptotic efficiency of the maximum likelihood estimator. While the theory of Ibragimov and Has'minskii yields strong results, up to a global asymptotic minimax bound, its applicability appears currently limited to models without state resets at spike times. Following H\"{o}pfner's classical approach, we nevertheless derive, in a general setting including neuron reset, the consistency, asymptotic normality and local asymptotic minimax optimality of the estimator. Keywords: Local Asymptotic Normality (LAN); Mean-field regime; Interacting particle system; Multidimensional parameter estimation; Jump rate estimation; Maximum likelihood estimator (MLE); Asymptotic minimax optimality
0
0
math.ST 2026-05-06

Shrinking partitions yield asymptotic normality for jump process rates

Local estimation of transition rates of jump processes through discretization

Occurrence/exposure rates become normally distributed for Markov and semi-Markov models when bin sizes decrease with sample size and no structural form is imposed on the intensities.

abstract click to expand
We investigate the Poisson regression method for Markov and semi-Markov jump processes from a nonparametric angle, allowing the lengths of the time and duration intervals in the partition to vary with the number of observations. Imposing no structural assumptions on the true intensities, we obtain asymptotic normality of the occurrence/exposure rates under appropriate shrinking conditions on the partition lengths. We derive asymptotic normality results for both Markov and semi-Markov models using only classical central limit theorems and elementary results for counting processes. All results are illustrated on both simulated and real data.
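A minimal simulation of the occurrence/exposure estimator for a two-state Markov jump process: within each time bin, the estimated intensity is the number of 0 -> 1 jumps divided by the exposure time spent in state 0. The bin count, rates, and number of copies are illustrative assumptions; the paper's shrinking-bin asymptotics are what drive the normality result.

    import numpy as np

    rng = np.random.default_rng(4)
    lam01, lam10, T = 2.0, 3.0, 1.0        # true jump intensities, time horizon
    bins = np.linspace(0, T, 11)           # 10 time bins
    occ, expo = np.zeros(10), np.zeros(10)

    for _ in range(500):                   # 500 i.i.d. copies of the process
        t, s = 0.0, 0
        while t < T:
            rate = lam01 if s == 0 else lam10
            w = rng.exponential(1 / rate)
            t_next = min(t + w, T)
            if s == 0:                     # spread exposure of state 0 over bins
                lo = np.searchsorted(bins, t, side="right") - 1
                hi = min(np.searchsorted(bins, t_next, side="right") - 1, 9)
                for j in range(lo, hi + 1):
                    expo[j] += min(t_next, bins[j + 1]) - max(t, bins[j])
                if t + w < T:              # a 0 -> 1 jump occurred at t_next
                    occ[hi] += 1
            t, s = t + w, 1 - s

    print(occ / expo)                      # occurrence/exposure, hovers near lam01 = 2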
0
0
math.ST 2026-05-06

T-posterior produces randomised estimators with non-asymptotic bounds

Statistical Inference via T-Posterior Randomised Estimators

The construction yields performance guarantees that survive model misspecification and require no concentration inequalities.

abstract click to expand
Given a statistical model, we propose a novel estimation method that yields randomised estimators for the unknown distribution of an observed random variable. We establish non-asymptotic bounds for the performance of these estimators and demonstrate their robustness to potential model misspecification. Notably, these properties are established by circumventing the use of concentration inequalities and empirical process theory. We provide an illustration of this approach to the problem of estimating the intensity of a Poisson process.
0
0
math.ST 2026-05-06

Gamma smoothing yields optimal rates and shorter EB intervals for Poisson data

Poisson Empirical Bayes via Gamma-Smoothed Nonparametric Maximum Likelihood

Mixture approximation fixes discreteness and slow convergence, delivering exact coverage and reduced expected length.

Figure from the paper full image
abstract click to expand
Empirical Bayes methods are widely used for large-scale estimation and inference in the Poisson means problem. Existing results establish theoretical properties of the nonparametric maximum likelihood estimator (NPMLE) for optimal posterior mean estimation, but comparatively less is known about uncertainty quantification (i.e., construction of confidence sets). Two main challenges in constructing confidence sets for the latent parameters based on the NPMLE are its discreteness and its slow rate of prior estimation. We resolve these limitations by introducing a smooth NPMLE that models the prior as a Gamma mixture, which is a flexible class capable of approximating a wide range of continuous priors on $(0,\infty)$. This procedure preserves the convex optimization structure of the classical NPMLE. The smooth NPMLE achieves the optimal nearly parametric rate for posterior mean estimation. Moreover, it achieves a polynomial convergence rate for prior and posterior density estimation under a compact support assumption on the mixing distribution. Based on the smooth NPMLE, we construct plug-in empirical Bayes confidence sets that mimic the oracle optimal (in terms of expected length) marginal coverage sets. We show theoretically and empirically that these sets achieve asymptotically exact marginal coverage and are substantially shorter than existing methods.
0
0
math.ST 2026-05-06

Smoothness improves rates for Wasserstein barycenters

Smoothed estimation of Wasserstein barycenters

A semi-dual Sobolev approach yields nonparametric convergence that depends on smoothness rather than full dimensionality.

abstract click to expand
This paper studies the statistical estimation of exact Wasserstein barycenters. Existing non-asymptotic results for empirical barycenters exhibit a severe curse of dimensionality. Motivated by the semi-dual formulation of the barycenter problem and its associated Sobolev optimization geometry, we develop a smoothness-aware approach that combines density estimation with Sobolev geometric structure to estimate the population barycenter. We establish nonparametric convergence rates for estimating both the barycenter functional and its minimizer, demonstrating how smoothness can substantially improve statistical performance.
0
0
math.ST 2026-05-06

Population covariates with survey data identify small-area treatment effects

Causal Small Area Estimation with Survey-only Covariates

Doubly robust estimator is consistent if either outcome or treatment model is correct, removing the need to observe treatment for every unit

Figure from the paper full image
abstract click to expand
Area-specific causal inference is important in many policy and survey applications, where the goal is to evaluate treatment effects for small geographic or demographic domains. Existing causal small area estimation methods, however, typically rely on a strong data requirement that treatment status is observed for all units in the population. This assumption is often unrealistic in practical survey settings, where both treatment and outcome variables are observed only for sampled units, while auxiliary covariates are available for the full population. To address this limitation, we develop a new identification strategy for area-specific treatment effects under this more realistic data structure by combining survey-only covariates with population-level auxiliary information. Based on this result, we propose a doubly robust estimator that remains consistent when either the outcome regression model or the treatment and area assignment models are correctly specified. We further derive the semiparametric efficiency bound for the target parameter and show that the proposed estimator attains this bound under regularity conditions. Simulation studies demonstrate favorable finite-sample performance, particularly in settings with small sample sizes within areas, and an empirical application illustrates the practical relevance of the proposed framework.
0
0
math.ST 2026-05-05

Uncountably many inaccessible decisions exist in every finite probability space

Uncountably many conditionally inaccessible decisions exist in every finite probability space

Subjective p blocks recovery of objectively good choices for uncountably many p* and utility pairs

abstract click to expand
In a recent paper \cite{Redei-Jing2026} the notion of conditional $p$-inaccessibility of a decision based on utility maximization was defined and examples of conditionally $p$-inaccessible decisions were given. The conditional inaccessibility of a decision based on maximizing utility calculated by a probability measure $p^*$ expresses that the decision cannot be obtained if the expectation values of the utility functions are calculated using the (Jeffrey) conditional probability measure obtained by conditioning $p$ on partial evidence about the probability $p^*$ that determines the decision. The paper \cite{Redei-Jing2026} conjectured that conditionally $p$-inaccessible decisions exist in some probability spaces having an arbitrarily large finite number of elementary events. In this paper we prove that for any $p$ in any finite probability space there exists an uncountable number of probability measures $p^*$, for each of which there exists an uncountable number of pairs of utility functions that represent conditionally $p$-inaccessible decisions. If $p^*$ is an objective probability determining objectively good decisions and $p$ is the subjective probability determining a rational decision of a decision making Agent, the result says that there is an enormous number of decision situations in which the Agent's subjective probability prevents the Agent's informed rational decision from being objectively good.
0
0
math.ST 2026-05-05

Schur-concave commutative copulas equal closure of associative copula hull

Characterizing Schur-concave commutative copulas as the closure of associative ones

The uniform limits of mixtures of associative copulas recover exactly the full class of Schur-concave commutative copulas.

abstract click to expand
Let $\mathcal{C}_a$ denote the class of associative copulas, and let $\overline{\mathcal{C}}_a$ be the closure, in the uniform metric $d_\infty$, of the convex hull of $\mathcal{C}_a$. It is known that $\mathcal{C}_a \subseteq \mathcal{C}_{SC}$, the class of Schur-concave commutative copulas. We prove the reverse inclusion, establishing $\overline{\mathcal{C}}_a = \mathcal{C}_{SC}$.
0
0
math.ST 2026-05-05

Uniform representation yields simultaneous confidence regions for time series

Simultaneous Inference for Nonlinear Time Series, a Sieve M-regression Approach

New high-dimensional theory for dependent observations turns pointwise sieve estimators into simultaneous inference tools.

Figure from the paper full image
abstract click to expand
This paper studies simultaneous inference of conditional distributions in nonlinear time series from a sieve M-regression perspective. Existing literature on sieve M-regression has primarily focused on pointwise asymptotics, leaving the development of uncertainty quantification over the entire predictor space unexplored. We address this gap by establishing a uniform Bahadur representation for the sieve M-estimator, accommodating dependent data and a growing number of sieve basis functions. A novel high-dimensional empirical process theory is developed for temporally dependent data, and a specifically designed M-decomposition method is utilized to control high-dimensional complexities. Building on this representation, we develop a convex Gaussian approximation to characterize the asymptotic behavior of the estimator and construct valid simultaneous confidence regions (SCRs). To facilitate practical implementation, we introduce a self-convolved bootstrap algorithm that accurately approximates the distribution of the maximal deviation. Our inferential framework is supported by rigorous error bounds and validated through numerical simulations and real data applications.
0
0
math.ST 2026-05-05

Moments of group functions computed from Fourier coefficients alone

Statistics of a multi-factor function from its Fourier transform

Each moment expands into products of exactly m coefficients whose indices sum to zero, acting as a natural filter on contributing terms.

Figure from the paper full image
abstract click to expand
For a phenomenon $\boldsymbol{f}$ that is a function of $n$ factors, defined on a finite abelian group $G$, we derive its population statistics solely from its Fourier transform $\hat{\boldsymbol{f}}$. Our main result is an \textit{$m$-Coefficient/Index Annihilation Theorem}: the $m$th moment of $\boldsymbol{f}$ becomes a series of terms, each with precisely $m$ Fourier coefficients --- and surprisingly, the coefficient \textit{indices} in each term sum to zero under group addition. This condition acts like a filter, limiting which terms appear in the Fourier domain, and can reveal deeper relationships between the variables driving $\boldsymbol{f}$. These techniques can also be used as an analytical/design tool, or as a feasibility constraint in search algorithms. For functions defined on $\mathbb{Z}_2^n$, we show how the skew, kurtosis, etc. of a binomial distribution can be derived from the Fourier domain. Several other examples are presented.
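The annihilation condition can be brute-force verified for small cases on Z_2^n, where the characters are chi_S(x) = (-1)^{<S,x>} and index addition is bitwise XOR. The sketch below (sizes n, m chosen small for tractability) checks that the m-th moment of a random function equals the sum over coefficient m-tuples whose indices XOR to zero.

    import numpy as np
    from itertools import product

    n, m = 3, 3
    rng = np.random.default_rng(5)
    f = rng.standard_normal(2**n)                       # function on Z_2^n

    # Fourier coefficients fhat[S] = 2^{-n} sum_x f(x) (-1)^{popcount(S & x)}
    chi = np.array([[(-1)**bin(S & x).count("1") for x in range(2**n)]
                    for S in range(2**n)])
    fhat = chi @ f / 2**n

    lhs = np.mean(f**m)                                 # m-th moment, computed directly
    rhs = sum(np.prod([fhat[S] for S in idx])
              for idx in product(range(2**n), repeat=m)
              if np.bitwise_xor.reduce(idx) == 0)       # indices must cancel (XOR = 0)
    print(lhs, rhs)                                     # agree to rounding error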
0
0
math.ST 2026-05-04 3 theorems

Risk parameter unifies MML with NML minimax coding

Entropic Strict Minimum Message Length and Its Connections to PAC-Bayes and NML

Entropic SMML interpolates Bayesian and worst-case regimes with transitions on a logarithmic scale in n and the risk level for regular models.

Figure from the paper full image
abstract click to expand
We introduce entropic strict minimum message length (SMML), a risk-sensitive generalization of strict minimum message length coding. The proposed criterion replaces expected two-part codelength under the prior predictive distribution with an exponential certainty equivalent, thereby defining a one-parameter family of coding rules that interpolates between Bayesian average-case coding and worst-case minimax coding. We show that ordinary SMML is recovered in the risk-neutral limit, while the extreme risk-sensitive limit yields a minimax codelength criterion; when centered by the oracle maximum likelihood codelength, this criterion coincides with the normalized maximum likelihood (NML) minimax-regret principle. We further prove that entropic SMML admits a variational characterization as a Kullback--Leibler-regularized worst-case expected codelength, giving it a PAC--Bayes-type interpretation. We establish a joint asymptotic theory linking the sample size $n$ and the risk parameter $\tau$, showing that in regular parametric models the transition between Bayesian, robust, and minimax coding regimes occurs on a logarithmic scale. For regular exponential families, the fixed-codebook partition remains affine in sufficient-statistic space, while the codepoints satisfy a tilted moment-matching condition and admit an interpretation as tilted Bregman centroids. These results position entropic SMML as an information-theoretic bridge between MML, PAC--Bayes, and MDL.
0
0
math.ST 2026-05-04 3 theorems

Polynomial approximation controls regret by Hellinger distance in Gaussian empirical Bayes

Sharp regret-Hellinger bounds for Gaussian empirical Bayes via polynomial approximation

For compact priors the unregularized rule reaches O(ε² log(1/ε)/log log(1/ε)) without extra log factors or regularization.

abstract click to expand
A central problem in the theory of empirical Bayes is to control the regret (excess risk) of a learned Bayes rule by the Hellinger distance between the estimated and true marginal densities. In the normal means model, the classical result of Jiang and Zhang (2009, Annals of Statistics) achieves this only after regularizing the Bayes rule and incurs an extraneous cubic logarithmic factor through a delicate recursive argument. This paper introduces a new technique, based on polynomial approximation and Bernstein-type inequalities for weighted $L_2$ norms, that bounds the unregularized regret directly. The method is conceptually simpler and yields sharper, sometimes optimal, regret bounds. For compactly supported priors, we prove the sharp bound that the regret is at most $O(\epsilon^2 \log(1/\epsilon)/\log\log(1/\epsilon))$, where $\epsilon$ is the Hellinger distance between the marginal densities. The same method also extends to priors with exponential tails. Conversely, we show that regularization is genuinely necessary for heavy-tailed priors under only bounded moment assumptions. As a statistical consequence, we obtain improved regret bounds for the nonparametric maximum likelihood estimator.
0
0
math.ST 2026-05-04

Quantile TVD solutions form exact minmax intervals at each point

An Exact Pointwise Characterization for Total Variation Denoising in Quantile Regression

Admissible fitted values at any location are bounded by minmax functionals of local order statistics over nested intervals, yielding non-crossing solutions across quantile levels.

Figure from the paper full image
abstract click to expand
Total variation denoising (TVD) is a classical method for denoising and curve fitting, yet an explicit pointwise description of its fitted values has only recently been established in the mean regression setting by arXiv:2410.03041v4. This raises the question of whether a similar representation holds for quantile regression. We answer this question affirmatively by deriving an exact minmax/maxmin representation for the quantile TVD estimator, providing a complete pointwise characterization of its solution set. Given that the quantile TVD estimator is generally non-unique, the existence of such a representation is perhaps surprising. We show that the set of admissible fitted values at any location forms a compact interval, whose endpoints are characterized exactly by minmax/maxmin functionals of local order statistics over nested intervals. We next develop several structural properties of the quantile TVD solution set. First, the solution set is closed under coordinatewise maximum and minimum, guaranteeing the existence of extremal elements -- upper and lower envelope solutions. Second, this reveals that quantile TVD is intrinsically non-crossing across quantile levels when a common tuning parameter is used. We prove this is driven by submodularity of the total variation penalty, and show that any penalized quantile regression estimator with a submodular penalty enjoys this property. From an estimation error perspective, our representation enables a refined pointwise analysis via a transparent local bias-variance decomposition, facilitating new pointwise risk bounds and near-optimal rates for locally H\"older smooth functions. Our results hold under heavy-tailed noise (e.g., Cauchy) and substantially extend existing guarantees beyond locally constant signals. Altogether, these results advance the theory of quantile TV regression via exact pointwise minmax representations.
0
0
math.ST 2026-05-04

Long memory switched systems exhibit bursts despite average stability

Intermittency induced by long memory under stochastic regime switching

Volterra equations with Markov-modulated kernels show annealed stability but quenched intermittency and pathwise growth.

Figure from the paper full image
abstract click to expand
We study a fundamental instability mechanism in nonlinear, nonlocal dynamical systems arising from the interaction of long-range memory and stochastic regime switching. The dynamics are governed by network-coupled, operator-valued Volterra evolutions with completely monotone memory kernels whose excitation operators and kernel parameters are modulated by an ergodic finite-state continuous-time Markov chain. We formalize a sharp separation between annealed stability (in expectation) and quenched behaviour (along typical sample paths). On the annealed side, we identify an averaged memory gain that yields uniform moment bounds and a memory-adapted Lyapunov functional implying mean-square control under an averaged subcriticality condition. On the quenched side, we show that rare but persistent excursions into supercritical regimes are amplified by memory, producing intermittent macroscopic bursts with heavy-tailed statistics and a deterministic almost sure growth exponent obtained via a subadditive ergodic argument. This establishes an annealed--quenched dichotomy specific to non-Markovian switching systems, where stability in expectation can coexist with pathwise growth and metastable burst phases. We further derive a micro--macro correspondence by proving that a population of regime-modulated self-exciting point processes converges, both annealed and quenched, to the random-coefficient Volterra limit, transferring the burst mechanism from microscopic branching dynamics to macroscopic long-memory flows. Numerical experiments illustrate how burst localization depends on graph geometry and on noncommuting excitation operators.
0
0
math.ST 2026-05-04

Bootstrap adapts to four clustering regimes for valid tests

Bootstrap Inference under General Two-way Clustering with Serially and Spatially Dependent Common Effects

A data-driven classifier and projection wild bootstrap handle serial and spatial dependence without knowing the regime in advance.

Figure from the paper full image
abstract click to expand
This paper develops bootstrap procedures for inference in linear regression models with two-way clustered data. We characterize the estimator's asymptotic behavior in five mutually exclusive and exhaustive regimes: three Gaussian and two non-Gaussian. We establish four impossibility results: heterogeneous score components preclude uniform consistency; uniform consistency also fails in one non-Gaussian (infeasible) regime; the infeasible regime is not uniformly distinguishable from a feasible one; and uniform validity over all feasible regimes rules out uniform conservativeness over the infeasible regime. To address the feasible regimes, we propose a data-driven regime classifier and a projection-based wild bootstrap procedure. The procedure delivers uniformly valid inference across the four feasible regimes while allowing serial dependence along the second clustering dimension and spatial dependence along the first. This combination of regime adaptivity and flexible dependence is new to the two-way clustering literature. Monte Carlo simulations confirm the accuracy and flexibility of the proposed methods in settings with complex clustering structures.
0
0
math.ST 2026-05-04

Profile estimator attains minimax bound for hyperbolic location inference

Profile Likelihood Inference for Anisotropic Hyperbolic Wrapped Normal Models on Hyperbolic Space

The method yields consistent and efficient estimates for the location parameter that match the local asymptotic minimax lower bound under squared geodesic loss.

Figure from the paper full image
abstract click to expand
We study likelihood-based inference for the anisotropic hyperbolic wrapped normal distribution on standard hyperbolic space. The model has a manifold-valued location parameter and a full positive definite covariance matrix in tangent coordinates. For independent observations from this family, we analyze the profile maximum likelihood estimator obtained by optimizing the likelihood over the location after profiling out the covariance. To guarantee finite-sample existence, we formulate the estimator on a covariance shell that bounds eigenvalues away from zero and infinity. We prove that this constrained likelihood is well posed, that the anisotropic wrapped normal family is identifiable, and that the estimator is strongly consistent when the true covariance lies in the interior of the shell. In global normal coordinates for the location and log-covariance coordinates for the nuisance parameter, we establish joint asymptotic normality and derive efficient profile inference for the location parameter through the Schur-complement information. We further prove local asymptotic normality of the experiment and obtain the H\'ajek--Le Cam local asymptotic minimax lower bound under squared geodesic loss. The profile estimator attains this bound for truncated squared loss, and for ordinary squared loss under a uniform-integrability condition. We also give an explicit computational form of the estimator based on spectral clipping of the empirical tangent covariance, and present a Monte Carlo calibration study showing that the finite-sample scaled location risk and Wald coverage agree with the asymptotic theory.
0
0
math.ST 2026-05-01

Bivariate mixtures let MLE beat square-root convergence

A Simple Bivariate Example of Fast Convergence Rates for Maximum Likelihood Estimates

A one-parameter family achieves any regularly varying rate with index above 0.5 and some faster index-0.5 rates for the location estimator.

abstract click to expand
We present a one-parameter family of bivariate absolutely continuous distributions based on a location-scale family of Gaussian variance mixtures, with continuous densities sharing the same support (effective domain). The maximum likelihood estimate of the location parameter converges to the true value faster than the classic square-root rate. In fact, we can obtain any convergence rate given by a regularly varying function with index greater than 0.5, and some convergence rates given by regularly varying functions with index 0.5 but faster than the classic square-root rate.
0
0
math.ST 2026-05-01

Decoupled descent forces train error to match test error

Decoupled Descent: Exact Test Error Tracking Via Approximate Message Passing

By canceling data reuse biases with approximate message passing, the method lets training error track test error exactly in Gaussian mixture models.

Figure from the paper full image
abstract click to expand
In modern parametric model training, full-batch gradient descent (and its variants) suffers due to progressively stronger biasing towards the exact realization of training data; this drives the systematic ``generalization gap'', where the train error becomes an unreliable proxy for test error. Existing approaches either argue this gap is benign through complex analysis or sacrifice data to a validation set. In contrast, we introduce decoupled descent (DD), a novel theory-based training algorithm that satisfies a train-test identity -- enforcing the train error to asymptotically track the test error for stylized Gaussian mixture models. Within this specific regime, leveraging approximate message passing theory, DD iteratively cancels the biases due to data reuse, rigorously demonstrating the feasibility of zero-cost validation and $100\%$ data utilization. Moreover, DD is governed by a low-dimensional state evolution recursion, rendering the dynamics of the algorithm transparent and tractable. We validate DD on XOR classification, yielding superior performance compared to GD; additionally, we implement noisy MNIST and non-linear probing of CIFAR-10, demonstrating that even when our stylized assumptions are relaxed, DD narrows the generalization gap compared to GD.
0
0
math.ST 2026-05-01

Wild bootstrap max-test detects high-dimensional predictors

A High Dimensional Wild Bootstrap Max-Test for Detecting the Presence of Significant Predictors

Approximates the max distribution for dependent heterogeneous data with p exponential in n and power against weak signals.

abstract click to expand
We construct a block bootstrap max-test for detecting the presence of significant predictors in a high dimensional setting, allowing for weakly dependent and heterogeneous (possibly non-stationary) data. The number of covariates to be screened may be large, $p \gg n$, growing at an exponential rate provided $\ln(p) = o(n^{a})$ for some $a > 0$ that depends on memory decay and the growth of higher moments. We study the problem of correlation screening in a high dimensional marginal regression setting, assuming so-called \textit{physical dependence} in a time series setting. We entirely sidestep covariance matrix estimation and adaptive re-sampling by working with a max-statistic over the many computed parameters. Thus we do not need endogenous selection of the most relevant predictor index yielding non-uniform asymptotics, nor do we need a post-estimation Bonferroni correction. The non-standard limit distribution arising from the maximum of an increasing number of estimators is easily approximated by a multiplier (wild) block bootstrap. The max-test controls size well and performs well against various deviations from the null, including very slight deviations with a weak or sparse signal. A numerical experiment is performed and an empirical example with the VIX volatility index is provided.
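A stripped-down version of the mechanism, assuming i.i.d. data and plain Gaussian multipliers (no blocks, so serial dependence is ignored): the max absolute marginal score is compared with the bootstrap quantile of the same max computed from multiplier-perturbed, centered scores. All sizes and scalings below are illustrative, not the paper's.

    import numpy as np

    rng = np.random.default_rng(9)
    n, p, B = 300, 2000, 500
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)                    # global null: no relevant predictor

    scores = X * y[:, None]                       # per-observation marginal scores
    stat = np.sqrt(n) * np.abs(scores.mean(axis=0)).max()

    centered = scores - scores.mean(axis=0)
    boot = np.empty(B)
    for i in range(B):
        e = rng.standard_normal(n)                # i.i.d. Gaussian multipliers
        boot[i] = np.sqrt(n) * np.abs((centered * e[:, None]).mean(axis=0)).max()

    print(stat, np.quantile(boot, 0.95))          # reject when stat exceeds the quantile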
0
0
math.ST 2026-05-01

New quarticity estimator achieves stable CLT at rate 1/sqrt(Δ_n)

A note on estimation of quarticity based on spot volatility

The spot-volatility based method converges at the expected rate and permits direct asymptotic variance comparisons to prior estimators.

abstract click to expand
In this paper, we aim at estimating the quarticity of continuous It\^{o} semimartingales. Instead of using some classical estimators, we introduce a more intuitive one and establish a central limit theorem (CLT) for it, with a convergence rate of $1/\sqrt{\Delta_n}$ in the sense of stable convergence. Moreover, we compare the asymptotic variance of this estimator with that of other existing estimators.
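A sketch of the two approaches side by side for dX = sigma(t) dW observed on an equidistant grid: the classical fourth-power sum (1/(3 Delta)) * sum((dX)^4), and a spot-volatility version that squares local realized variances and sums them over windows. The window length and volatility path are illustrative assumptions, not the paper's tuned choices.

    import numpy as np

    rng = np.random.default_rng(6)
    n, T = 100_000, 1.0
    dt = T / n
    t = np.linspace(0, T, n + 1)[:-1]
    sigma = 0.2 + 0.1 * np.sin(2 * np.pi * t)              # deterministic spot volatility
    dX = sigma * np.sqrt(dt) * rng.standard_normal(n)      # increments of dX = sigma dW

    quart_classic = (dX**4).sum() / (3 * dt)               # (1/(3 Delta)) sum (dX)^4

    k = 200                                                # local window length
    spot2 = (dX**2).reshape(-1, k).sum(axis=1) / (k * dt)  # local realized variances
    quart_spot = (spot2**2).sum() * k * dt                 # sum sigma_hat^4 over windows

    print(quart_classic, quart_spot, (sigma**4 * dt).sum())  # both near the true integral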
0
0
math.ST 2026-05-01

Bayesian one-pass algorithm attains optimal posterior convergence

The Bernstein-von Mises theorem for Bayesian one-pass online learning

An online Bernstein-von Mises theorem ensures valid uncertainty without growing batches and matches batch estimator performance on GLMs.

Figure from the paper full image
abstract click to expand
Bayesian online learning provides a coherent framework for sequential inference. However, its theoretical understanding remains limited, particularly in the one-pass setting. Existing theoretical guarantees typically require the mini-batch sample size to diverge, a condition that fails in the one-pass regime. In this paper, we propose a new Bayesian online learning algorithm tailored to the one-pass setting, which incorporates a warm-start phase to ensure stable sequential updates. For this algorithm, we show that the sequentially updated posterior attains the optimal convergence rate. Building on this, we establish an online analogue of the Bernstein-von Mises theorem, which guarantees valid uncertainty quantification without diverging mini-batch sample sizes. Our analysis is based on a novel theoretical framework that differs fundamentally from existing approaches in the online learning literature. Numerical experiments on generalized linear models show that the proposed method matches the performance of the batch estimator while outperforming existing online procedures.
0
0
math.ST 2026-04-30

Tilted density scores equal original scores at shifted location and time

Technical Note on Relating Scores of Tilted Distributions

Linear tilts shift location only; constant negative diagonal tilts also adjust the noise level of the underlying convolution.

abstract click to expand
Recent results have shown that for a linear tilt to a reference measure, the scores that would be produced under convolution with a normal variable can be expressed in terms of convolutions of the original density. Here, we extend that result to include constant negative diagonal tilts as well. The relationship follows from relating the denoisers of the two densities, which define the scores via Tweedie's formula. A linear tilt results in a location shift to the score operator, while a quadratic tilt results in both a location shift and a time shift. Thus the scores of the tilted density can be understood as the scores of the original convolution process at a different location and noise level. These results are of interest to those in the score-based diffusion community, and may lead to better score estimators which take advantage of these tilted score relationships.
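The location-shift relation is simple to verify numerically when the reference density is a Gaussian mixture, since both the tilt and the convolution then have closed forms. Assuming the relation score_{tilted,t}(x) = a + score_t(x + a*t) for a linear tilt e^{a x} at noise level t (our reading of the abstract; completing the square in the convolution integral gives it directly), the check below agrees to machine precision.

    import numpy as np

    w  = np.array([0.3, 0.7])          # reference density: two-component Gaussian mixture
    mu = np.array([-1.0, 2.0])
    s2 = np.array([0.5, 1.5])
    a, t = 0.8, 0.6                    # tilt slope and noise level

    def mix_score(x, w, mu, var):
        # score of sum_k w_k N(mu_k, var_k) evaluated at points x
        p = w * np.exp(-(x[:, None] - mu)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        dp = p * (-(x[:, None] - mu) / var)
        return dp.sum(axis=1) / p.sum(axis=1)

    # tilting a Gaussian mixture by e^{a x}: reweight and shift each component
    wt = w * np.exp(a * mu + a**2 * s2 / 2); wt /= wt.sum()
    mut = mu + a * s2

    x = np.linspace(-3, 3, 7)
    lhs = mix_score(x, wt, mut, s2 + t)            # score of (tilt, then smooth)
    rhs = a + mix_score(x + a * t, w, mu, s2 + t)  # shift of the original score
    print(np.max(np.abs(lhs - rhs)))               # ~ 1e-15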
0
0
math.ST 2026-04-30

Black-box selection leaks at most the total variation distance to conditional law

A Leakage Bound for Confidence Sets after Black-Box Selection

The excess noncoverage after any arbitrary selection is bounded by the information the selection carries about the inferential data.

abstract click to expand
In many analyses the object reported at the end is not fixed in advance, but is chosen after a preliminary search over variables, subgroups, transformations, models or contrasts. Classical selective-inference methods are most effective when this search can be written as an explicit selection event. This note treats the less structured case in which the selection rule is a black box and inference is required for the target indexed by the selected object. We show that, for any fixed-target confidence procedure, selected-target noncoverage is bounded by the nominal fixed-target noncoverage plus the average total variation distance between the marginal law of the inferential data and its conditional law given the selected object. A mutual-information bound follows immediately. The result recovers sample splitting as the zero-leakage case and gives explicit guarantees for noisy screening through a Gaussian information bound. Thus the inferential cost of black-box selection is quantified by the information that the selected object carries about the inferential sample.
0
0
math.ST 2026-04-29

Scaled spiked sample covariances get explicit eigenvalue and eigenvector limits

Asymptotics of ultra-high-dimensional generalized spiked sample covariance matrix

Under the spiked covariance model with p ≍ n^α (α > 1), the properly scaled sample covariance matrix has eigenvalue locations and eigenvector projections converging to explicit limits.

abstract click to expand
This paper investigates the asymptotics of the eigenstructure of the sample covariance matrix under the spiked covariance matrix model in ultra-high-dimensional settings, where the dimensionality can grow much faster than the sample size, with $ p \asymp n^{\alpha} $, $ \alpha > 1 $. We establish the first-order convergence limits of eigenvalue locations and eigenvector projections of the properly scaled sample covariance matrix. Our results extend \cite{bloemendal16,ding21}.
0
0
math.ST 2026-04-29

Adaptive CIs in two-groups model scale as σ(n^{-1/4} + ε term)

Adaptive Confidence Intervals in Efron's Gaussian Two-Groups Model

Unknown contamination forces polynomially longer intervals than the known-ε case, with a poly-time Fourier algorithm achieving the bound when the variance is known.

Figure from the paper full image
abstract click to expand
Robust uncertainty quantification is increasingly important in modern data analysis and is often formalized under Huber's model, which allows an $\varepsilon$-fraction of arbitrary corruptions. In many experimental sciences, however, the measurement protocol is well controlled, and contamination is more plausibly introduced upstream. Motivated by this noise-oblivious nature of adversaries, we study confidence intervals for the null location parameter $\theta$ in Efron's Gaussian two-groups model, where an unknown fraction $\varepsilon$ of observations have arbitrarily shifted means, but all samples share the same law of additive Gaussian measurement noise with variance $\sigma^2$. We characterize the minimax-optimal length among confidence intervals with a prescribed coverage level uniformly over the unknown contamination proportion and all noise-oblivious adversaries. Although prior work has shown that the minimax point estimation rate of $\theta$ does not deteriorate when $\varepsilon$ becomes unknown, our results reveal that, with a given $\sigma^2$, the minimax-optimal length of confidence intervals that are adaptive to unknown $\varepsilon$ is of order $\sigma (n^{-1/4}+\varepsilon^{1/2}/\max\{1, \log(en \varepsilon^2)\}^{1/2})$, which is polynomially worse than the optimal length when $\varepsilon$ is known. When the variance $\sigma^2$ is also unknown, we show a further degradation: no adaptive confidence interval can be shorter than $\Omega(\sigma n^{-1/8})$. Algorithmically, we introduce a Fourier-based certification procedure built on Carath\'{e}odory's positive-semidefiniteness constraints. By scanning candidate points and accepting those whose residual characteristic function is certifiably consistent with a Gaussian location mixture, our algorithm attains the minimax lower bound in the known-variance setting and is computable in polynomial time.
0
0
math.ST 2026-04-29

Geometric records yield consistent MLE for Pareto tail index

Estimating the tail index of Pareto-type distributions from geometric records

The estimator is strongly consistent and asymptotically normal, matches classical methods in simulations, and uses far fewer full readings.

abstract click to expand
In this paper we develop a novel inferential approach based on geometric records for estimating the tail index of heavy-tailed distributions. We construct a maximum likelihood estimator for the Pareto model and establish its strong consistency and asymptotic normality, providing also an explicit expression for its asymptotic variance. These results are then extended to a broad class of Pareto-type distributions. The performance of the estimator is assessed via Monte Carlo simulation and compared with classical estimators from the literature. The proposed method is particularly well suited for settings where data arrive sequentially, as it yields smooth estimation trajectories. It is also especially advantageous in applications such as destructive testing, where measuring each observation exactly is costly. In this context, the estimator clearly outperforms Hill's estimator, achieving comparable or better accuracy while requiring a substantially smaller number of measured observations. An application to the analysis of the distribution of fluctuations of the Dow Jones Industrial Average (DJI) is also presented.
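For orientation, the classical benchmark the paper compares against is Hill's estimator built from the top-k order statistics; the geometric-record MLE itself is not reconstructed here. A minimal sketch on simulated Pareto data, with k chosen arbitrarily:

    import numpy as np

    rng = np.random.default_rng(7)
    alpha_true = 1.5
    x = rng.pareto(alpha_true, size=5000) + 1.0       # Pareto(alpha), support [1, inf)

    def hill(x, k):
        xs = np.sort(x)[::-1]                         # descending order statistics
        return 1.0 / np.mean(np.log(xs[:k] / xs[k]))  # inverse mean log-spacing
    print(hill(x, k=200))                             # roughly alpha_true = 1.5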
0
0
math.ST 2026-04-29

Rates for Markov kernels bound variances of Lipschitz functions

Implications of weak convergence rates of Markov transition kernels

In the reversible case with a Lipschitz initial density the same rates also control chi-squared divergence and support central limit results

abstract click to expand
This article extends weak convergence bounds of Markov transition kernels to convergence bounds on the variance of the Markov kernel applied to Lipschitz functions. In the reversible case, weak convergence rates of the transition kernels imply chi-squared divergence convergence bounds if the density of the initialization measure is Lipschitz. These results provide new tools to establish central limit theorems for Lipschitz functions used in Markov chain Monte Carlo simulations. Applications are explored to the stability of Metropolis-Hastings algorithms in high dimensions, stochastic gradient descent, and solutions to stochastic delay equations.
0
0
math.ST 2026-04-29

Test detects constant volatility at minimax-optimal rate

Sharp adaptive nonparametric testing for constant volatility

Deviations are measured by the ratio of local volatility to its L2 average and the procedure adapts to unknown smoothness under infill data.

Figure from the paper full image
abstract click to expand
Based on discrete observations, we develop a test to infer if the volatility function $\sigma(\cdot)$ within the nonparametric Gaussian white noise model $dY_t = \sigma(t)dW_t$ is constant. The testing procedure is shown to be minimax-optimal and adaptive for infill asymptotics and these results entail that a deviation from the null hypothesis of constancy is best measured in terms of the ratio of $\sigma(t)$ and its $L^2$-average. The derivation of optimal constants requires the construction of hypotheses with height $h(b)$, where the parameter $b$ solves $F_n(b)=0$ for given functions $F_n$. Proving this equation to be solvable for each $n\in\mathbb{N}$ and establishing quantitative bounds of the solutions is built upon the implicit function theorem.
0
0
math.ST 2026-04-29

Diffusion threshold MLE converges at n^{-(1+γ)/2} rate

Self-organized regime switching in null-recurrent dynamics

Profile estimator for regime switch in null-recurrent dynamics has minimax optimal rate with limit depending on local time of oscillating BM

Figure from the paper full image
abstract click to expand
Based on discrete observations $X_0,X_{\Delta},\dots, X_{n\Delta}$ for $\Delta=n^{-\gamma}$ with $\gamma\in [0,1)$ of the null-recurrent dynamic $dX_t = \sigma(X_t)dW_t$ with a Brownian motion $W$ and $\sigma(x)=\alpha\mathbb{1}\{x<\rho\} + \beta\mathbb{1}\{x\geq \rho\}$, we derive the rate of convergence and limiting distribution of the profile MLE for $\rho$. This includes low-frequency asymptotics ($\gamma=0$) for which the observations form a null-recurrent Markov chain. The derived non-standard limit is the argsup over a doubly stochastic drifted Poisson process explicitly involving the local time of oscillating Brownian motion. Its dependence on $\rho$ as well as the unknown volatility levels $\alpha$ and $\beta$ is shown to be continuous w.r.t. the topology of weak convergence, enabling statistical inference. Whereas this limit is independent of the sampling frequency, the profile MLE's rate of convergence equals $n^{-(1+\gamma)/2}$ and is proven to be minimax optimal. The surprising idea of the proof of the limit theorem is to relate the long-term behavior of the null-recurrent Markov chain to the infill asymptotics on a fixed time interval. Indeed, in the very special case that $(X_t)_{t\geq 0}$ is started in the true parameter $X_0=\rho_0$, the process $(X_t-\rho_0)_{t\geq 0}$ is shown to possess a desirable distributional self-similarity. On the basis of the strong Markov property, the artificial constellation of starting in $\rho_0$ is finally overcome by a coupling argument.
0
0
math.ST 2026-04-29

Optimal Kelly wealth growth equals bipolar KL limit

The optimal betting wealth growth rate

The rate is achievable against any i.i.d. null and is often strictly smaller than the usual infimum KL divergence.

abstract click to expand
This paper characterizes the best possible rate of growth of wealth in a Kelly betting game when repeatedly betting against a general i.i.d. null hypothesis $\mathscr{P}$, but the data are drawn i.i.d. from an arbitrary alternative $Q$. We prove that it equals $\lim_{n \to \infty} n^{-1}\inf_{P \in (\mathscr{P}^n)^{\circ\circ}} \mathrm{KL}(Q^n,P)$, where $\mathscr{P}^n = \{P^n: P \in \mathscr{P}\}$ and $(\mathscr{P}^n)^{\circ\circ}$ is its bipolar, i.e., this rate is achievable and one cannot do better. This quantity is in general smaller than a more popular quantity in the literature, $\mathrm{KL}_{\inf}(Q,\mathscr{P}) := \inf_{P \in \mathscr{P}}\mathrm{KL}(Q,P)$. If $\mathrm{KL}_{\inf}(\cdot,\mathscr{P})$ is weakly lower semicontinuous (w.l.s.c.) at $Q$, we show that the two quantities are equal; in particular, this happens when $\mathscr{P}$ is weakly compact. For simple alternatives, we provide the first matching necessary and sufficient condition for when power-one sequential tests exist (without assumptions on $\mathscr{P}, Q$). We also derive the optimal worst-case growth rate against composite $\mathscr{Q}$. We emphasize that test supermartingales on reduced filtrations suffice for all i.i.d. testing problems, and more general e-processes are not required. We thus completely generalize the recent results of Larsson et al.~\cite{larsson2025numeraire} to the sequential setting.
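The simple-null special case is easy to simulate: betting the likelihood ratio q/p each round against an i.i.d. null P while the data come from Q grows log-wealth at rate KL(Q, P). The Bernoulli sketch below illustrates that baseline only, not the bipolar characterization for composite nulls.

    import numpy as np

    p, q, n = 0.5, 0.7, 200_000
    rng = np.random.default_rng(8)
    x = rng.random(n) < q                        # data ~ Q = Bernoulli(q)
    lr = np.where(x, q / p, (1 - q) / (1 - p))   # per-round likelihood-ratio bet
    rate = np.log(lr).mean()                     # n^{-1} log wealth

    kl = q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))
    print(rate, kl)                              # close for large n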
0
0
math.ST 2026-04-28

Large deviation principle proven for estimators in moderate deviation zone

Parametric Statistical Inference in the Zone of Moderate Deviation Probabilities

Hellinger-distance Taylor expansions deliver uniform approximations and posterior concentration for Bayesian and likelihood methods.

abstract click to expand
A parametric theory of statistical inference is developed for the moderate deviation probability zone. The new approach to the proofs rests on a Taylor series expansion of the logarithm of the likelihood ratio in terms of the Hellinger distance. The large deviation principle in the moderate deviation probability zone is proven for Bayesian estimators and maximum likelihood estimators. A uniform approximation of the logarithm of the likelihood ratio and a theorem on the concentration of the posterior measure are also established for this zone.
0
0
math.ST 2026-04-28

EMMB estimator uncovers latent intercept shifts without group labels

Robust linear regression under latent group heterogeneity

Two-step method for regression with uncertain means and variances outperforms OLS in simulations and on Beijing air quality data.

abstract click to expand
Uncertainty is ubiquitous in real-world data, and the assumptions underlying classical linear regression models are often violated in practice. Inspired by the theory of sublinear expectation, we consider a linear regression model where the random intercept term has mean uncertainty and the error term has variance uncertainty. We develop a novel two-step approach, named Expectation-Maximization with Moving Block (EMMB), to estimate the model parameters. The proposed method requires no prior knowledge of group structures or change points. Theoretical properties of the estimators are established under mild regularity conditions. Simulation studies and a real-data application to PM2.5 concentration modeling in Beijing demonstrate the superiority of the proposed method: it captures substantial intercept heterogeneity overlooked by ordinary least squares and yields more accurate and interpretable estimates.
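A toy data-generating process consistent with the model described above (block boundaries, shift levels, and scales are hypothetical, not taken from the paper), illustrating the intercept heterogeneity that pooled OLS averages away:

```python
import numpy as np

# Latent-block heterogeneity: the intercept mean and the error standard
# deviation switch across unobserved blocks of observations.
rng = np.random.default_rng(2)
n, beta = 600, np.array([1.5, -2.0])
X = rng.normal(size=(n, 2))
mu = np.where(np.arange(n) < n // 2, -1.0, 1.0)   # latent intercept shift
sd = np.where(np.arange(n) < n // 3, 0.5, 1.5)    # latent error scale
y = X @ beta + mu + rng.normal(0.0, sd)
# pooled OLS ignores the latent structure; its intercept averages the shifts
coef, *_ = np.linalg.lstsq(np.c_[np.ones(n), X], y, rcond=None)
print(coef)
```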
0
0
math.ST 2026-04-28

Odds ratio posterior invariant to margin fix iff coefficients sum to zero

Posterior Invariance of Multiplicative Contrasts under Margin Constraints in Contingency Tables

The result holds for generalized multiplicative contrasts under Bayesian models with prior independence between marginal and conditional parameters.

abstract click to expand
Measures of association in contingency tables, such as odds ratios and their generalizations, are often studied under different sampling schemes that either fix or leave random the margins of the table. While classical results show that certain odds ratios are unaffected by constraining the margins, it is less clear when this invariance holds more generally. This paper studies posterior inference for a broad class of multiplicative contrasts of multinomial cell probabilities, which we refer to as generalized odds ratios, and addresses exactly when fixing a margin alters inference about them. We consider Bayesian inference under multinomial sampling and under models in which partition sums of the table are fixed in advance, and assume that the marginal and conditional parameters are independent a priori. Under additional mild assumptions, we show that the posterior distribution of a generalized odds ratio is invariant to fixing a margin if and only if the coefficients defining the contrast sum to zero within the margin.
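For the classical $2\times 2$ log odds ratio the contrast coefficients $(+1,-1,-1,+1)$ sum to zero within each row, so the invariance applies; a Monte Carlo illustration under a Dirichlet prior (counts and prior weights hypothetical), using the fact that a Dirichlet on the joint cells induces independent Beta posteriors for the row conditionals:

```python
import numpy as np

rng = np.random.default_rng(3)
counts = np.array([[30, 10],          # observed 2x2 table
                   [12, 25]])

# (a) multinomial sampling with a Dirichlet(1,1,1,1) prior on the four cells
post = rng.dirichlet(1 + counts.ravel(), size=100_000)
p11, p12, p21, p22 = post.T
lor_joint = np.log(p11) - np.log(p12) - np.log(p21) + np.log(p22)

# (b) row margins fixed: independent Beta posteriors for the row conditionals
#     (prior independence between marginal and conditional parameters)
q1 = rng.beta(1 + counts[0, 0], 1 + counts[0, 1], size=100_000)
q2 = rng.beta(1 + counts[1, 0], 1 + counts[1, 1], size=100_000)
lor_cond = np.log(q1 / (1 - q1)) - np.log(q2 / (1 - q2))

# the coefficients sum to zero within each row, so the two posteriors of
# the log odds ratio coincide (up to Monte Carlo error):
print(lor_joint.mean(), lor_cond.mean())
print(lor_joint.std(), lor_cond.std())
```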
0
0
math.ST 2026-04-28

Generator weights from triangular inversion characterize signed tail compatibility

A Geometric Witness Framework for Signed Multivariate Tail-Dependence Compatibility: Asymptotic Structure and Finite-Threshold Synthesis

Complete signed tail families are parametrized linearly and realized at every admissible finite scale precisely when the recovered weights are nonnegative and satisfy the normalization constraints.

Figure from the paper full image
abstract click to expand
We study multivariate tail-dependence compatibility for complete and partial signed tail families, treating lower-tail, upper-tail, and mixed configurations in one geometric witness representation indexed by active coordinate sets and sign patterns. For a complete signed tail family, witness generator weights $w = (w_{I,\sigma})$ give a linear incidence parametrization and are recovered by explicit triangular inversion. Excluding the geometric scale $p_0$, the complete case uses $3^d - 1$ generator weights, matching the number of complete signed tail coefficients; for partial specifications, only selected target coefficients need be prescribed. At a fixed threshold $p_0 \in (0, 1/2)$, the inversion identifies the normalized noncentral ternary cell masses of any realizing copula. Hence finite-threshold compatibility is characterized by nonnegative recovered generator weights, singleton normalization, and the residual central-mass constraint. This yields a complete M\"obius-type synthesis within the witness framework. If the recovered increments are nonnegative and singleton normalization holds, then $S(w) = \sum_{I,\sigma} w_{I,\sigma}$ determines the admissible finite-scale range, and every admissible $p_0$ gives an exact witness realization. In the canonical ray geometry, such a realization preserves the same complete signed tail family throughout $0 < p \leq p_0$. Thus the primary object is the complete signed tail family $\lambda$: it is realized at every admissible finite scale and can be carried along families of witness copulas with $p_0$ decreasing to $0$. Partial, noisy, or inconsistent specifications are treated through linear-feasibility and weighted-$\ell_1$ recovery problems in the same parametrization. The representation separates the $p_0$-free incidence/M\"obius layer from finite-threshold realization and provides tools for realization, simulation, calibration, completion, repair, and scenario design.
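The count $3^d - 1$ of generator weights follows from enumerating pairs $(I,\sigma)$ of a nonempty active coordinate set together with a sign pattern on it, since $\sum_{k=1}^{d}\binom{d}{k}2^k = 3^d - 1$; a short enumeration check:

```python
from itertools import combinations, product

def n_generator_weights(d):
    """Count pairs (I, sigma): nonempty I in {1..d}, sign pattern on I."""
    return sum(1 for k in range(1, d + 1)
                 for I in combinations(range(d), k)
                 for sigma in product((-1, 1), repeat=k))

for d in (2, 3, 4):
    print(d, n_generator_weights(d), 3**d - 1)   # counts agree
```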
0
