pith. machine review for the scientific record.

stat.ME

Methodology

Design, Surveys, Model Selection, Multiple Testing, Multivariate Methods, Signal and Image Processing, Time Series, Smoothing, Spatial Statistics, Survival Analysis, Nonparametric and Semiparametric Methods

stat.ME 2026-05-13 Recognition

Surrogates estimate time-dependent failure probabilities efficiently

Time-variant reliability using time-dependent surrogate models

mNARX and F-NARX with biased designs capture tail responses in stochastic dynamical systems for first-passage analysis.

abstract
Time-variant reliability analysis is a critical task for ensuring the safety of engineering dynamical systems subjected to stochastic excitations. However, assessing failure probability for realistic systems with Monte-Carlo simulation-based methods is often computationally intractable due to the high cost of the underlying models and the large number of simulations required. While surrogate models such as polynomial chaos expansions or Kriging are well-established for time-invariant reliability problems, their direct application to time-dependent systems remains challenging. This chapter introduces two advanced surrogate modeling frameworks designed specifically for dynamical systems: manifold-NARX (mNARX) and functional NARX (F-NARX). The mNARX approach constructs the surrogate on a reduced-order manifold of auxiliary state variables, enabling the efficient handling of high-dimensional inputs by embedding physical insight into a regression formulation. Conversely, the F-NARX framework exploits the functional nature of system trajectories, extracting principal component features from continuous time windows to mitigate issues associated with discrete lag selection and long-memory effects. We demonstrate the efficacy of these methods on two benchmark reliability problems: a stochastic quarter-car model and a hysteretic Bouc-Wen oscillator. The results highlight that, when combined with suitably biased experimental designs, both frameworks accurately capture the tail behavior of the system response, enabling precise and efficient estimation of first-passage probabilities.
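To make the autoregressive idea concrete, here is a minimal linear NARX surrogate sketch in Python: the response at each step is regressed on its own lagged values and the lagged excitation, and the fitted model is then run recursively on a new excitation. This is only the generic NARX backbone under assumed lag orders; the paper's mNARX and F-NARX frameworks add the manifold of auxiliary states and the functional (principal-component) features on top of this structure, and all names and lag choices below are illustrative.

```python
import numpy as np

def fit_narx(x, y, ny=2, nx=2):
    """Fit a linear NARX surrogate y_t ~ [y_{t-1..t-ny}, x_{t..t-nx}] by least squares.

    Hypothetical illustration of the plain NARX idea only; mNARX/F-NARX add
    manifold construction and functional features on top of this structure.
    """
    p = max(ny, nx)
    rows, targets = [], []
    for t in range(p, len(y)):
        lagged_y = y[t - ny:t][::-1]          # y_{t-1}, ..., y_{t-ny}
        lagged_x = x[t - nx:t + 1][::-1]      # x_t, ..., x_{t-nx}
        rows.append(np.concatenate([[1.0], lagged_y, lagged_x]))
        targets.append(y[t])
    coef, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return coef, p, ny, nx

def simulate_narx(coef, p, ny, nx, x, y_init):
    """Free-run the fitted surrogate on a new excitation x (recursive one-step prediction)."""
    y = list(y_init[:p])
    for t in range(p, len(x)):
        lagged_y = np.array(y[t - ny:t][::-1])
        lagged_x = x[t - nx:t + 1][::-1]
        feats = np.concatenate([[1.0], lagged_y, lagged_x])
        y.append(float(feats @ coef))
    return np.array(y)
```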
stat.ME 2026-05-13 2 theorems

Mixed-frequency synthetic controls reach optimal prediction error

Synthetic Control Method with Mixed Frequency Data

The estimator minimizes squared error for counterfactuals while delivering confidence intervals for average treatment effects.

abstract
Mixed-frequency data, where variables are observed at different temporal resolutions, commonly occur in economic and financial studies. Classical synthetic control methods (SCM) are ill-suited for such data, often necessitating aggregation or prefiltering that may discard valuable information. This paper proposes a novel Mixed-Frequency Synthetic Control Method (MF-SCM) to integrate mixed-frequency data into the synthetic control framework effectively. We develop a flexible estimation procedure to construct synthetic control weights under mixed-frequency settings and establish the theoretical properties of the MF-SCM estimator. Specifically, we first prove that the estimator achieves asymptotic optimality, in the sense that it achieves the lowest possible squared prediction error among all potential treatment effect estimators from averaging outcomes of control units. We then derive the asymptotic distribution of the average treatment effect (ATE) estimator using projection theory and construct confidence intervals for the ATE estimator. The method's effectiveness is demonstrated through numerical simulations and two empirical applications concerning the 2017 Tax Cuts and Jobs Act in the US and air pollution alerts.
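For readers unfamiliar with the baseline, the single-frequency synthetic control step that MF-SCM generalizes can be sketched as a constrained least-squares problem: nonnegative weights summing to one are chosen so the weighted controls match the treated unit's pre-treatment predictors. The sketch below, with made-up data, shows only this classical step, not the paper's mixed-frequency estimator or its inference procedure.

```python
import numpy as np
from scipy.optimize import minimize

def scm_weights(X0, x1):
    """Classical synthetic-control weights: w >= 0, sum(w) = 1, minimizing ||x1 - X0 w||^2.

    X0: (p, J) matrix of control-unit predictors; x1: (p,) treated-unit predictors.
    """
    J = X0.shape[1]
    obj = lambda w: float((x1 - X0 @ w) @ (x1 - X0 @ w))
    cons = ({'type': 'eq', 'fun': lambda w: w.sum() - 1.0},)
    bounds = [(0.0, 1.0)] * J
    res = minimize(obj, np.full(J, 1.0 / J), method='SLSQP', bounds=bounds, constraints=cons)
    return res.x

# Example with made-up predictor data (8 predictors, 5 control units)
rng = np.random.default_rng(0)
X0 = rng.normal(size=(8, 5))
x1 = X0 @ np.array([0.5, 0.3, 0.2, 0.0, 0.0]) + 0.01 * rng.normal(size=8)
print(np.round(scm_weights(X0, x1), 3))
```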
stat.ME 2026-05-13 2 theorems

Prior evidence boosts power in sequential multiple testing

Informative Simultaneous Confidence Intervals for Graphical Group Sequential Test Procedures

A graphical test strategy adjusts current significance levels using past repeated p-values but decides on current evidence alone, enabling greater power.

abstract
Test procedures for multiple hypotheses in a group sequential clinical trial that control the family-wise error rate are considered. Several graphical group sequential tests suggested in the literature, which are special cases of Bonferroni-closure tests, are discussed. The focus is on the question of whether to consider at the current stage only the evidence of the current repeated p-value or the evidence over all repeated p-values from the previous stages. A new test strategy controlling the family-wise error rate is introduced that consistently works across all hypotheses, with the evidence (i.e., repeated p-value) from the current stage. The strategy is more powerful than similar previously suggested test procedures. This is achieved by using the evidence from previous stages to increase the significance levels. For the test procedures, corresponding compatible simultaneous confidence intervals are presented, having the disadvantage of often not providing additional information on the treatment effects. For this reason, we extend previous work about informative simultaneous confidence intervals for one-stage graphical tests to graphical group sequential trials. Iterative algorithms are introduced that calculate these informative bounds that have a small power loss compared to the original graphical group sequential test. The boundaries can be calculated after each stage. In addition, previous work is extended by a criterion to estimate the accuracy of the numerically calculated boundaries. The suggested informative bounds can be used to provide median-conservative, i.e., reliable estimators, for estimating the treatment effects in a group sequential test with multiple hypotheses.
stat.ME 2026-05-13 Recognition

Bayesian model links realized volatility to prices for better forecasts

Bayesian Dynamic Modeling of Realized Volatility in Financial Asset Price Forecasting

Dynamic gamma process for intraday volatility feeds into price models to capture leverage effects and outperforms standard approaches in S&P sector ETFs.

abstract
We present a new class of Bayesian dynamic models for bivariate price-realized volatility time series in financial forecasting. A novel dynamic gamma process model adopted for realized volatility is integrated with traditional Bayesian dynamic linear models (DLMs) for asset price series. This represents reduced-form volatility leverage and feedback effects through use of realized volatility proxies in conditional DLMs for prices or returns, coupled with the synthesis of higher frequency data to track and anticipate volatility fluctuations. Analysis is computationally straightforward, extending conjugate-form Bayesian analyses for sequential filtering and model monitoring with simple and direct simulation for forecasting. A main applied setting is equity return forecasting with daily prices and realized volatility from high-frequency, intraday data. Detailed empirical studies of multiple S&P sector ETFs highlight the improvements achievable in asset price forecasting relative to standard models and deliver contextual insights on the nature and practical relevance of volatility leverage and feedback effects. The analytic structure and negligible extra computational cost will enable scaling to higher dimensions for multivariate price series forecasting for decouple/recouple portfolio construction and risk management applications.
stat.ME 2026-05-13 Recognition

Laplacian-P-splines yield fast Gamma frailty fits for clustered survival

Laplacian-P-splines for shared Gamma frailty models applied to clustered right-censored time-to-event data

Analytical gradient and Hessian replace MCMC, giving parameter estimates and uncertainty for shared frailty models without sampling.

abstract
Shared frailty models have been proposed to accommodate unmeasured cluster-specific risk factors through the inclusion of a common latent frailty term. Among possible frailty distributions, the Gamma distribution is appealing due to its non-negativity, flexibility, and algebraic tractability leading to closed-form marginal survival or hazard function expressions. Under the Bayesian paradigm, the posterior distributions of model parameters are usually explored with computationally intensive procedures relying on Markov chain Monte Carlo sampling. As an alternative, Laplacian-P-splines (LPS) provide a flexible and sampling-free alternative by relying on Gaussian approximations of the posterior target distributions. In this model class, analytical formulas are obtained for the gradient and Hessian, yielding a computationally efficient inference scheme for estimation of model parameters with a natural way of quantifying uncertainty. This article extends the LPS toolbox to the inclusion of shared Gamma frailty models for clustered time-to-event data. We assess the finite-sample performance of the LPS estimation procedure through an extensive simulation study and compare estimates with those obtained using penalized partial likelihood estimation, without specification of the baseline hazard, and with the variance of the frailty term being estimated using profile likelihood. Finally, the proposed LPS estimation method is exemplified using three publicly available biomedical datasets on: (i) recurrent infections in children, (ii) cancer prevention, and (iii) kidney transplantation.
stat.ME 2026-05-13 2 theorems

No single test dominates power across multivariate problems

Power Studies For Two-Sample and Goodness-of-Fit Methods For Multivariate Data

Simulations show every method fails in some cases, so a small complementary set is recommended instead.

abstract
We present the results of a large number of simulation studies regarding the power of various goodness-of-fit as well as non-parametric two-sample tests for multivariate data. In two dimensions this includes both continuous and discrete data; in higher dimensions, continuous data only. In general, no single method can be relied upon to provide good power; any one method may be quite good for some combination of null hypothesis and alternative and may fail badly for another. Based on the results of these studies we propose a fairly small number of methods chosen such that for any of the case studies included here at least one of the methods has good power. The studies were carried out using the R packages MD2sample and MDgof, available from CRAN.
stat.ME 2026-05-13 2 theorems

Bayesian mixture clusters mixed health outcomes with low-rank regressions

Bayesian low-rank latent-cluster regression for mixed health outcomes

Posterior contraction holds for surfaces and mean shifts, with consistent partition recovery under separation.

abstract
High-dimensional health and surveillance studies often involve many collinear predictors, multiple correlated outcomes of different types, and latent heterogeneity across observational units. We propose a Bayesian latent-cluster reduced-rank regression model for multivariate mixed outcomes. The model is a finite mixture of regression surfaces: each latent cluster has a cluster-specific mean shift and a low-rank coefficient matrix, yielding simultaneous clustering, dimension reduction, and component-wise interpretability. Response coordinates may be Gaussian, Bernoulli, or negative binomial. Multiplicative gamma process shrinkage adapts the effective rank within each cluster, and a WAIC-based criterion is used to tune the number of clusters and the nominal maximal rank. We establish posterior contraction for the identifiable component-specific regression surfaces and mean shifts, up to label permutation, and derive corresponding contraction for predictor-side singular subspaces. We also analyze the default label-invariant reporting pipeline based on the posterior similarity matrix: an eigenspace embedding followed by mean shift is shown to consistently recover the latent partition under an additional strong separation margin. Simulation experiments spanning all-Gaussian, all-Bernoulli, all-negative-binomial, and mixed Gaussian--Bernoulli--negative-binomial regimes show accurate recovery of the number of clusters and competitive clustering performance against $K$-means, mclust, PCA-based clustering, and a Gaussian reduced-rank mixture benchmark. We illustrate the method in three applications that show how the model separates individual-level utilization groups and produces interpretable county- and state-level cluster maps together with response-specific posterior predictive maps.
stat.ME 2026-05-13 1 theorem

Counterfactual probability identifies root causes from data

Probability of Root Cause: A Counterfactual Definition and Its Identification

New measure gives the chance a variable set is the true origin of an outcome, conditional on evidence and under standard assumptions.

abstract
Attributing an observed outcome to its root cause is a central task in domains ranging from medical diagnosis to engineering fault diagnosis. Existing approaches either equate the root cause with a root node of the causal graph, as in causal-discovery-based root cause analysis, or target causes more broadly and thereby favour proximate ones, as with the probability of causation and posterior causal effects. We argue that this issue stems from the absence of a formal definition of a root cause, which has led to methods designed for other purposes being applied to root cause attribution by default. We address this by giving a formal, individual-level definition of a root cause within the potential outcomes framework, based on the notion of an individual cause and a counterfactual root condition motivated by mediation analysis. Building on this definition, we propose the probability of root cause (PRC), which quantifies how probable it is that a candidate variable set is the root cause of a given outcome, conditional on observed evidence. Under standard assumptions, we establish the identifiability of the PRC and derive an explicit identification formula. Two numerical examples illustrate the approach.
stat.ME 2026-05-13 2 theorems

Nontargeted HPV infections isolate vaccine direct immune effect

Using NonTargeted HPV Infections in Studies with Risk Compensation

In observational data with possible behavior changes, the method removes confounding and mediation to reveal pure immunological protection.

abstract
Studies of HPV vaccine efficacy usually record infections with vaccine targeted and nontargeted strains. Contrary to blinded randomized controlled trials, confounding bias can be a threat and risk compensation may occur in observational studies. Etievant et al. (Biometrics, 2023) proposed to use cervical infections with nontargeted HPV strains to reduce or remove confounding bias of estimates of vaccine efficacy on targeted strains. However, they assumed that vaccinated women could not change their behavior after vaccination. We consider a more plausible setting where unmeasured sexual behavior acts as both a confounder and a mediator, and investigate if the quantity estimated in practice with their method has a clear causal meaning. We demonstrate that using nontargeted HPV infections can remove both confounding bias and the portion of the vaccine effect on the targeted HPV strains that is mediated through the change of behavior. In that case, the estimated quantity has a clear causal interpretation as it represents the direct immunological effect of the vaccine. However, it could be considered misleading from a public health perspective, as in the presence of risk compensation it would suggest higher protection than what women effectively experience. An unblinded randomized controlled trial would allow estimation of the total causal effect of the vaccine, and infections with nontargeted HPV strains could then be used to isolate the indirect behavioral effect of the vaccine.
stat.ME 2026-05-13 2 theorems

Local clr LIMA detects composition mark clusters better than global averages

Uncovering Local Heterogeneity: Local Summary Characteristics for Spatial Point Processes with Composition-Valued Marks

New point-specific functions for spatial processes uncover local economic clusters and drainage effects hidden from global metrics.

abstract
Traditional analysis of marked spatial point processes often relies on global summary statistics, which tend to obscure local spatial heterogeneity by averaging dependencies across the entire observation window. To overcome this limitation, this paper introduces a framework for Local Indicators of Mark Association (LIMA) specifically designed for composition-valued marks. Such marks, characterized by their non-negative components and sum-to-constant constraint, require a specialized treatment within the Aitchison geometry. By employing log-ratio transformations, we project these constrained marks into a Euclidean space, enabling the point-specific decomposition of global mark characteristics. The efficacy of the proposed clr-based LIMA functions is validated through extensive simulation studies. The results demonstrate a superior capacity to detect localized mark clusters, achieving detection accuracies consistently higher than their global counterparts. The practical utility of this framework is demonstrated using an empirical dataset of economic sector compositions in Castile-La Mancha, Spain. The analysis uncovers latent economic clustering patterns and localized "drainage" effects that are invisible to global metrics, providing granular insights into regional spatial dynamics. Our findings suggest that the extended LIMA framework serves as a vital diagnostic tool for high-dimensional, non-stationary marked point patterns.
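The log-ratio step referenced in the abstract is the standard centred log-ratio (clr) transform from Aitchison geometry; a minimal sketch, assuming strictly positive parts:

```python
import numpy as np

def clr(compositions):
    """Centred log-ratio transform for composition-valued marks.

    Maps each row of an (n, D) matrix of positive parts summing to a constant
    into unconstrained Euclidean coordinates, the space in which local
    (LIMA-type) summary functions can then be computed.
    """
    logx = np.log(compositions)
    return logx - logx.mean(axis=1, keepdims=True)

marks = np.array([[0.6, 0.3, 0.1],
                  [0.2, 0.5, 0.3]])   # e.g. sector shares at two points
print(clr(marks))
```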
stat.ME 2026-05-13 Recognition

Unified theory supplies non-asymptotic bounds on conditional conformal errors

A Unified Theory of Conditional Coverage in Conformal Prediction with Applications

Pointwise and L_p routes clarify error sources and enable extensions to shifts and structured data.

abstract
Conformal prediction provides finite-sample marginal validity, but many applications require coverage that adapts to heterogeneous test points or subpopulations. Existing methods for conditional coverage are largely analyzed case by case, leaving limited general theory for how asymptotic conditional validity arises, how different procedures should be compared, and how such guarantees extend to structured data. We develop a unified framework and theory for conformal methods targeting conditional coverage. Within this framework, we derive non-asymptotic bounds for conditional miscoverage through two complementary routes: a pointwise route for direct score control and an $L_p$ route for quantile-centered methods. The theory clarifies the error sources governing asymptotic conditional validity, yields a common interpretation of existing methods, and supports applications and extensions to conditional-coverage-oriented model selection, localization under covariate shift, structured-data settings through a weighted symmetry-based formulation and more. Numerical results support the theoretical conclusions.
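As background for the coverage notions discussed, here is the standard split-conformal construction that yields the finite-sample marginal guarantee; the paper's contribution concerns bounding how far such sets fall from conditional coverage. The regression model and the absolute-residual score below are illustrative assumptions, not the paper's choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def split_conformal_interval(X_train, y_train, X_cal, y_cal, X_test, alpha=0.1):
    """Split-conformal intervals with absolute-residual scores (marginal coverage >= 1 - alpha)."""
    model = LinearRegression().fit(X_train, y_train)          # any point predictor works here
    scores = np.abs(y_cal - model.predict(X_cal))             # calibration nonconformity scores
    n = len(scores)
    level = min(1.0, np.ceil((1 - alpha) * (n + 1)) / n)      # finite-sample corrected quantile level
    q = np.quantile(scores, level, method="higher")
    pred = model.predict(X_test)
    return pred - q, pred + q
```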
stat.ME 2026-05-13 Recognition

Model-matched designs raise accuracy in plant selection trials

The design of selection experiments using a model-based approach

Aligning layouts with the linear mixed model later used for analysis improves genetic gain from fixed plot resources.

abstract
Plant breeding programs use data obtained from multi-environment selection experiments to produce improved varieties with the ultimate aim of maintaining high levels of genetic gain. Selection accuracy can be improved with the use of advanced statistical analytical methods that use informative and parsimonious variance models for the set of genotype by environment interaction effects, include information on genetic relatedness and appropriately accommodate non-genetic sources of variation within the framework of a single step estimation and prediction algorithm. Maximal gains from using these advanced techniques are more likely to be achieved if the designs used match the aims of the selection experiment and make full use of the available resources. In this paper we present an approach for constructing designs for selection experiments which are optimal or near optimal against a robust and sensible linear mixed model. This model reflects the models used for analysis. The approach is flexible and introduces an additional step to accommodate efficient resource allocation of replication status to genotypes, which is undertaken prior to the allocation of plots to genotypes. A motivating example is used to illustrate the approach; two illustrative examples are presented, one each for single- and multiple-environment selection experiments; and several in-silico simulation studies are used to demonstrate the advantages of these approaches.
stat.ME 2026-05-13 2 theorems

Graph independences sharpen causal effect bounds

Exploiting independence constraints for efficient estimation of bounds on causal effects in the presence of unmeasured confounding

Projecting influence functions onto independence constraints reduces variance in sensitivity analysis bounds when unmeasured confounding is present.

abstract
Causal graphs may inform covariate adjustment for estimating causal effects and improve estimation efficiency by exploiting the graphical structure. In many applications, however, the target causal parameter may not be point-identified due to the presence of unmeasured confounding. Sensitivity analysis methods address this challenge by characterizing bounds on the causal parameter under varying assumptions about the magnitude or form of unmeasured confounding. We focus on semiparametric efficient estimation of causal effects in non-identifiable settings, assuming a known (or hypothesized) causal graph. We propose an influence function projection approach that exploits the conditional independence constraints implied by the graph to improve the efficiency of semiparametric estimators of upper and lower bounds on the average causal effect under a given sensitivity analysis model. Our approach applies across multiple sensitivity analysis frameworks and causal estimands, thereby connecting knowledge of graphical structure with the sensitivity analysis literature. We illustrate our approach through simulations and real data examples thought to be affected by unmeasured confounding, including the effect of labor training program on post-intervention earnings, and the effect of low ejection fraction on heart failure death.
stat.ME 2026-05-13 2 theorems

Copula fixes dependence parameter to identify ordinal causal effects

Causal inference with ordinal outcomes: copula-based identification, estimation and sensitivity analysis

Treating association strength as sensitivity yields doubly robust point estimates whose curves stay inside sharp bounds.

abstract
In causal inference with ordinal outcomes, several interpretable estimands are functions of the probability that the potential outcome under one treatment is larger than that under another treatment for the same unit. This probability depends on the joint distribution of both potential outcomes and is generally not identifiable. Existing work has focused on sharp bounds of this probability based on partial identification, but bounds are often too wide to be informative. We propose a copula-based method that links the identifiable marginal distributions of the potential outcomes via a parametric copula, treating the copula association parameter as a sensitivity parameter. With a fixed copula parameter, the estimands become identified functionals of the observed data. Working under unconfoundedness, we derive the efficient influence function in the nonparametric model and construct one-step estimators that accommodate flexible nuisance estimation. The resulting procedure is rate-doubly-robust and attains the semiparametric efficiency bound under standard conditions. Varying the copula parameter yields a sensitivity curve with point-wise confidence bands that typically lie within the sharp bounds, providing an interpretable bridge between partial identification and point estimation. We further provide a comprehensive sensitivity analysis with respect to both the copula specification and the unconfoundedness assumption. We develop an associated R package ordinalCI.
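To illustrate the copula-link idea in its simplest form, the sketch below computes P(Y(1) > Y(0)) when the two ordinal marginal distributions are taken as known and linked by a Gaussian copula whose correlation plays the role of the sensitivity parameter. The Gaussian family and the plug-in marginals are assumptions for illustration; the paper estimates the marginals under unconfoundedness with doubly robust, efficient machinery.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def gaussian_copula_cdf(u, v, rho):
    """C_rho(u, v) for a bivariate Gaussian copula, with boundary cases handled."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    if u >= 1.0:
        return v
    if v >= 1.0:
        return u
    mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
    return float(mvn.cdf([norm.ppf(u), norm.ppf(v)]))

def prob_y1_greater_y0(p1, p0, rho):
    """P(Y(1) > Y(0)) when ordinal marginals p1, p0 are linked by a Gaussian copula
    with association parameter rho (the sensitivity parameter)."""
    K = len(p1)
    F1 = np.concatenate([[0.0], np.cumsum(p1)])
    F0 = np.concatenate([[0.0], np.cumsum(p0)])
    joint = np.zeros((K, K))
    for i in range(K):
        for j in range(K):
            joint[i, j] = (gaussian_copula_cdf(F1[i + 1], F0[j + 1], rho)
                           - gaussian_copula_cdf(F1[i], F0[j + 1], rho)
                           - gaussian_copula_cdf(F1[i + 1], F0[j], rho)
                           + gaussian_copula_cdf(F1[i], F0[j], rho))
    return sum(joint[i, j] for i in range(K) for j in range(K) if i > j)

# Example: 3-level outcome, moderate positive dependence as the sensitivity value
print(prob_y1_greater_y0([0.2, 0.3, 0.5], [0.4, 0.4, 0.2], rho=0.5))
```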
stat.ME 2026-05-12 Recognition

Inferring temperature improves hyperbolic models of tree-like networks

Hyperbolic Latent Space Models for Network Embedding: Model Specification and Bayesian Inference

The sharpness parameter controls hierarchy depth; fixing it ahead of time reduces the model's ability to reconstruct observed networks.

abstract
Many real-world networks exhibit hierarchical, tree-like structure and heavy-tailed degree distributions, phenomena not readily captured by standard statistical models for network data. Extensions of the popular continuous latent space modeling framework have been proposed to accommodate such networks. Drawing on insights from statistical physics, continuous latent space models with underlying hyperbolic geometry have been proposed as a natural framework, probabilistically embedding nodes in a latent Riemannian manifold with constant negative curvature. Most statistical implementations, however, simplify the original physics-based model by omitting the "temperature parameter," which controls the sharpness of the latent distance-to-probability mapping. We argue this omission is critical. We demonstrate that temperature is the fundamental parameter governing a network's tree-like topology, and that failing to infer it weakens model expressiveness. We formalize a Bayesian hyperbolic continuous latent space model with an unknown, learnable temperature parameter. We then develop two inferential procedures: a Hamiltonian Monte Carlo approach for rigorous posterior characterization and a scalable auto-encoding variational Bayes algorithm for large-scale networks. Through simulation and real data examples, we show that our model outperforms models with fixed temperature and misspecified Euclidean geometries in graph reconstruction tasks in most settings, confirming temperature is a crucial and inferable feature of complex networks.
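For concreteness, the physics-based specification referenced here (the hyperbolic random graph of Krioukov et al.) connects two nodes with a Fermi-Dirac probability of their hyperbolic distance, and the temperature T is exactly the quantity that sets how sharply distance maps to probability. A minimal sketch of those two ingredients, assuming the native polar representation:

```python
import numpy as np

def hyperbolic_distance(r1, th1, r2, th2):
    """Distance between two nodes in the native (polar) model of the hyperbolic plane."""
    dtheta = np.pi - abs(np.pi - abs(th1 - th2))
    arg = np.cosh(r1) * np.cosh(r2) - np.sinh(r1) * np.sinh(r2) * np.cos(dtheta)
    return np.arccosh(np.maximum(arg, 1.0))

def edge_probability(d, R, T):
    """Fermi-Dirac link function of the physics-based model:
    p(d) = 1 / (1 + exp((d - R) / (2T))).  T -> 0 gives a hard threshold at d = R;
    larger T blurs the distance-to-probability mapping."""
    return 1.0 / (1.0 + np.exp((d - R) / (2.0 * T)))
```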
stat.ME 2026-05-12 Recognition

One operator unifies all regression types via measure choice

Unified Operator Framework for Functional and Multivariate Regression

Scalar, functional, and multivariate models become special cases; discretization reduces exactly to classical multivariate regression.

abstract
We develop a unified operator framework for scalar, multivariate, and functional regression based on integral operators defined with respect to general measures. Within this framework, classical regression models, including scalar-on-function, function-on-scalar, function-on-function, and multivariate multiple regression, arise as special cases corresponding to different choices of input and output measures. We establish three main results. First, we show that the standard regression taxonomy can be expressed as a single operator under varying measures. Second, we demonstrate that discrete representations correspond to exact operator evaluations under discrete measures and converge to the continuous operator as the observation grid is refined. Third, we show that estimation under the discrete-measure formulation reduces to standard multivariate regression, with statistical properties governed by classical results. A simulation study illustrates these principles, highlighting the roles of discretization, conditioning, and estimation. Overall, the proposed framework clarifies the relationship between functional and multivariate regression and provides a meaningful interpretation of discretized modeling approaches as operator estimation under different measure specifications. This perspective also explains why vectorized multivariate regression is often competitive with functional methods in linear settings: it directly estimates the discrete-measure representation of the underlying operator.
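The second result (discrete representations as exact operator evaluations under discrete measures) can be illustrated with a small numpy experiment: once the input curves are discretized and weighted by the grid spacing, estimating the integral operator is literally a multivariate least-squares regression. The data-generating kernel and grid below are made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 500, 50, 30                       # curves, input grid size, output grid size
t = np.linspace(0, 1, p)                    # input grid (uniform measure assumed)
s = np.linspace(0, 1, q)                    # output grid
dt = t[1] - t[0]                            # quadrature weight of the discrete input measure

beta = np.exp(-((t[:, None] - s[None, :]) ** 2) / 0.05)    # illustrative kernel beta(t, s)
X = rng.normal(size=(n, p))                                # discretized input curves x_i(t)
Y = (X * dt) @ beta + 0.01 * rng.normal(size=(n, q))       # y_i(s) ~ integral of x_i(t) beta(t, s) dt

# Under the discrete measure, operator estimation is ordinary multivariate regression
# of the discretized outputs on the quadrature-weighted discretized inputs.
beta_hat, *_ = np.linalg.lstsq(X * dt, Y, rcond=None)
print(np.mean(np.abs(beta_hat - beta)))     # recovery error shrinks as n grows
```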
stat.ME 2026-05-12 2 theorems

Similarity subgroups unmask hidden performance gaps in external validation

Rethinking external validation for the target population: Capturing patient-level similarity with a generative model

Autoencoders group external patients by alignment to development data and show when overall results hide subgroup successes or failures.

abstract
Background: External validation is essential for assessing the transportability of predictive models. However, its interpretation is often confounded by differences between external and development populations. This study introduces a framework to distinguish model deficiencies from case-mix effects. Method: We propose a framework that quantifies each external patient's similarity to the development data and measures performance in subgroups with varying levels of alignment to the development distribution. We use generative models, specifically autoencoders, to estimate similarity, offering a more flexible alternative to traditional linear approaches and enabling validation without sharing the original development data. The utility of autoencoder-based similarity measure is demonstrated using synthetic data, and the framework's application is illustrated using data from the Netherlands Heart Registration (NHR) to predict mortality after transcatheter aortic valve implantation. Results: Our framework revealed substantial variation in model performance across similarity-defined subgroups, differences that remain hidden under conventional external validation yet can meaningfully alter conclusions. In several settings, conventional external validation suggested poor overall performance. However, after accounting for differences in patient characteristics, for some sub-groups, the model performance was consistent with internal validation results. Conversely, apparently acceptable overall performance could mask clinically relevant performance deficits in specific subgroups. Conclusion: The proposed framework enhances the interpretability of external validation by linking model performance to population alignment with the development data. This provides a more principled basis for deciding whether a model is transportable and to which patients it can be safely applied.
stat.ME 2026-05-12 Recognition

VPR with mean-field predictives matches exact posteriors

Variational predictive resampling

Iterative imputation from variational predictives closes the asymptotic gap left by direct mean-field VI in Gaussian models.

abstract
Bayesian inference provides principled uncertainty quantification, but accurate posterior sampling with MCMC can be computationally prohibitive for modern applications. Variational inference (VI) offers a scalable alternative and often yields accurate predictive distributions, but cheap variational families such as mean-field (MF) can produce over-concentrated approximations that miss posterior dependence. We propose variational predictive resampling (VPR), a scalable posterior sampling method that exploits VI's predictive strength within a predictive-resampling framework to better approximate the Bayesian posterior. Given a prior-likelihood pair, VPR repeatedly imputes future observations from the current variational predictive, updates the variational approximation after each imputation, and records the parameter value implied by the completed sample. We establish conditions under which the law of the parameter returned by VPR is well defined and show that its finite-horizon approximation converges to this limit. In a tractable Gaussian location model, we show that VPR with MF variational predictives converges to the exact Bayesian posterior, whereas the optimal MF-VI approximation retains a non-vanishing asymptotic gap. Experiments on linear regression, logistic regression, and hierarchical linear mixed-effects models demonstrate that VPR substantially improves posterior uncertainty quantification and recovers posterior dependence missed by MF-VI, while remaining computationally competitive with, and often more efficient than, MCMC.
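The imputation loop described above is easy to spell out in the conjugate Gaussian location model the paper uses as its tractable case. The toy below uses exact conjugate updates in place of a variational approximation, so it illustrates the predictive-resampling mechanics rather than VPR itself; all names and default values are made up for the example.

```python
import numpy as np

def predictive_resample_gaussian(y_obs, sigma2=1.0, m0=0.0, v0=10.0,
                                 horizon=1000, n_draws=300, seed=0):
    """Predictive-resampling draws in a conjugate Gaussian location model (toy sketch).

    Starting from the posterior given the observed data, repeatedly impute a future
    observation from the current predictive, update the (here exact, conjugate)
    approximation, and record the parameter implied at the horizon. VPR replaces
    these exact updates with variational ones.
    """
    rng = np.random.default_rng(seed)
    draws = []
    for _ in range(n_draws):
        v = 1.0 / (1.0 / v0 + len(y_obs) / sigma2)          # posterior variance after data
        m = v * (m0 / v0 + np.sum(y_obs) / sigma2)          # posterior mean after data
        for _ in range(horizon):
            y_new = rng.normal(m, np.sqrt(v + sigma2))      # impute from the predictive
            v_new = 1.0 / (1.0 / v + 1.0 / sigma2)          # conjugate update
            m = v_new * (m / v + y_new / sigma2)
            v = v_new
        draws.append(m)                                     # parameter implied by completed sample
    return np.array(draws)

y = np.array([0.8, 1.3, 0.5, 1.1])
samples = predictive_resample_gaussian(y)
print(samples.mean(), samples.std())   # approximates the exact posterior in this conjugate case
```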
stat.ME 2026-05-12 2 theorems

Indirect comparisons gain reliability when methods fit evidence strength

Indirect Comparisons For Health Technology Assessment: A Practical Methodological Guide And Tips With Insights From The French Transparency Commission

French authority insights guide choices for network analyses and adjusted comparisons when direct trials are missing.

abstract
Context: Indirect treatment comparisons (ITC) are essential when direct head-to-head evidence is unavailable. Their reliability depends on rigorous methodological choices and careful assessment of underlying assumptions. Appropriate methodological choices can help address challenges such as cross-country variations in treatment practices, ethical constraints, and evolving treatment landscapes during trial conduct. This opinion and perspective paper provides practical guidance to strengthen the quality, robustness and accuracy of ITCs in the context of health technology assessment (HTA) in France. Methods: A panel of experts in ITCs and French market access environment developed the present strategic guidance, informed by previous work reviewing HTA methodological guidelines and complemented by a systematic review of Transparency Committee opinions from the French National Authority for Health (HAS). Results: Key considerations include early anticipation of ITCs, justification of potential confounding factors, and rigorous assessment of similarity and transitivity in randomized trial-based comparisons. In network meta-analysis, the structure of the evidence network should be adapted to the specific decision context. Population-Adjusted Indirect Comparisons require careful reporting and interpretation of the effective sample size. When evidence relies on non-randomized clinical trials, comparisons between single-arm studies and external control arms may be appropriate under different scenarios, depending on the feasibility of conducting subsequent randomized studies. Conclusions: Robust and reliable ITCs require methods consistent with the validity of their assumptions and the strength of the available evidence. This practical guidance supports the development of rigorous ITCs to inform decision-making in complex medical contexts where direct comparisons are not feasible.
stat.ME 2026-05-12 Recognition

Quantiles of AR errors recovered from observed series

Estimation of the Risk Measure under a Nuisance Autoregression

R-estimators of coefficients and autoregression quantiles enable estimation of risk measures for unobservable increments Z.

abstract
The goal of an experiment is to evaluate the profit, loss, or the amount of a physical entity over a period. The measurements $X_t$ can be influenced by the values measured in the past; hence we describe the situation with an autoregression model, whose autoregression coefficients are generally unknown. The variable of interest is the error term $Z_t$ of the model, which is the increment of $X_t$ with respect to the past, but itself unobservable. The problem is to estimate various quantile functions of $Z$, as the risk measure of the loss or the related economic indicators. We construct an estimate of quantile functions of $Z$ in the situation that the inference is possible only by means of observations $X$. The proposed estimates are based on the R-estimators of autoregression coefficients, combined with the autoregression quantiles.
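A stripped-down version of the pipeline, assuming ordinary least squares in place of the paper's R-estimators and an empirical residual quantile in place of autoregression quantiles, looks like this:

```python
import numpy as np

def ar_residual_quantile(x, p=1, alpha=0.95):
    """Estimate a quantile (risk measure) of the unobservable AR error term Z_t.

    Simplified sketch: the AR(p) coefficients are estimated by OLS rather than
    the paper's R-estimators, and the quantile of Z is taken from the empirical
    residuals rather than from autoregression quantiles.
    """
    X = np.column_stack([np.ones(len(x) - p)] +
                        [x[p - k - 1:len(x) - k - 1] for k in range(p)])
    y = x[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return np.quantile(resid, alpha)
```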
stat.ME 2026-05-12 2 theorems

Two stability measures assess Bayesian decisions under prior perturbations

Robust Bayes Acts under Prior Perturbations: Contamination, Stability, and Selection Paths

In finite problems, these enable cost-adjusted selection paths that balance stability against selection costs under prior uncertainty.

abstract
This paper develops a quantitative framework to assess the robustness of Bayes-optimal decisions in finite decision problems under model uncertainty. We introduce two complementary stability notions for acts: the robustness radius, measuring the largest perturbation of a reference prior under which an act remains Bayes-optimal, and the contamination need, quantifying the minimal perturbation required for an act to become Bayes-optimal under some nearby prior. Both concepts are characterized via linear programming formulations and computed efficiently using bisection methods exploiting monotonicity properties. Building on these stability measures, we propose a cost-adjusted stability criterion that integrates robustness considerations with act-specific selection costs, yielding a parametric family of decision rules indexed by a regularization parameter. We analyze how optimal act selection evolves along this parameter and derive selection paths that reveal structural transitions between stability-driven and cost-driven regimes. The framework is applied to a portfolio choice problem under uncertainty between different economic regimes. Concretely, using data on historical ETF returns, we compute robustness and contamination profiles for six portfolio strategies and analyze their behavior under heterogeneous belief specifications. The results illustrate that robustness-based selection refines classical expected utility by accounting for prior misspecification.
stat.ME 2026-05-12 2 theorems

Covariate-dependent level links low-fidelity quantiles to high-fidelity ones

Multi-Fidelity Quantile Regression

When distributions share shapes, the level varies smoothly, reducing error with scarce high-fidelity samples.

abstract
High-fidelity (HF) data are often expensive to collect and therefore scarce, making conditional quantiles difficult to estimate accurately. We propose a two-stage, model-agnostic method for multi-fidelity quantile regression. The central idea is a local quantile link: at each covariate value, the HF quantile is represented as a low-fidelity (LF) quantile evaluated at a covariate-dependent level. This reformulation reduces the problem to estimating the level function, which can be smoother than the HF quantile itself when the LF and HF conditional distributions have similar shapes. We also study the complementary regime in which this advantage weakens and introduce a correction step to improve robustness. Our theory characterizes when the proposed estimator converges faster than direct quantile regression using HF data alone and when the correction step provides further improvement. Experiments on synthetic and real data show that our method yields more accurate quantile estimates and tighter conformal prediction intervals.
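The local quantile link can be illustrated with a toy in which the low-fidelity model is cheap enough to sample at any covariate value: each scarce high-fidelity response is mapped to its LF conditional probability level, a quantile regression of those levels on the covariate estimates the level function g(x), and the HF quantile is read off as the LF quantile at g(x). Everything below (the simulators, the gradient-boosting level model) is an assumption for illustration, and the paper's correction step is omitted.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
def lf_sample(x, size):   # hypothetical cheap low-fidelity simulator
    return np.sin(3 * x) + rng.normal(0, 0.3, size)
def hf_sample(x):         # hypothetical expensive high-fidelity response
    return np.sin(3 * x) + 0.2 + rng.normal(0, 0.25)

tau = 0.9
x_hf = rng.uniform(0, 1, 40)                       # scarce HF design
y_hf = np.array([hf_sample(x) for x in x_hf])

# Stage 1: probability-integral transform of each HF response under the LF law at its x.
u = np.array([np.mean(lf_sample(x, 2000) <= y) for x, y in zip(x_hf, y_hf)])

# Stage 2: the covariate-dependent level g(x) is the tau-quantile of u given x.
g_hat = GradientBoostingRegressor(loss="quantile", alpha=tau, n_estimators=200)
g_hat.fit(x_hf.reshape(-1, 1), u)

# HF quantile estimate at a new x: the LF quantile evaluated at the learned level g(x).
x_new = 0.5
level = np.clip(g_hat.predict([[x_new]])[0], 0.0, 1.0)
print(np.quantile(lf_sample(x_new, 5000), level))
```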
stat.ME 2026-05-12 2 theorems

Drop factor loadings below 0.70 in measurement models

Rethinking Factor Loading Thresholds: A Case for a Strict λ ≥ .70 Rule

Items under this threshold explain less variance than they introduce as error, according to AVE and communality logic.

abstract
This paper challenges the prevailing practice of accepting standardized factor loadings as low as .50 in confirmatory factor analysis. Drawing on the logic of Average Variance Extracted (AVE) and communality, the author argues for a stricter item level threshold: only indicators with loadings of λ ≥ .70 (implying λ² ≥ .50) should be retained in final measurement models. The rationale is that indicators with λ < .70 contain more error than explained variance, undermining both construct validity and the stability of factor solutions. The paper reviews theoretical foundations, simulation evidence, and implications for structural equation modeling, showing that weak loadings degrade measurement quality, factor score determinacy, and model fit. Adopting a minimum λ ≥ .70 rule aligns item level standards with established construct level criteria and enhances the rigor and interpretability of latent variable models.
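The arithmetic behind the argument is just squared loadings; a small sketch with made-up loadings:

```python
import numpy as np

def ave_and_communalities(loadings):
    """Average Variance Extracted and item communalities from standardized loadings.

    An item's communality is lambda^2, so lambda = .70 gives about .49 explained
    variance (roughly half); anything much below that leaves more error variance
    than explained variance.
    """
    loadings = np.asarray(loadings, dtype=float)
    communalities = loadings ** 2
    return communalities.mean(), communalities

ave, h2 = ave_and_communalities([0.82, 0.75, 0.71, 0.55])
print(ave)   # AVE for the four-indicator factor
print(h2)    # the 0.55 item explains only ~30% of its variance
```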
stat.ME 2026-05-12 Recognition

LDDMM distances enable Bayesian calibration of infinite-dimensional models

Diffeomorphic registration distances for Bayesian calibration of infinite-dimensional computer models

The distances measure minimal deformation energy and support a predictive posterior over output shapes.

abstract
The simulation of physical phenomena with computer models relies on the estimation of physical and/or numerical parameters calibrated to fit experimental data. The approximations within the computer model and the errors in the measurements lead to uncertainties in the calibrated parameters. Bayesian calibration offers a well-studied framework to provide reliable uncertainty quantification on the calibrated parameters. When dealing with complex computer codes whose outputs are infinite-dimensional, Bayesian calibration may be extended by providing a relevant distance in the output space. In this paper, Bayesian calibration is performed using distances from the large deformation diffeomorphic metric matching (LDDMM) framework. LDDMM distances can provide a suitable metric for infinite-dimensional shapes such as scalar fields (i.e. images) or function graphs. This metric can be interpreted as the minimal energy deformation required to transform one shape into another. As such, it provides a readily interpretable metric for Bayesian calibration. On top of this, the representation of the diffeomorphism group as an exponential transformation of an RKHS is compatible with Bayesian inference and allows one to define a predictive posterior distribution on the infinite-dimensional shape space.
stat.ME 2026-05-12 2 theorems

New sample size formula for causal survival analysis uses few inputs

Sample size and power calculations for causal inference with time-to-event outcomes

It needs only treatment proportion, effect size and event rate for randomized trials, plus one overlap coefficient for observational data.

abstract
This paper develops power and sample size formulas for causal inference with time-to-event outcomes. The target estimand is the marginal hazard ratio: the coefficient of a marginal structural Cox proportional hazard model with treatment as the only predictor. We extend the robust sandwich variance theory and derive the analytical form of the asymptotic variance for the inverse probability weighted partial likelihood estimator. Building on this, we derive a new sample size formula valid at any prespecified effect size, applicable to both randomized trials and observational studies. For randomized trials, the formula requires only the canonical inputs of treatment proportion, effect size, and event rate. The new formula corrects the mischaracterization of classic log-rank-based formulas. For observational studies, one additional input suffices: an overlap coefficient summarizing covariate similarity between comparison groups. We further develop a variance inflation approach applicable to any propensity score balancing weights, anchored to the corrected baseline variance.
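For orientation, the classic log-rank-based calculation that the paper revisits (Schoenfeld's formula) already uses the same three canonical inputs; a sketch is below. This is the familiar baseline only, not the paper's corrected formula for the marginal hazard ratio or its variance-inflation adjustment for observational studies.

```python
import numpy as np
from scipy.stats import norm

def schoenfeld_sample_size(hr, p_treat, event_rate, alpha=0.05, power=0.8):
    """Classic Schoenfeld / log-rank sample-size calculation (baseline, not the paper's formula).

    hr: hazard ratio (effect size); p_treat: treatment proportion;
    event_rate: overall probability of observing an event.
    """
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    events = (z_a + z_b) ** 2 / (p_treat * (1 - p_treat) * np.log(hr) ** 2)  # required events
    subjects = events / event_rate                                           # required sample size
    return int(np.ceil(events)), int(np.ceil(subjects))

print(schoenfeld_sample_size(hr=0.7, p_treat=0.5, event_rate=0.6))
```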
stat.ME 2026-05-12 2 theorems

Past selections guide future choices via monotonicity model

A Statistical Framework for Learning Preferences from the Past

A statistical estimator enforces that frequently chosen options remain favored, with guarantees and tests on real data.

abstract
In many real-world settings such as online recommendation or consumer choice modeling, individuals make repeated choices from a fixed set of options. Accurately estimating their underlying preferences is essential for generating personalized future recommendations. Probabilistic models for understanding user choice behavior from past decisions can serve as a valuable addition to existing recommender systems and choice prediction methods. To this end, in this article, we introduce a novel statistical framework for predicting user preferences based on their past choices, under a natural monotonicity assumption: options that were chosen more frequently or more intensely in the past are more likely to be chosen again in the future. Our approach builds on a parametric model proposed by Le Goff and Soulier (2017), originally used to describe how ants in an ant colony select a path among many pre-existing paths. We propose a non-parametric generalization of this model, drawing inspiration from the generalized elephant random walk introduced by Maulik et al. (2024). We develop a method of maximum likelihood estimation of the user preference probabilities under the above-mentioned monotonicity constraint. We also derive theoretical guarantees for our estimator and demonstrate the effectiveness of our method through both simulated experiments and real-world datasets.
stat.ME 2026-05-12 1 theorem

Domino framework controls k-bFDR under any dependence

Generalized Boundary FDR Control under Arbitrary Dependence: An Approach on Closure Principle

Guarantees error control for the k least significant discoveries with p-values or e-values and no independence assumption.

abstract
False discovery rate (FDR) is a cornerstone of modern multiple testing. However, it often fails to guarantee the reliability of "marginal" discoveries that lie at the boundary of the rejection set, which are often crucial in high-precision applications. While recent works (Soloff et al., 2024; Xiang et al., 2025) introduced the boundary false discovery rate (bFDR) to control the error probability at the marginal discovery, their method relies on restrictive assumptions such as independence or specific prior distributions. In this paper, we first propose $k$-bFDR, a novel generalization that controls the error probability of the $k$ least significant discoveries. We then provide a systematic investigation into the theoretical relationship between $k$-bFDR and existing error metrics. Furthermore, building upon the closure principle, we develop Domino, a unified framework that guarantees $k$-bFDR control under arbitrary dependence, applicable for both p-values and e-values. We prove the theoretical validity of the proposed Domino algorithm and demonstrate through extensive numerical experiments that it consistently achieves rigorous $k$-bFDR control while identifying trustworthy marginal discoveries. Analyses of real data reveal that $k$-bFDR control yields higher-quality rejection sets with greater practical significance.
stat.ME 2026-05-12 2 theorems

Proxy spectral structure identifies causal effects with hidden outcomes

Proximal Causal Inference for Hidden Outcomes

Influence-function estimators achieve multiple robustness and efficiency without needing unbiased measurements or any outcome observation.

abstract
Methods that rely on proxies, without imposing strong parametric structure, are increasingly used to deal with unobserved variables in causal inference. One influential line of this work reconstructs latent distributions used to identify the target functional by exploiting eigenvalue eigenvector structure. Within this framework, we first establish identification of the full data law in the presence of hidden outcomes, and then develop influence function based estimators for causal effects. To the best of our knowledge, this is the first work to develop influence function based estimators in this setting without relying on unbiased proxy measurements or partial observation, while achieving multiple robustness and desirable efficiency properties. We demonstrate the performance of our approach through simulation studies.
stat.ME 2026-05-11 2 theorems

Procedure finds subgroups with different treatment effects and exact FDR control

Adaptive discovery of effect modification in matched observational studies

It accounts for unmeasured confounding through sensitivity models and gains power from multiple matched controls in observational data.

abstract
Understanding effect modification -- how treatment effects vary across subpopulations -- is practically important in observational studies, as it helps identify which subgroups are likely to benefit from a given treatment. In this paper, we study the discovery of effect modification in matched observational studies, where each treated unit may be matched to multiple controls. We develop a finite-sample valid procedure for identifying and selecting covariate-interpretable subgroups, with exact control of the subgroup-level false discovery rate (FDR). Our method explicitly accounts for unmeasured confounding via sensitivity models, and leverages multiple matched controls to improve statistical power. We demonstrate the favorable performance of our method relative to baseline methods through extensive simulation studies and a real-world application to the economic returns to college education.
stat.ME 2026-05-11 2 theorems

Calibrating every LLM judge beats selecting only the accurate ones

Calibrate, Don't Curate: Label-Efficient Estimation from Noisy LLM Judges

Full panels after calibration cut error in half on RewardBench2 compared with accuracy-ranked top-5 subsets.

abstract
Multi-judge evaluation is increasingly used to assess LLMs and reward models, and the prevailing heuristic is to curate: keep the most accurate judges and discard weaker ones. We show that this heuristic can reverse when the target is not point accuracy, but calibrated probabilistic evaluation from a labeled calibration set. Holding the aggregation and calibration procedures fixed, we compare accuracy-ranked top-$k$ judge selection with using the full judge panel. Across four labeled pairwise-evaluation benchmarks spanning LLM-as-judge and reward-model settings, the calibrated full panel consistently outperforms accuracy-based selection. On RewardBench2, retaining all judges achieves negative log-likelihood (NLL) of $0.006$ versus $0.013$ under top-5 selection, halving the calibration error. This advantage persists after judge-family deduplication and against stronger same-pipeline subset search. We explain this reversal with oracle analyses showing that the optimal calibrated risk under proper scoring rules cannot increase when additional judge signals are made available, and that even below-chance judges can be useful when their biases are learnable and their signals are non-redundant. The resulting operating principle is simple: in multi-judge evaluation with labeled calibration data, do not discard weak judges by accuracy alone; keep them when they are parseable, non-redundant, and calibratable.
stat.ME 2026-05-11 2 theorems

Threshold m* decides when spatial effects matter in areal regression

On the Need for Spatial Random Effects in Bayesian Regression Models for Multilevel Areal Data

In multilevel areal data, this m* based on correlation, variance ratio and covariate alignment shows nonspatial models suffice above it, as simulations confirm.

abstract
Although spatial models for areal data are widely used in multilevel settings, the conditions under which spatial and nonspatial random effects yield equivalent posterior inference for regression coefficients have never been formally characterized. We address this question within a hierarchical Bayesian framework for Gaussian outcomes, using the Leroux conditional autoregressive (CAR) prior distribution as a representative specification. We derive a closed-form sample size threshold, $m^*$, below which spatial modeling materially affects inference on regression coefficients and above which a simpler nonspatial model yields effectively equivalent results, and show that the absolute relative difference in posterior variances converges to zero at rate $O(m^{-1})$. The threshold depends on three interpretable quantities: the spatial correlation parameter, the ratio of between-area to within-area variance, and the alignment between the covariate and dominant spatial patterns in the data. Because each can often be estimated prior to model fitting, $m^*$ can serve as a practical study design tool. Simulation studies confirm that $m^*$ accurately identifies this threshold across a range of settings. However, when the covariate does not vary within a given location, spatial modeling remains necessary regardless of within-area sample size. These results offer formal guidance for practitioners deciding whether the added complexity of spatial modeling is warranted.
stat.ME 2026-05-11 Recognition

Bayesian model clusters marked point processes and estimates continuous intensities

Laplace Variational Inference for Dirichlet Process Mixtures of Marked Poisson Point Processes

Dirichlet process mixtures with constrained Laplace variational inference recover groups and mark-specific surfaces without gridding the map.

abstract
Marked point process data arise when events occur in a space with event-level marks. We study clustering of replicated marked Poisson point processes and introduce Dirichlet process mixtures of marked Poisson point processes, a Bayesian nonparametric model that jointly infers latent cluster structure, the number of clusters, and continuous mark-specific intensity surfaces. We use a squared link intensity representation to obtain tractable continuous domain likelihood terms without gridding or thinning. For posterior inference, we develop an efficient variational Bayes algorithm with a constrained Laplace approximation for the nonconjugate basis-coefficient block. The resulting coefficient update is formulated as a constrained optimization problem, which avoids the sign ambiguity and nodal-line issue of squared-link models. We further establish theoretical guarantees for mode finding optimization. We demonstrate the performance of the proposed model and algorithm through synthetic experiments and real-data analysis.
stat.ME 2026-05-11 Recognition

Standard BH controls FDR curve across all parameters

Simultaneous false discovery rate control in location families

In location family data the usual procedure bounds false discoveries simultaneously for every insignificant location shift.

abstract
When testing a number of statistical hypotheses using data from location families, it is often useful to control the false discovery rate (FDR) not just for hypotheses of the null values but also of other parameter values that are deemed practically insignificant. Here we consider FDR as a curve indexed by the location parameter and suggest a simple generalization of the Benjamini-Hochberg procedure that controls the FDR curve below any user-specified level. As a corollary of our main result, we show that the standard Benjamini-Hochberg procedure -- designed to control the FDR at the null -- also provides simultaneous control of the whole FDR curve for free. We further demonstrate the implications of our results and some practical considerations with a numerical example.
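Since the corollary concerns the unmodified procedure, a plain implementation of the Benjamini-Hochberg step-up rule is worth having in view; per the abstract, running it on the usual null p-values already controls the entire FDR curve over insignificant location shifts.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Standard Benjamini-Hochberg step-up procedure at level q; returns a boolean rejection vector."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m          # q * k / m for k = 1..m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])              # largest index meeting the bound
        reject[order[:k + 1]] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.27, 0.60]
print(benjamini_hochberg(pvals, q=0.05))
```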
stat.ME 2026-05-11 Recognition

Random forest surrogate cuts likelihood evaluations in phylogenetic SMC

Accelerating Bayesian Phylogenetic Inference via Delayed Acceptance Sequential Monte Carlo with Random Forest Surrogates

Predicting the effect of tree rearrangements lets the sampler reject bad moves early while recovering accurate posteriors.

abstract
In Bayesian phylogenetics, our goal is to estimate the posterior distribution over phylogenetic trees. Markov chain Monte Carlo methods are widely used to approximate the phylogenetic posterior distributions. For large-scale sequence data, repeated evaluation of the likelihood function incurs a high computational cost. In this article, we propose a machine-learning algorithm with over 35 topological and branch-length features to predict the changes in the likelihood function caused by tree moves (e.g., eSPR, stNNI) used in standard MCMC approaches. This algorithm is then used to design a delayed acceptance MCMC kernel, which utilizes the predicted surrogate function for preliminary rejection, to accelerate tree space searches. Furthermore, we integrate our proposed MCMC kernel into the sequential Monte Carlo sampler framework. We validate the proposed delayed-acceptance sequential Monte Carlo approach (DA-SMC) on simulation and real data sets. Our delayed acceptance kernel can maintain robust estimation while reducing the number of likelihood evaluations significantly, yielding substantial computational time savings. We develop a Python package that is available at https://github.com/wentYu/DAphyloSMC.
0
0
stat.ME 2026-05-11 Recognition

Bridge functions identify path-specific effects with hidden confounders

Proximal Path-Specific Inference

Four nonparametric strategies and a quadruply robust estimator achieve sqrt(n) consistency even when nuisance functions converge slowly.

Figure from the paper full image
abstract click to expand
Causal mediation analysis has been extended to estimate path-specific effects with multiple intermediate variables, isolating treatment effects through a mediator of interest while excluding pathways through its ancestors. Such analyses address bias from recanting witnesses, i.e., treatment-induced mediator-outcome confounders. However, existing methods typically rely on stringent assumptions precluding general unmeasured confounding, which are often violated in practice. In this paper, we relax these restrictions by leveraging observed covariates as proxy variables to accommodate unmeasured confounding among the treatment, recanting witness, mediator, and outcome. Using proximal confounding bridge functions, we develop four nonparametric identification strategies for the path-specific effect. We further derive the efficient influence function and propose a quadruply robust, locally efficient estimator. To handle high-dimensional nuisance parameters, we propose a proximal debiased machine learning approach. We theoretically guarantee that our estimator achieves $\sqrt{n}$-consistency and asymptotic normality even when machine learning estimators for nuisance functions converge at slower rates. Our approaches are validated via semiparametric and nonparametric simulations and an application to the CDC WONDER Natality study, estimating the path-specific effect of prenatal care on preterm birth through preeclampsia, independent of maternal smoking during pregnancy.
0
0
stat.ME 2026-05-11 2 theorems

Shared parametric value function scales RL measurement to large tasks

Reinforcement Learning Measurement Model

RLMM replaces per-person tables with one task-level Q-function, improving accuracy and speed as task complexity grows.

Figure from the paper full image
abstract click to expand
Interactive assessments generate sequential process data that are not well handled by conventional item response models. Existing MDP-based measurement approaches, such as the Markov decision process measurement model (MDP-MM, LaMar, 2018), link action choices to state-action values, but their reliance on person-specific tabular value functions makes them difficult to scale beyond small, fully enumerated tasks. We propose the Reinforcement Learning Measurement Model (RLMM), a measurement framework that decouples person-level choice sensitivity from task-level value representation through a shared parametric action-value function, making estimation more computationally efficient for larger process-data settings. The model combines a Boltzmann choice rule with normalized advantages, a soft Bellman consistency penalty, and a block-coordinate MAP procedure for joint estimation, while also yielding step-level influence diagnostics for identifying behaviorally critical decisions. In peg-solitaire simulations, the RLMM achieved higher estimation accuracy and substantially lower runtime than the original MDP-MM, with advantages increasing as task complexity grew. In AQUALAB gameplay logs, the estimated person parameter was positively associated with cumulative reward, task completion, and behavioral efficiency. These results show that the RLMM extends decision-process-based psychometric models to larger and more behaviorally realistic environments while preserving an interpretable latent trait tied to decision making steps.
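A small illustrative sketch of a Boltzmann (softmax) choice rule over action advantages, the kind of response model the abstract describes; the exact normalization used by RLMM is not reproduced here, so measuring advantages relative to the best available action is an assumption.

```python
import numpy as np

def choice_probs(q_values, theta):
    """Softmax choice rule: theta is the person-level choice sensitivity,
    q_values are task-level action values for the available actions."""
    adv = q_values - q_values.max()      # advantages relative to the best action (assumed normalization)
    logits = theta * adv
    p = np.exp(logits - logits.max())
    return p / p.sum()

print(choice_probs(np.array([1.0, 0.5, -0.2]), theta=3.0))
```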
0
0
stat.ME 2026-05-11 Recognition

New method bounds expected false effect-modifier selections

Causal Stability Selection

Causal stability selection merges cross-fitted estimates with path stability to deliver non-asymptotic false-positive control.

Figure from the paper full image
abstract click to expand
Identifying covariates that modify treatment effects is a central problem in causal inference. Yet existing data-adaptive procedures do not provide finite-sample control over the expected number of false discoveries, risking spurious findings that fail to replicate. We introduce causal stability selection, an algorithm that combines cross-fitted estimation of conditional average treatment effects with integrated path stability selection. The method accommodates arbitrary treatment effect estimators and arbitrary base selectors, and produces a selection set with an explicit, non-asymptotic bound on the expected number of false positives. Under standard causal identifying assumptions and regularity conditions on the base selector, we prove that the estimated selection probabilities converge to their oracle counterparts at the rate of the underlying treatment effect estimator. This establishes a direct connection between treatment effect estimation and effect modifier discovery. We illustrate the method on a randomized trial in oncology and on observational data on maternal smoking and infant birthweight.
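An illustrative sketch that pairs cross-fitted CATE pseudo-outcomes with a simple subsampled selection-frequency count; the actual method uses integrated path stability selection with a formal false-positive bound, which this toy version does not provide. `pseudo_tau` is assumed to come from an upstream cross-fitted treatment effect estimator.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def stability_frequencies(pseudo_tau, X, n_sub=50, frac=0.5, seed=0):
    """pseudo_tau: unit-level CATE pseudo-outcomes; X: candidate effect modifiers.
    Returns the fraction of subsamples in which each column is selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_sub):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        coefs = LassoCV(cv=5).fit(X[idx], pseudo_tau[idx]).coef_
        counts += coefs != 0
    return counts / n_sub
```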
0
0
stat.ME 2026-05-11 2 theorems

Nested sensitivity maps produce sharp bounds on transported quantile effects

Nested Sensitivity Envelopes for Transported Quantile Treatment Effects

A source odds-ratio bound and target likelihood-ratio bound combine into closed-form CDF envelopes that invert to attainable quantile and QTE bounds.

Figure from the paper full image
abstract click to expand
We study target-population quantile treatment effects when a source study may have unmeasured treatment confounding and may not transport to a target population after conditioning on observed covariates. The observed data consist of a source sample with treatment, outcome and covariates, and a target sample with covariates only. We impose two marginal sensitivity restrictions: an odds-ratio bound \(\Gamma\) for source treatment assignment and a conditional likelihood-ratio bound \(\Lambda\) for source-to-target potential-outcome distribution shift. For each treatment arm and threshold \(y\), we derive a closed-form sharp target counterfactual CDF envelope. The envelope nests a source marginal-sensitivity map inside a target outcome-shift map, preserving two normalizations and generally improving on a single product likelihood-ratio relaxation. We prove process-level sharpness, so the envelopes are attainable as entire CDFs and can be inverted to obtain sharp target quantile bounds and sharp interval-hull QTE bounds. We then develop semiparametric theory for these nonsmooth bound processes. On regular index sets, we give the canonical gradient, including the source propensity contribution required in observational studies, and construct cross-fitted Neyman-orthogonal one-step estimators with uniform Gaussian approximation. On full index sets with active-set ties or mass points, we use Hadamard directional differentiability and subsampling-valid inference, with a primitive finite-support route for the required weak convergence. Finally, we invert simultaneous monotone CDF bands to obtain honest confidence sets for quantile and QTE interval-hull processes, and formulate the two-dimensional \((\Gamma,\Lambda)\) breakdown frontier as level-set inference for interval-hull non-refutation.
0
0
stat.ME 2026-05-11 2 theorems

Counterfactual CDFs are regular exactly when dual bridge is square-integrable

Regularity, Phase Transitions, and Uniform Inference for Proximal Counterfactual Quantile Processes

A Picard-type phase transition marks the boundary between root-n estimation and slower rates for proximal counterfactual quantile and risk processes.

Figure from the paper full image
abstract click to expand
This paper develops semiparametric theory for counterfactual distribution, quantile, and lower-tail risk processes under unmeasured confounding using proximal negative-control proxies. Rather than treating each threshold as a separate proximal mean problem with outcome $\mathbf 1\{Y\le y\}$, we study the continuum of inverse problems indexed by $y$. For each treatment arm $a$, the counterfactual CDF $F_a(y)=P\{Y(a)\le y\}$ is represented by the primal bridge equation $T_a h_{a,y}=g_{a,y}$ and the linear functional $\ell(h)=E\{h(W,X)\}$. The dual bridge $q_a$ solves $T_a^*q_a=1$, equivalently $E[\mathbf 1(A=a)q_a(Z,X)-1\mid W,X]=0$. We show that this dual equation, together with the minimal residual-moment condition required for the influence function to lie in $L_2(P_0)$, is the exact regularity boundary in a threshold-saturated observed-data proximal bridge model: $F_a(y)$ is pathwise differentiable if and only if a regular square-integrable dual bridge exists. The canonical gradient is \[ h_{a,y}(W,X)-F_a(y)+\mathbf 1(A=a)q_a(Z,X)\{\mathbf 1(Y\le y)-h_{a,y}(W,X)\}. \] A singular-system characterization gives a Picard-type phase transition: root-$n$ regular estimation is possible exactly when $\sum_j\ell_{a,j}^2/s_{a,j}^2<\infty$ and the residual moment is finite. Outside this region, finite-dimensional efficiency bounds diverge under residual-noise nondegeneracy, and Gaussian inverse benchmarks yield slower minimax rates. We further establish efficient CDF-process inference, cross-fitted uniform doubly robust expansions, finite-rank weak-proxy rate conditions, density-free simultaneous quantile bands by inversion of CDF bands, and lower-tail CVaR inference via a shortfall representation. The estimators rely on closed-form linear algebra, convex Tikhonov regularization, and isotonic projection for shape enforcement.
0
0
stat.ME 2026-05-11 2 theorems

Unsigned CATE estimates power randomization tests without splitting data

Fit CATE Once: Model-Assisted Randomization Tests Without Sample Splitting

Researchers fit the magnitude of effect heterogeneity once from residual covariances, then run exact tests on the original assignments.

Figure from the paper full image
abstract click to expand
Randomization tests and flexible treatment-effect models offer complementary strengths for analyzing data from randomized panel experiments: the former provide valid inference under the known assignment mechanism, while the latter can capture complex patterns of effect heterogeneity. We develop model-assisted randomization tests that combine these strengths without sample splitting. The key idea is to estimate an unsigned version of the conditional average treatment effect (CATE) from the covariance structure of residualized outcomes, while leaving the realized assignments for randomization inference. The remaining sign can be chosen to best fit the observed outcomes. We establish identification and consistency for the proposed unsigned CATE estimators, as well as validity for the CATE-assisted randomization tests. Across synthetic and semi-synthetic experiments, the CATE-assisted randomization tests control Type I error and achieve higher power than covariate-adjusted and sample-split alternatives. Finally, we show that the assignment-free CATE estimates can be used to discover heterogeneous subgroups and test subgroup-specific treatment effects.
0
0
stat.ME 2026-05-11 2 theorems

Clustering method consistent for high-dimensional heavy-tailed data

Semiparametric Elliptical Mixture Clustering for High-Dimensional Data

Semiparametric elliptical mixtures share an unknown radial generator and sparse precision matrix, yielding high-dimensional consistency for the estimates and the excess misclustering error.

Figure from the paper full image
abstract click to expand
Clustering high-dimensional data is especially challenging when cluster distributions are heavy tailed and only approximately elliptical. Existing high-dimensional methods are largely built for Gaussian or other light-tailed models, whereas classical robust elliptical procedures are mostly low dimensional or rely on fully parametric radial families. We propose a semiparametric elliptical mixture clustering framework with cluster-specific centers, an unknown common radial generator, and a common sparse precision-shape matrix, together with a data-driven rule for selecting the number of clusters. A generalized expectation-maximization (GEM) algorithm is developed by combining transformed-radius estimation of the radial generator, radial-score center updates, and a Tyler-POET-GLASSO update for the common precision-shape matrix. The method avoids specifying a parametric radial family and remains computationally feasible in high dimensions. We establish high-dimensional consistency for the estimated model components and the excess misclustering error. Simulation studies and a handwritten-digit application demonstrate the competitive performance and robustness of the proposed method, particularly in heavy-tailed elliptical settings.
0
0
stat.ME 2026-05-11 Recognition

Model averaging unites linear regression and ML for optimal predictions

Prediction-Powered Linear Regression: A Balance Between Interpretation and Prediction

PUMA achieves asymptotic optimality in and out of sample while preserving coefficient interpretability.

Figure from the paper full image
abstract click to expand
Unlabeled data are increasingly prevalent in contemporary economic studies, yet their effective use for improving prediction remains challenging because the outcomes are often costly or even infeasible to observe. Machine learning methods can help label these data and achieve high predictive accuracy, but they often lack interpretability. In this paper, we propose a Prediction-powered Unified Model Averaging (PUMA) framework to combine linear regression and machine learning methods, achieving a balance between interpretation and prediction. Unlike existing works on prediction powered inference, our approach is the first to jointly address uncertainty arising from model misspecification, power-tuning selection, and the choice of machine learning algorithms by using model averaging. Theoretically, we establish the asymptotic prediction optimality of the proposed method both in-sample and out-of-sample under mild conditions, along with estimation consistency. Extensive simulations and a real-world application further demonstrate the empirical advantages of the proposed method.
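A minimal sketch of the averaging idea only, not the PUMA estimator: a weight between an interpretable linear model and an ML learner is chosen to minimize squared prediction error on held-out data. The data and model choices below are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
y = X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.5, size=500)
Xtr, Xval, ytr, yval = X[:300], X[300:], y[:300], y[300:]

lin = LinearRegression().fit(Xtr, ytr)          # interpretable component
gbm = GradientBoostingRegressor().fit(Xtr, ytr)  # flexible ML component
p_lin, p_gbm = lin.predict(Xval), gbm.predict(Xval)

# weight w on the linear model, 1 - w on the ML learner, chosen by held-out MSE
obj = lambda w: np.mean((yval - (w * p_lin + (1 - w) * p_gbm)) ** 2)
w = minimize_scalar(obj, bounds=(0.0, 1.0), method="bounded").x
print(f"linear-model weight: {w:.2f}")
```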
0
0
stat.ME 2026-05-11 2 theorems

Simulation corrects bias in semiparametric models

Bias Correction for Semiparametric Regression Models

SABRE reduces finite-sample bias for diverging-dimensional coefficients and dispersion without raising variance in generalized partially linear models.

Figure from the paper full image
abstract click to expand
We consider a broad class of semiparametric regression models in which the conditional distribution of the response takes the form $f\{Y \mid \mathbf{x}^{\rm T}\boldsymbol{\beta}+m(z), \phi\}$, which is known up to a parametric component $\boldsymbol{\beta}$ of diverging dimension $p$, a smooth function $m(\cdot)$, and a dispersion parameter $\phi$. Existing semiparametric literature on such models has primarily focused on semiparametric efficiency for $\boldsymbol{\beta}$, typically treating $\phi$ and $m(\cdot)$ as nuisances and largely ignoring their finite-sample bias. However, the finite-sample bias of standard estimators can be substantial (especially when $p$ is large relative to $n$ and/or dispersion is high) and can seriously undermine inference for $\boldsymbol{\beta}$. Moreover, $\phi$ is often of direct scientific interest and requires accurate estimation. To address this gap, we propose SABRE, a simulation-based bias correction framework for this broad semiparametric model class. We establish asymptotic properties of SABRE for the subclass of generalized partially linear models, where bias reduction for $\boldsymbol{\beta}$ and $\phi$ can be achieved without inflating variance, and we outline how the underlying principle may be adapted more generally. Comprehensive simulation studies and a real-data application on early-stage diabetes demonstrate the empirical effectiveness of SABRE in reducing bias and improving inference.
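A generic simulation-based bias-correction sketch, given here only to illustrate the principle SABRE builds on: the estimator's bias is approximated by simulating from the fitted model and re-estimating, then subtracted. The function names and the single-step correction are assumptions, not the paper's algorithm.

```python
import numpy as np

def sim_bias_correct(estimate, simulate, estimator, n_sim=200, rng=None):
    """estimate: initial parameter estimate (array);
    simulate(theta, rng): draws a synthetic dataset from the model at theta;
    estimator(data): returns the parameter estimate for a dataset."""
    rng = rng or np.random.default_rng()
    rehats = np.array([estimator(simulate(estimate, rng)) for _ in range(n_sim)])
    bias_hat = rehats.mean(axis=0) - estimate   # Monte Carlo bias at the fitted model
    return estimate - bias_hat
```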
0
0
stat.ME 2026-05-11 2 theorems

Spherical mixtures unify EHR code embeddings across institutions

Spherical Mixture Integration for Latent Embedding Alignment across Multi-Source Feature Spaces

The method recovers synonym clusters consistently and quantifies statistical gains from integrating multiple sources plus auxiliary knowledge-graph information.

Figure from the paper full image
abstract click to expand
Multi-institutional electronic health record (Multi-EHR) data have emerged as a powerful resource for developing predictive models to support clinical decisions and for generating reliable real-world evidence. By aggregating information from diverse patient populations and institutions, they enhance the robustness and generalizability of models and findings. However, analyzing multi-EHR data remains challenging because disparate institutions rarely map all data elements to a common ontology, and raw EHR codes are often overly granular and institution-specific, fragmenting representations of the same clinical concept. Hence, integrative analysis must overcome two key hurdles: harmonizing codes with the same clinical meaning (synonymy), and aligning institutional feature spaces. To address these challenges, we propose SMILE, a Spherical Mixture Integration for Latent Embedding alignment across multi-source feature spaces, where embeddings from heterogeneous sources serve as privacy-preserving summaries of clinical concepts and sparse auxiliary relationship pairs provide weak supervision on the latent geometry. Synonymy is modeled via a mixture of von Mises-Fisher distributions, yielding unified representations that consolidate semantically equivalent raw codes. We develop a composite quasi-likelihood estimation procedure and establish non-asymptotic error bounds for latent representations and mixture mean directions, together with consistent recovery of synonym clusters. The theory quantifies statistical gains from integrating multiple sources and auxiliary knowledge graph information. Simulations and a multi-institutional EHR application demonstrate improved alignment and synonym clustering.
0
0
stat.ME 2026-05-11 Recognition

Time-weighted estimator recovers activity anchors from irregular GPS

An Object-Oriented Spatial Statistics Approach for Human Activity Space Estimation

Distributing time over polygons and roads bounds errors from sampling gaps and variability while identifying stable locations and corridors.

Figure from the paper full image
abstract click to expand
Human activity spaces are shaped by individual mobility and the built environment, motivating statistical methods that integrate GPS observations with GIS representations of places and routes. We propose a novel methodology to estimate activity spaces in built environments from GPS data within the Object Oriented Spatial Statistics framework. We characterize daily mobility through the distribution of time across spatial polygons and road segments, aiming to capture entity-specific time-use fractions and level-$\gamma$ activity spaces. We develop a time-weighted estimator to handle irregularly sampled GPS observations. We derive an error bound that quantifies the effects of measurement error, nearest-entity misclassification, temporal gaps, boundary crossings, and day-to-day variability. We also develop a map-augmented representation of daily activity patterns, a dwell-time-weighted distance for clustering daily trajectories, and polygon- and road-based stability summaries. Simulation studies and a real-data application demonstrate that the proposed framework recovers concentrated stationary anchors, interpretable travel corridors, and distinct stabilization behavior for dwelling and movement components, supporting the benefits of weighting under irregular sampling. KEYWORDS: GPS data, GIS, human mobility, space-time geography.
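A hedged sketch of the time-weighting idea for irregular GPS fixes: each fix receives weight equal to half of its adjacent time gaps, so sparsely sampled stretches are not under-counted when dwell-time fractions are formed. The nearest-entity labels are assumed to come from a separate GIS assignment step.

```python
import numpy as np
import pandas as pd

def time_weighted_fractions(times, entities):
    """times: sorted timestamps (seconds); entities: polygon/road id per fix."""
    t = np.asarray(times, dtype=float)
    gaps = np.diff(t)
    # weight of fix i = half of the gaps on either side (trapezoid-style allocation)
    w = np.zeros_like(t)
    w[:-1] += gaps / 2.0
    w[1:] += gaps / 2.0
    frac = pd.Series(w).groupby(pd.Series(entities)).sum()
    return frac / frac.sum()

times = [0, 60, 120, 1200, 1260]          # note the long gap before the 4th fix
entities = ["home", "home", "road_7", "office", "office"]
print(time_weighted_fractions(times, entities))
```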
0
0
stat.ME 2026-05-11 2 theorems

Rolling calibration window optimizes conformal coverage for time series

Rolling-Origin Conformal Prediction under Local Stationarity and Weak Dependence

Under local stationarity, the m most recent errors yield minimax rate O(T^{-Ξ²/(2Ξ²+1)}) and beat full-history methods on real data.

Figure from the paper full image
abstract click to expand
We propose and analyse rolling-origin conformal prediction for time-series forecasting. The method calibrates the conformal quantile against the $m$ most recent pseudo-out-of-sample forecast errors, adapting to serial dependence, volatility clustering, and distributional drift that invalidate classical conformal guarantees. Under H\"{o}lder-$\beta$ local stationarity and $\alpha$-mixing, we establish a four-term coverage-error decomposition and derive the optimal calibration window $m^{\star} \asymp T^{2\beta/(2\beta+1)}$ with coverage-error rate $O(T^{-\beta/(2\beta+1)})$. A Le Cam two-point construction shows this rate is minimax-optimal over the H\"{o}lder-$\beta$ model class. The Bahadur representation is proved under both $\alpha$-mixing and the physical-dependence framework of Wu (2005). An oracle inequality formalises Winkler cross-validation as an adaptive window selector; the required uniform concentration condition is established in an appendix. Validation on six real series and 93 M4 competition series confirms the theory: rolling-origin calibration outperforms full-history calibration in 86\% of comparisons (median Winkler improvement 12.3\%), maintains coverage within $\pm2\%$ of the 90\% target at short and medium horizons, and the cross-frequency log-log regression slope $0.614$ ($95\%$ CI $[0.424, 0.805]$) is consistent with the theoretical $2/3$ after controlling for frequency fixed effects.
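A minimal sketch of rolling-origin conformal calibration, assuming absolute one-step forecast errors as the conformity score: the interval half-width is the finite-sample conformal quantile of the m most recent errors.

```python
import numpy as np

def rolling_conformal_interval(errors, point_forecast, m=50, alpha=0.1):
    """errors: past absolute forecast errors, most recent last;
    returns a (lower, upper) interval around the current point forecast."""
    recent = np.asarray(errors[-m:])
    k = int(np.ceil((1 - alpha) * (len(recent) + 1)))   # finite-sample conformal rank
    q = np.sort(recent)[min(k, len(recent)) - 1]
    return point_forecast - q, point_forecast + q
```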
0
0
stat.ME 2026-05-11 1 theorem

Adjusting for assessment timing removes bias in EHR trial estimates

Statistical Design of Pragmatic Trials Using Electronic Health Record Data when Outcome Assessments are Uncontrolled and Irregular

Simulations show that when treatment changes how often outcomes are recorded, simple single-score methods distort results while flexible longitudinal models that adjust for assessment timing remain unbiased.

Figure from the paper full image
abstract click to expand
Pragmatic trials increasingly define outcomes using real-world data such as electronic health records, where assessments are collected during routine care rather than at fixed timepoints. Consequently, these uncontrolled assessments may be irregular, sparse, and affected by the intervention (intervention-dependent assessments), which can lead to biased treatment effect estimates. We developed a simulation study to inform the statistical approach for trials with uncontrolled assessments, which we applied to the MI-CARE pragmatic trial. Using a pre-trial cohort mimicking eligibility and outcome measurement, we estimated assessment frequency and timing and combined these estimates with assumptions about how the intervention effects might impact assessment. We simulated sparse and intervention-dependent assessments and compared single-measure approaches with longitudinal models using all scores. Under intervention-dependent assessments, we found that naive methods such as using the best score or using a randomly selected score without adjusting for measurement timing produced substantial bias. Models that adjusted flexibly for the follow-up timing estimated time-point specific or time-averaged treatment effects without bias. Simulation results informed the selection of the statistical approach for the MI-CARE trial. Among unbiased methods, the most powerful was a linear mixed model with exponential correlation structure, adjustment for time since baseline, and a time-varying intervention effect to estimate the intervention effect at the end of the intervention window. Future studies can use pre-trial data to conduct a simulation study tailored to the trial's data features to inform the analytic approach. Trials with uncontrolled assessments should consider the potential for intervention-dependent assessments and select an appropriate method to avoid bias.
0
0
stat.ME 2026-05-11 Recognition

Rebiasing debiased estimates shortens intervals with valid coverage

Empirical Bayes Rebiasing

Learning the distribution of biases from noisy observations trades some bias for reduced variance while preserving asymptotic coverage.

Figure from the paper full image
abstract click to expand
We study methods for simultaneous analysis of many noisy and biased estimates, each paired with an even noisier estimate of its own bias. The analyst's goal is to construct short calibrated intervals for each parameter. The standard debiasing approach, which subtracts the bias estimate from each biased estimate, inflates variance and yields long intervals. In this paper, we propose an empirical Bayes rebiasing strategy that starts from the fully debiased estimates and learns from data how much bias to reintroduce by estimating the unknown bias distribution. We provide convergence rates for the coverage of our intervals when the bias distribution is estimated using nonparametric maximum likelihood. Furthermore, we demonstrate substantial precision gains in prediction-powered inference, including pairwise LLM win-rate evaluations, as well as for inference of direct genetic effects in family-based GWAS.
0
0
stat.ME 2026-05-11 Recognition

SSL method raises efficiency in doubly censored EHR risk prediction

Semi-supervised Method for Risk Prediction with Doubly Censored EHR Data

Combining scarce gold-standard labels with abundant surrogate data yields more precise clinical risk estimates without added bias.

abstract click to expand
The rapid expansion of large-scale electronic health record (EHR) data offers unique opportunities to improve the accuracy and efficiency of clinical risk estimation. Yet, because clinical events may occur outside the recording health system, clinical event outcomes are frequently subject to double censoring (both left and right). In addition, gold-standard event times can often only be ascertained through labor-intensive manual chart reviews, yielding labels for only a small subset of patients. Relying on this limited labeled set alone is inefficient, whereas widely available surrogate outcomes such as the time to first diagnostic code or first disease mention are error-prone and can yield biased estimates if used directly. Semi-supervised learning (SSL) methods provide a principled way to integrate labeled and unlabeled data, and prior work has demonstrated their advantages in settings with binary or right-censored outcomes. However, existing approaches do not accommodate double censoring for risk prediction, which poses additional methodological challenges. To address this gap, we develop a novel SSL framework for risk prediction that combines a small set of gold-standard labels with large-scale surrogate information under double censoring. We establish the theoretical validity of the proposed estimator. Through extensive simulation studies, we show that our method substantially improves estimation efficiency relative to existing supervised estimators (based on the labeled data). Finally, we demonstrate its practical value by applying it to study risk factors for type 2 diabetes (T2D) using EHR data from a health system in the US.
0
0
stat.ME 2026-05-11 Recognition

Combined rank tests detect wider benefits in experiments

Randomization Tests for Distributions of Individual Treatment Effects via Combined Rank Statistics

Adaptive procedures maintain exact validity under randomization while matching or exceeding the power of the best single statistic for the distribution of individual treatment effects.

Figure from the paper full image
abstract click to expand
What proportion of treated units actually benefited from an experimental intervention? What is the median or the largest individual treatment effect? This paper develops methods for answering such questions about the distribution of individual causal effects in randomized experiments. Existing approaches require the analyst to select a rank-based test statistic before observing the data. A poor choice can substantially reduce power, while searching over multiple test statistics and adjusting for multiplicity using Bonferroni correction also incurs power loss. We propose inference procedures that adaptively combine multiple rank-based statistics while maintaining finite-sample validity. For stratified experiments, we further develop weighting schemes that effectively aggregate evidence across strata of heterogeneous sizes. The resulting combined test achieves power comparable to, or exceeding, that of the best individual test, without requiring prior knowledge of the optimal statistic. When applied to a randomized experiment evaluating a teacher training program, the combined test suggests that roughly half of treated teachers benefited, whereas a single rank-based test may indicate only a small minority. Thus, the choice of test determined whether the program appears broadly successful or narrowly effective.
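An illustrative min-p combination of a few rank statistics in a completely randomized design; both the per-statistic and the combined null distributions come from the same re-randomized assignments, which keeps the test exact. Strata weighting, a key part of the paper, is omitted.

```python
import numpy as np
from scipy.stats import rankdata

def rank_stats(y, z):
    """A few rank-based statistics sensitive to different effect patterns."""
    r = rankdata(y)
    n1 = int(z.sum())
    return np.array([
        r[z == 1].mean(),                  # Wilcoxon-type location shift
        np.quantile(r[z == 1], 0.9),       # upper-tail effects
        (r[z == 1] > len(y) - n1).mean(),  # share of treated units in the top ranks
    ])

def combined_test(y, z, n_perm=2000, seed=0):
    """y: outcomes; z: 0/1 assignment vector. Returns the combined p-value."""
    rng = np.random.default_rng(seed)
    obs = rank_stats(y, z)
    perm = np.array([rank_stats(y, rng.permutation(z)) for _ in range(n_perm)])
    # per-statistic permutation p-values (larger statistic = more evidence)
    p_obs = (1 + (perm >= obs).sum(axis=0)) / (n_perm + 1)
    p_all = (1 + (perm[:, None, :] >= perm[None, :, :]).sum(axis=0)) / (n_perm + 1)
    # combine via the minimum p-value, calibrated against the same permutations
    return (1 + (p_all.min(axis=1) <= p_obs.min()).sum()) / (n_perm + 1)
```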
0
0
stat.ME 2026-05-11 Recognition

Bayesian imputation improves coverage for missing functional data

BAMIFun: Bayesian Multiple Imputation for Functional Data

Multiple draws from a low-rank penalized-spline posterior outperform single-imputation FPCA methods in simulations and real datasets with irregularly observed trajectories.

Figure from the paper full image
abstract click to expand
Missing data are pervasive in modern functional datasets, where trajectories are often sparsely or irregularly observed. Although Functional Principal Component Analysis (FPCA) is widely used to reconstruct incomplete curves, existing FPCA-based approaches typically employ single imputation, leading to overly optimistic inferences in downstream analyses. To address these challenges, we develop a novel Bayesian multiple imputation framework for functional data (BAMIFun). For single-level functional data, we impose a Bayesian low-rank model that incorporates penalized spline representations to enforce smoothness of eigenfunctions and derive an efficient Gibbs sampler algorithm for posterior computation. In addition, we demonstrate and validate how to properly account for the estimation uncertainties in downstream analysis. Furthermore, we extend the framework to multiway functional data using a low-rank Functional Tensor Singular Value Decomposition (FTSVD) model, enabling Bayesian multiple imputation in settings not supported by existing methods. Simulation studies show that, compared to existing methods, BAMIFun achieves accurate imputation while providing substantially improved coverage and more reliable downstream inference. Case studies using a physical activity dataset and an infant gut microbiome dataset further demonstrate the practical advantages of our proposed methods under severe missingness. Code for our algorithms is available at https://github.com/ZirenJiang/BAMIFun.
0
0
stat.ME 2026-05-11 2 theorems

Estimator resists both row-wide and single-cell outliers in regression

Cellwise and Casewise Robust Multivariate Regression with Inference

CellMR plus cellBoot gives stable coefficients and asymptotically valid intervals even with missing entries and mixed contamination types.

Figure from the paper full image
abstract click to expand
Multivariate linear regression is a fundamental statistical task, but classical estimators such as ordinary least squares are highly sensitive to outliers. These may occur as casewise outliers that affect entire observations, or as outlying cells, which are individual contaminated entries in the predictor and/or response matrix. Moreover, modern datasets frequently contain missing values and are high-dimensional. To address these challenges we propose the cellwise multivariate regression (cellMR) estimator, a robust regression method that simultaneously accommodates casewise and cellwise outliers, missing data, and high dimensionality. The approach builds on a cellwise robust covariance estimator and uses ridge regularization for numerical stability. We further introduce cellBoot, a novel bootstrap-based inference procedure tailored to the cellMR framework. Relying on indirect inference, cellBoot provides asymptotically valid confidence intervals that are robust to casewise and cellwise contamination. We derive influence functions of the regression estimator and prove the asymptotic validity of the cellBoot confidence intervals. Simulations and a real genomics application illustrate the strong finite-sample performance of the proposed methods.
0
0
stat.ME 2026-05-11 Recognition

CHASM spots dependence shifts via recursive DMD eigenvalues

CHASM: Online Changepoint Detection in Temporal and Cross-Variable Dependence

Unsupervised online detector tracks truncated eigenvalue sequences in multivariate series without strong distributional assumptions.

Figure from the paper full image
abstract click to expand
Changepoint detection identifies times when the generative process of a time series changes, with applications in healthcare, cybersecurity, and finance. In multivariate settings, changes in cross-variable and temporal dependence are particularly challenging to detect, as they are often less pronounced than shifts in marginal statistics such as the mean or variance. Existing methods detect changes using reconstruction error, which provides only an indirect measure of dynamical change, or rely on scalar functionals that may be too coarse to capture global structure. We introduce CHASM, an online nonparametric method that monitors the truncated eigenvalue sequence of the recursively estimated dynamic mode decomposition operator. Designing such an approach raises two challenges: the permutation invariance of eigendecompositions, resolved via optimal linear assignment, and the lack of online changepoint methods for multivariate complex-valued time series, addressed through a novel augmented monitoring scheme. We study the theoretical properties of the dynamics estimator under the canonical vector autoregressive model, which directly motivates our algorithmic design. The proposed method achieves competitive or superior performance to modern competitors across synthetic and real-world data sets, including challenging settings in video and text data. It is unsupervised, depends on a small number of interpretable parameters, and requires no distributional assumptions beyond finite moments, making it readily deployable across scientific domains.
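A hedged sketch of the monitored quantity: the leading eigenvalues of a least-squares one-step (DMD/VAR-type) operator fitted on a window, aligned to the previous window's eigenvalues by optimal assignment. The paper's recursive online updates and the changepoint statistic itself are not reproduced here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def window_eigs(X, k=3):
    """Leading eigenvalues of the least-squares one-step operator A with
    x_{t+1} ~ A x_t, fitted on a window X of shape (T, d)."""
    A = np.linalg.lstsq(X[:-1], X[1:], rcond=None)[0].T
    eig = np.linalg.eigvals(A)
    return eig[np.argsort(-np.abs(eig))][:k]

def align(prev, curr):
    """Resolve eigenvalue permutation ambiguity via optimal linear assignment."""
    cost = np.abs(prev[:, None] - curr[None, :])
    row, col = linear_sum_assignment(cost)
    return curr[col]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)).cumsum(axis=0)   # toy multivariate series
e_prev = window_eigs(X[:100])
e_curr = align(e_prev, window_eigs(X[100:]))
```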
0
0
stat.ME 2026-05-11 2 theorems

GenAI extracts deconfounders for causal effects of sequences in text and video

GenAI Powered Dynamic Causal Inference with Unstructured Data

The framework learns a separate deconfounder for each position in a treatment sequence and yields valid confidence intervals, revealing that

Figure from the paper full image
abstract click to expand
A growing number of scholars seek to estimate causal effects of unstructured data such as text, images, and video. However, existing methods typically treat each object as a single, static observation. We develop a statistical framework for dynamic causal inference with unstructured data by leveraging generative artificial intelligence (GenAI) models. Our approach enables researchers to estimate the causal effects of sequences of treatment features, including their positions within text and video. We first extract internal representations of unstructured objects from a GenAI model and then estimate a marginal structural model using a neural network architecture that jointly learns a deconfounder for each treatment feature in the sequence. Our semiparametric inference framework yields valid asymptotic confidence intervals. Simulation studies demonstrate that the proposed estimator recovers the target causal effects and that the confidence intervals achieve nominal coverage in finite samples. We further apply our method to a randomized experiment on the Hong Kong protests, showing that the effect of a treatment feature depends critically on its position within the text.
0
0
stat.ME 2026-05-11 2 theorems

Optimal cutoffs minimize weighted risk in skew-normal biomarker models

Parametric ROC Analysis and Optimal Cutoff Selection under Scale Mixtures of Skew-Normal Distributions: A Decision-Theoretic Framework with Asymptotic Inference

A decision-theoretic framework under scale mixtures of skew-normal distributions provides unique thresholds with asymptotic normality and a closed-form plug-in variance estimator.

Figure from the paper full image
abstract click to expand
We study an optimal threshold functional arising in binary classification for continuous biomarkers. While the ROC curve summarizes discriminatory performance across all thresholds, practical threshold selection must also account for disease prevalence and asymmetric misclassification costs. The classical Youden index corresponds to a symmetric special case and may therefore be suboptimal in realistic decision settings. In addition, biomarker distributions in serological and immunological studies often display skewness and heavy tails, making Gaussian ROC models inadequate. We develop a parametric framework for ROC analysis and optimal cutoff selection under the family of scale mixtures of skew-normal (SMSN) distributions, including the skew-normal and skew-t models. The ROC curve and AUC are estimated by plug-in maximum likelihood from separate group fits. The optimal cutoff is defined as the minimiser of a weighted misclassification risk, which yields a likelihood ratio equation extending the Youden criterion. Under a monotone likelihood ratio condition, we establish existence, uniqueness, and global optimality of the cutoff. We further study its local regularity as an implicitly defined functional of the model parameter and derive consistency, asymptotic normality, and a closed-form plug-in variance estimator. A central term in this variance is the local slope of the estimating equation at the optimal threshold, which acts as a local identifiability diagnostic. Monte Carlo experiments across six scenarios show that the asymptotic approximation is accurate and that Wald confidence intervals attain near nominal coverage. An application to SARS-CoV-2 serological data illustrates that the proposed cutoff can differ substantially from the Youden threshold and may reduce estimated misclassification risk by up to 63% under asymmetric decision settings.
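A minimal numerical sketch of the decision-theoretic cutoff, using skew-normal group fits as a stand-in for the SMSN family: the cutoff minimizes prevalence- and cost-weighted misclassification risk, assuming higher marker values in the diseased group. Parameter values below are illustrative.

```python
import numpy as np
from scipy.stats import skewnorm
from scipy.optimize import minimize_scalar

def optimal_cutoff(diseased_params, healthy_params, prevalence, c_fn=2.0, c_fp=1.0):
    """Params are (shape, loc, scale) for each group's skew-normal fit."""
    d = skewnorm(*diseased_params)
    h = skewnorm(*healthy_params)
    # weighted risk: false negatives among diseased + false positives among healthy
    risk = lambda c: (prevalence * c_fn * d.cdf(c)
                      + (1 - prevalence) * c_fp * h.sf(c))
    lo, hi = h.ppf(0.001), d.ppf(0.999)
    return minimize_scalar(risk, bounds=(lo, hi), method="bounded").x

print(optimal_cutoff((4.0, 2.0, 1.0), (4.0, 0.0, 1.0), prevalence=0.2))
```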
0
0
stat.ME 2026-05-11 2 theorems

Optimizing AP tests nears theoretical power limits in adaptive trials

Operationalizing Allocation Probability Tests: Practical Guidance on Optimized Implementation for Power and Robustness

Refining the test statistic and null selection delivers higher power while keeping error rates and ethical allocation goals intact.

Figure from the paper full image
abstract click to expand
Recently, a new testing approach for response-adaptive clinical trials was proposed based on the allocation probabilities (AP) rather than the outcome data. While original work on the AP test focused on binary and normal endpoints and demonstrated that significant efficiency gains are possible, many critical questions remain open regarding its practical implementation and upper limits. In this work, rather than simply proposing novel statistics, we seek to understand the maximum gain that can be obtained with the AP test by optimizing how these probabilities are used to define the test statistic. We expand the method's practical utility by applying it to survival endpoints (exponential distributions) and introducing a rigorous strategy for selecting the null hypothesis to properly calibrate type I error. Our simulation studies reveal that by optimizing the functional form of the AP test, investigators can achieve a substantial increase in power, approaching the theoretical maximum, without sacrificing the patient outcome goals of the design. Furthermore, we explicitly compare the method to a standard Bayesian decision rule, finding that the optimized AP test significantly outperforms traditional frequentist tests while maintaining strict error control. This work provides a missing practical framework for implementing robust and optimized AP tests in complex response-adaptive settings.
0
0
stat.ME 2026-05-11 Recognition

Nonconvex low-tubal-rank regression handles outliers and heavy-tailed noise

Robust Tensor Regression with Nonconvexity: Algorithmic and Statistical Theory

It supplies a globally convergent algorithm and general rates plus error bounds for stationary points across linear, GLM, Huber, and some nonconvex losses.

Figure from the paper full image
abstract click to expand
Tensor regression is an important tool for tensor data analysis, but existing works have not considered the impact of outliers, making them potentially sensitive to such data points. This paper proposes a low tubal rank robust regression method for analyzing high-dimensional tensor data with heavy-tailed random noise. The proposed method is based on a nonconvex relaxation of the tensor tubal rank within a general optimization framework, which allows for nonconvexity in both the loss and penalty functions. We develop an implementable estimation algorithm and establish its global convergence under some mild assumptions. Furthermore, we provide general statistical theories regarding stationary points, including the rates of convergence and bounds on the prediction error. These theoretical results cover many important models, such as linear models, generalized linear models, and Huber regression, and even encompass some nonconvex losses like correntropy and minimum distance criterion-induced losses. Supportive numerical evidence is provided through simulations and application studies.
0
0
stat.ME 2026-05-11 2 theorems

Inverse mean or variance independence recovers central subspace

Sufficient Dimension Reduction via Inverse Conditional Mean or Variance Independence

Four new estimators based on projections and kernels generalize existing SDR methods and converge in high dimensions.

abstract click to expand
This paper presents a unified framework for sufficient dimension reduction (SDR) that generalizes several existing SDR techniques and offers new insights into the connection between inverse conditional moment independence and dimension reduction. The framework is built on two forms of inverse independence between the response vector and predictors: inverse conditional mean independence (ICMI) and inverse conditional variance independence (ICVI). For each form, we develop two general classes of matrices capable of recovering the central subspace, based on projection and kernel techniques respectively. This yields four distinct estimators: projection- and kernel-based variants under both ICMI and ICVI frameworks. Under standard regularity conditions, we establish the theoretical properties of these estimators and derive their convergence rates in high-dimensional settings. The proposed methods exhibit robustness to outliers in the response variable while maintaining computational competitiveness. Simulation studies and real-data analyses demonstrate the practical effectiveness of the proposed methods.
0
0
stat.ME 2026-05-11 Recognition

Missing data requires larger samples for stable prediction models

Incorporating Missing Data Considerations into Sample Size Calculations for Developing Clinical Prediction Models

Adapting sample size calculations to include missing predictors prevents performance degradation in clinical models.

abstract click to expand
Clinical prediction models must be developed using sufficiently large datasets to minimise overfitting and ensure robust predictive performance. Existing sample size calculations assume complete predictor data for all included participants, yet missing values are common and may increase required sample sizes. This study aimed to quantify how missing predictor data and different imputation methods affect overfitting and model degradation, within datasets that adhere to current sample size criteria. We also aimed to explore how a general sample size framework based on anticipated posterior (sampling) distributions can be adapted to incorporate missing data assumptions and handling strategies. Using a simulation study, we found that in development data meeting current minimum sample size requirements, missing data reduced predictive performance, with expected calibration slopes frequently falling below the targeted value of 0.9. Increasing the required sample size to account for missing data reduced overfitting concerns, but the necessary inflation factor was context specific. In some scenarios, up to twice the minimum sample size was needed to achieve performance comparable to models developed using fully observed data. Expected value of perfect information calculations allowed quantification of the expected loss due to finite samples and missingness. Through two applied examples, we illustrate how embedding missing data assumptions and handling within the posterior sampling approach provides a principled way to determine required minimum sample sizes under missing data. Overall, missing predictor data increases minimum sample size requirements to develop stable and well-calibrated models. Our adaptations to recent posterior (sampling) sample size calculations offer a practical approach for incorporating missing data directly into sample size calculations.
0
0
stat.ME 2026-05-11 2 theorems

Hidden Markov model detects regime shifts in proportion time series

A Beta-GAM Hidden Markov Model for Proportion Time Series

Beta emissions with state-specific GAM means and precisions recover two persistent regimes in Russian age-specific female mortality ratios.

Figure from the paper full image
abstract click to expand
We propose a hidden Markov model for univariate proportion time series taking values in (0,1), where regime switching captures latent structural changes and the emission distribution belongs to the Beta family. In each latent state, the Beta mean is linked to covariates through a generalized additive model (GAM) with spline-based smooth functions, while the Beta precision is state-specific, enabling flexible modeling of both nonlinear covariate effects and regime-dependent variability. Estimation is carried out via a penalized expectation--maximization algorithm, combining smoothing with numerical maximization of the penalized emission likelihood. To select the number of latent states and the smoothing penalty, we implement a grid search guided by standard information criteria (Akaike Information Criterion/Bayesian Information Criterion/Integrated Completed Likelihood) with a diagnostic filter that removes degenerate solutions characterized by explosive precision estimates. Uncertainty is quantified through a parametric bootstrap procedure for transition probabilities and state-dependent parameters. Simulation results demonstrate accurate recovery of transition dynamics, state precisions, and latent-state decoding. A motivating application to Russian age-specific mortality data (1960--2014, ages 0--40) illustrates how the proposed model summarizes smooth age patterns in female-to-total mortality ratios while identifying two persistent latent regimes that admit a substantive demographic interpretation in light of the country's well-documented mortality shocks that occurred over the second half of the twentieth century.
0
0
stat.ME 2026-05-11 Recognition

Calibrating observational data to experiments transports treatment effects consistently

Transporting treatment effects by calibrating large-scale observational outcomes

The estimator stays consistent for the transported average treatment effect and supports valid inference without requiring overlap between the experimental and observational datasets.

Figure from the paper full image
abstract click to expand
A high-quality experimental dataset is typically much smaller than a corresponding observational dataset. In this regime, we propose an estimation and inference method for a transported treatment effect when there are imperfect and possibly biased measurements of the outcome of interest in the observational dataset. First, we estimate the conditional average treatment effect (CATE) by using ordinary least squares to calibrate a treatment-control contrast in the observational outcome to the experimental data. Then, our estimator is the sample average of this estimated CATE over the observational dataset. Unlike existing methods, our inference remains asymptotically valid without positivity (overlap) between the two datasets. When the calibration regression is well specified, our estimator is consistent for the transported average treatment effect. Otherwise, it converges to a projection estimand. As long as the observational dataset size grows sufficiently quickly relative to the experimental dataset size, our estimator achieves a notion of semiparametric efficiency proposed in recent work on semi-supervised learning for the projection estimand. We illustrate the precision and stability of our methodology compared to existing proposals for transporting average treatment effects under various degrees of positivity violations using numerical simulations and a data example that incorporates field experiments and satellite images to estimate an aggregate effect of crop rotation on maize (corn) yields over a large area of the Midwestern United States.
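A hedged sketch of the calibrate-then-average idea: an observational treatment-control contrast in the proxy outcome is calibrated to the experimental data by OLS, and the fitted values are averaged over the larger observational sample. Using unit-level experimental effect proxies (for example, IPW-transformed outcomes) as the regression target is an illustrative assumption, not the paper's exact construction.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def transported_ate(exp_contrast, exp_effect, obs_contrast):
    """exp_contrast: proxy-outcome treatment-control contrast at experimental units;
    exp_effect: unbiased unit-level effect proxies from the experiment (assumed given);
    obs_contrast: the same contrast evaluated on the observational sample."""
    calib = LinearRegression().fit(exp_contrast.reshape(-1, 1), exp_effect)
    # average the calibrated CATE over the large observational dataset
    return calib.predict(obs_contrast.reshape(-1, 1)).mean()
```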
0
0
stat.ME 2026-05-11 2 theorems

Finite-horizon model spots seasonal flea market patterns

A Finite-Horizon Mixture Cure Model with Application to Online Flea Market Data

By capping the cure definition at a chosen period, the model avoids infinite-tail assumptions and reveals seasonal user behaviors in online flea market data.

Figure from the paper full image
abstract click to expand
This study proposes a mixture cure model that latently divides a population based on event occurrence within a finite time horizon. Conventional models rely on event occurrence over an infinite horizon, introducing untestable assumptions that often lead to issues with identifiability and interpretability. By shifting the estimand to a specific period of interest, the proposed approach reduces reliance on these infinite-tail assumptions and aligns interpretations more closely with finite-horizon decision-making objectives. Through simulation studies, we first evaluate the statistical properties of the proposed estimator, including estimation bias and variance. We further show that relying on conventional infinite-horizon models for finite-horizon decision-making can lead to erroneous judgments. Finally, we apply the model to transaction data from Mercari, a Japanese online flea market platform. The empirical results reveal that the proposed model identifies different significant variables compared to the conventional model, offering interpretations that better reflect seasonal variation in user behavior.
0
0
stat.ME 2026-05-08 1 theorem

Multi-stage smoothing recovers evolving network edges

Nonparametric estimation of time-varying network connections by multi-stage smoothing

Temporal per-edge smoothing plus data-driven node smoothing estimates changing connection probabilities accurately.

Figure from the paper full image
abstract click to expand
We consider the problem of estimating the underlying edge probabilities of a time-varying network observed at multiple time points. The probability structure is represented by a time-varying graphon that satisfies temporal H\"older smoothness and piecewise Lipschitz conditions in the latent variables. We propose a multi-stage smoothing estimator that first applies temporal local smoothing to each edge and then performs node-domain smoothing using a data-driven neighborhood construction adapted from the method. An additional temporal smoothing step is introduced as an optional refinement when uniform accuracy over the entire time domain is required. Simulation studies demonstrate the benefits of combining temporal and node-domain smoothing under different generative models. We also apply the method to a real time-varying network dataset and show that it captures both smooth temporal evolution and structural patterns in the connectivity.
0
0
stat.ME 2026-05-08 1 theorem

Bayesian model contracts to true dynamic correlations at explicit rate

Modeling Dynamic Correlation Matrices with Shrinkage Priors

Low-rank factors and dynamic shrinkage priors deliver adaptive regularization plus a scalar dependence measure for portfolio monitoring.

Figure from the paper full image
abstract click to expand
Estimating time-varying correlation matrices is challenging because existing methods may adapt slowly to structural changes, impose insufficient regularization, or produce diffuse posterior uncertainty. In moderate dimensions, an additional difficulty is summarizing the estimated evolving dependence structure for downstream decision-making tasks. We propose a Bayesian approach based on a low-rank factor representation, with latent states evolving under a dynamic shrinkage prior and observation errors following a multivariate factor stochastic volatility model. This specification allows locally adaptive regularization of the estimated correlation structure over time and informative uncertainty quantification. We establish, to our knowledge, a first-of-its-kind posterior contraction result for dynamically regularized Bayesian models, showing contraction around the true model parameters at an explicit rate under averaged Hellinger distance. To summarize the estimated correlation matrices, we build on the information-theoretic concept of total correlation to obtain a scalar measure of cross-sectional dependence. Simulation studies show improved accuracy and responsiveness relative to competing methods in a range of challenging scenarios. We then apply our method to monitoring the correlation evolution of equity portfolios during periods of financial market stress, providing an ex post framework for assessing the changing benefits of diversification in backtesting analyses.
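A small sketch of the scalar dependence summary mentioned in the abstract, under a Gaussian working model in which the total correlation of a correlation matrix R reduces to -1/2 log det R; the matrix below is illustrative.

```python
import numpy as np

def total_correlation(R):
    """Gaussian total correlation of a positive-definite correlation matrix R."""
    sign, logdet = np.linalg.slogdet(R)
    return -0.5 * logdet

R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])
print(total_correlation(R))   # larger values indicate stronger overall dependence
```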
0
0
stat.ME 2026-05-08

Variance estimator fixes type I error for rare binary trial outcomes

Improving Variance Estimation for Covariate Adjustment with Binary Outcomes

IF-LOO method keeps valid inference in small samples and near probability boundaries where standard approaches fail.

Figure from the paper full image
abstract click to expand
Covariate adjustment is a general method for improving precision when estimating treatment effects in randomized trials and is recommended by the FDA in its 2023 guidance when baseline variables are prognostic for the primary outcome. We focus on a method highlighted in that guidance called ``standardization" (or ``g-computation") for estimating the marginal treatment effect. We address the question of how to reliably estimate variance for binary outcomes when marginal outcome probabilities are close to 0 or 1. We propose an influence function-based leave-one-out cross-validated (IF-LOO) variance estimator for the standardized difference-in-means average treatment effect. Through simulation studies, we show that this estimator provides appropriate type-I error control and performs reliably in challenging settings where existing methods can yield inflated type-I error or fail entirely, such as when outcome events are rare or sample sizes are small. In addition to having desirable statistical properties, we derive a closed-form expression for the proposed estimator, enabling straightforward and reliable implementation by study statisticians. The robust finite-sample performance and ease of implementation suggest the IF-LOO variance estimator is a prudent default choice for standardization in clinical trials.
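A hedged sketch of standardization (g-computation) for a binary outcome with a plug-in influence-function variance under simple randomization; the paper's IF-LOO estimator additionally refits the outcome model leaving each observation out and cross-validates, which this simple version omits.

```python
import numpy as np
import statsmodels.api as sm

def standardized_ate(y, a, X):
    """y: binary outcome; a: 0/1 treatment; X: baseline covariates (n x p)."""
    n = len(y)
    design = lambda t: np.column_stack([np.ones(n), t, X])
    fit = sm.GLM(y, design(a), family=sm.families.Binomial()).fit()
    p1 = fit.predict(design(np.ones(n)))    # predicted risk if everyone treated
    p0 = fit.predict(design(np.zeros(n)))   # predicted risk if no one treated
    ate = p1.mean() - p0.mean()
    pi = a.mean()                            # known randomization ratio assumed
    # standard plug-in influence function for the standardized risk difference
    phi = a / pi * (y - p1) - (1 - a) / (1 - pi) * (y - p0) + (p1 - p0) - ate
    return ate, phi.var(ddof=1) / n
```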
0
0
stat.ME 2026-05-08

One estimator formula covers many adaptive subpopulation selection rules

Unbiased estimation in two-stage adaptive enrichment designs

A partition condition on interim rules lets designers use a single unbiased formula instead of inventing new ones for each trial design.

Figure from the paper full image
abstract click to expand
Recent advances in biomedical research have identified an increasing number of biomarkers associated with heterogeneity in patient responses to medical treatments. When a treatment is suspected to benefit certain patient subpopulations, adaptive enrichment designs may be more efficient and ethical. In such designs, an interim analysis is incorporated during the trial to select patient subpopulations for which the experimental treatment appears promising, according to predefined subpopulation selection rules. However, data-dependent selection can induce selection bias, causing conventional maximum likelihood estimators (MLEs) to overestimate the treatment effect in the selected patient subgroup. Existing inference methods for addressing this bias are typically rule-specific, highlighting the need for an estimation framework that accommodates a broader class of subpopulation selection rules. In this work, we define a general class of subpopulation selection rules based on the sample space partition condition and provide a systematic derivation that yields a unified formula for the Uniformly Minimum Variance Conditional Unbiased Estimator (UMVCUE). This generality allows our formulation to encompass a wide spectrum of adaptive enrichment designs, eliminating the necessity for case-specific derivations for each new design. Extensive simulations confirm the unbiasedness of the proposed UMVCUE, ensuring that therapeutic benefits are not overestimated. By bridging the gap between flexible interim subpopulation selection and rigorous statistical inference, our framework has the potential to facilitate the implementation of diverse subpopulation selection rules with greater ease in real-world trials and promote more efficient and ethical drug development.
0
0
stat.ME 2026-05-08

History-aware sets shorten survival predictions while keeping coverage

History-Aware Conformal Prediction Sets for Censored Time-to-Event Outcomes

By using full covariate histories up to the decision time and adjusting for censoring, the sets cut median interval lengths by up to 75 percent in simulations while keeping near-nominal coverage.

Figure from the paper full image
abstract click to expand
Existing conformal prediction methods for time-to-event outcomes leverage only baseline covariates, producing prediction intervals that are insufficiently informative to facilitate decision making. We propose History-Aware Prediction Sets (HAPS), a conformal framework that constructs prediction sets for individual event times using covariate histories observed up to a decision time, targeting coverage among individuals who have survived to this time. HAPS handles right censoring adjusted for time-varying confounders via inverse probability of censoring weighting. When the censoring weights are consistently estimated, it achieves PAAC (probably asymptotically approximately correct) coverage among survivors. We further propose two doubly robust extensions of HAPS to weaken reliance on consistent estimation of the censoring distribution. In simulations, HAPS and its extensions reduce median prediction interval length by up to 75% relative to baseline comparators while maintaining close to nominal coverage. On two public benchmark data sets, HAPS reduces the median interval length by up to 60% for predictions at year 5, compared to the baseline comparators.
0
0
stat.ME 2026-05-08

Jeffreys Bayes estimator beats MLE on small Frank copula samples

Bivariate Frank Copula: Some More Results on Point Estimation of the Association Parameter from a Bayesian Perspective and Revisiting the Goodness of Fit Tests with an Application to Model Groundwater Data from Dong Thap, Vietnam

Simulations show lower mean squared error for the association parameter when sample size is at most 25, with similar performance for larger n.

Figure from the paper full image
abstract click to expand
This work has two major parts. First, we extend the recent study of Pham et al. (2025) on point estimation of the association parameter of a bivariate Frank copula. We investigate two Bayes estimators under the generalized flat prior and the Jeffreys prior, and compare them with the maximum likelihood estimator (MLE). Simulation results show that, for small sample sizes (n <= 25), the Bayes estimator under the Jeffreys prior uniformly outperforms both the generalized flat prior estimator and the MLE in terms of mean squared error (MSE). For moderate and large sample sizes, all estimators have very similar performances in terms of bias and MSE. We also discuss computational issues in the R package implementation that may significantly affect the computation of the MLE for very small samples. In the second part, we apply the Frank copula to analyze the association between groundwater arsenic concentration and other hydrochemical variables using a recent dataset from Vietnam. We revisit the goodness-of-fit tests proposed by Genest et al. (2006), investigate several non-intuitive behaviors of the test statistics, and provide extensive simulated critical value tables. Our results complement and refine the computational findings reported in the earlier literature.
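A minimal sketch of the first part's comparison, assuming a positive association parameter and a flat prior on a bounded grid; the Jeffreys-prior estimator studied in the paper requires the Fisher information and is not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def frank_loglik(u, v, theta):
    # Log-likelihood of the bivariate Frank copula density, assuming theta > 0.
    a = np.expm1(-theta)                               # e^{-theta} - 1 (negative)
    b = np.expm1(-theta * u) * np.expm1(-theta * v)    # positive for theta > 0
    return np.sum(np.log(theta) + np.log(-a) - theta * (u + v)
                  - 2.0 * np.log(np.abs(a + b)))

# Simulate a small sample (n = 20) by conditional inversion.
rng = np.random.default_rng(2)
n, theta_true = 20, 5.0
u, w = rng.uniform(size=n), rng.uniform(size=n)
v = -np.log1p(w * np.expm1(-theta_true)
              / (w + (1 - w) * np.exp(-theta_true * u))) / theta_true

# Maximum likelihood estimate on a bounded interval.
mle = minimize_scalar(lambda t: -frank_loglik(u, v, t),
                      bounds=(1e-3, 40.0), method="bounded").x

# Bayes posterior mean under a flat prior on the same bounded grid.
grid = np.linspace(1e-3, 40.0, 4000)
loglik = np.array([frank_loglik(u, v, t) for t in grid])
weights = np.exp(loglik - loglik.max())
bayes = np.sum(grid * weights) / np.sum(weights)
print(f"true theta {theta_true}, MLE {mle:.2f}, flat-prior Bayes {bayes:.2f}")
```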
0
0
stat.ME 2026-05-08

Proxy inferences calibrated by random effects from past domains

Estimate Level Adjustment For Inference With Proxies Under Random Distribution Shifts

Modeling estimate gaps as random effects drawn from historical aggregates reduces bias without storing individual data.

Figure from the paper full image
abstract click to expand
In many scientific domains, including experimentation, researchers rely on measurements of proxy outcomes to achieve faster and more frequent reads, especially when the primary outcome of interest is challenging to measure directly. While proxies offer a more readily accessible observation for inference, the ultimate goal is to draw statistical inferences about the primary outcome parameter and proxy data are typically imperfect in some ways. To correct for these imperfections, current statistical inference methods often depend on strict identifying assumptions (such as surrogacy, covariate/label shift, or missingness assumptions). These assumptions can be difficult to validate and may be violated by various additional sources of distribution shift, potentially leading to biased parameter estimates and miscalibrated uncertainty quantification. We introduce an estimate-level framework, inspired by domain adaptation techniques, to empirically calibrate proxy-based inference. This framework models the proxy-primary metric discrepancy as a random effect at the parameter level, estimating its distribution from aggregated historical observations across past domains (e.g., experiments, time periods, or distinct segments). This method avoids the requirement for retaining individual-level response data. Additionally, this adjustment can be layered on top of existing proxy-correction methods (such as prediction-powered inference or importance weighting) to account for additional biases not addressed by those corrections. To manage uncertainty when the number of historical domains is limited, we provide both a method-of-moments estimator and a domain bootstrap procedure. We further validate this approach using publicly available datasets and real-world experiments.
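A hedged sketch of the estimate-level adjustment: the simple method-of-moments form, the bias correction, and all names below are assumptions made for illustration, not the paper's exact estimator.

```python
import numpy as np

def mom_gap_variance(gaps, ses):
    # Method-of-moments variance of the random gap: total variance minus sampling noise.
    gaps, ses = np.asarray(gaps), np.asarray(ses)
    return max(0.0, gaps.var(ddof=1) - np.mean(ses ** 2))

# Historical domains: observed proxy-minus-primary gaps and their standard errors.
hist_gaps = np.array([0.02, -0.01, 0.04, 0.03, -0.02, 0.01])
hist_ses = np.array([0.01, 0.01, 0.02, 0.01, 0.02, 0.01])
tau2 = mom_gap_variance(hist_gaps, hist_ses)

# New domain: shift the proxy-based estimate by the mean historical gap and widen
# its interval to reflect gap variability and uncertainty in the mean gap.
proxy_est, proxy_se = 0.10, 0.015
adj_est = proxy_est - hist_gaps.mean()
adj_se = np.sqrt(proxy_se ** 2 + tau2 + hist_gaps.var(ddof=1) / len(hist_gaps))
print(f"adjusted estimate {adj_est:.3f} +/- {1.96 * adj_se:.3f}")
```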
0
0
stat.ME 2026-05-08 2 theorems

Statistical bounds quantify success for multiple groups steering one classifier

A Statistical Framework for Algorithmic Collective Action with Multiple Collectives

The guarantees depend on collective sizes and goal alignment and remain computable when each group lacks full details on the others.

Figure from the paper full image
abstract click to expand
As learning systems increasingly shape everyday decisions, Algorithmic Collective Action (ACA), i.e., users coordinating changes to shared data to steer model behavior, offers a complement to regulator-side policy and corporate model design. Real-world collective actions have traditionally been decentralized and fragmented into multiple collectives, despite sharing overarching objectives, with each collective differing in size, strategy, and actionable goals. However, most of the ACA literature focuses on single collective settings. To address this, we propose the first comprehensive statistical framework for ACA with multiple collectives acting on the same system. In particular, we focus on collective action in classification, studying how multiple collectives can influence a classifier's behavior. We provide quantitative statistical bounds on the success of the collectives, considering the role and the interplay of the collectives' sizes and the alignment of their goals. We make such bounds computable by each collective with only partial knowledge of other collectives' sizes and strategies. Finally, we numerically illustrate our framework on simulations inspired by interventions for climate adaptation in smart cities, demonstrating the usefulness of our bounds.
0
0
stat.ME 2026-05-08 Recognition

Bayesian tensor model estimates multi-feature contact matrices

Bayesian Modeling and Prediction of Generalized Contact Matrices

Contingency table link and structural constraints allow stable high-dimensional inference from real contact surveys.

abstract click to expand
Social contact matrices are essential tools in infectious disease epidemiology as they quantify close-range human contact patterns which directly drive the transmission of airborne infectious diseases. In this work we propose a Bayesian modeling framework for inferring generalized contact matrices which stratify contact matrices beyond contemporary age dimensions. The model is designed to satisfy fundamental structural assumptions of contacts while leveraging tensor structures and smoothing constraints to make high-dimensional matrix estimation computationally feasible and statistically stable. We discover a link between multi-dimensional matrix stratification subject to structural constraints and the theory of contingency tables. This enables us to approach a challenging missing-data problem commonly encountered in real-world analysis where feature information on the contacts is unobserved. We benchmark the framework against existing methods through simulation studies and illustrate the framework's practical utility through two real-world datasets: BICS (United States) and COVIMOD (Germany). Our models are implemented in an open-source Python package to facilitate adoption in the wider scientific community.
0
0
stat.ME 2026-05-08

Sorting by relatives recovers causal order in random DAGs

A Topological Sorting Criterion for Random Causal Directed Acyclic Graphs

In Erdős-Rényi and scale-free causal graphs, the count of reachable nodes rises monotonically along the order and acts as a proxy for the causal order.

Figure from the paper full image
abstract click to expand
Random directed acyclic graphs (DAGs) based on imposing an order on Erdős-Rényi and scale-free random graphs are widely used for evaluating causal discovery algorithms. We show that in such DAGs, the set of nodes reachable via open paths, termed relatives, increases monotonically along the causal order. We assess the prevalence of this pattern numerically, and demonstrate that it can be exploited for causal order recovery via sorting by the estimated number of relatives. We note that many simulations in the literature feature settings where this yields an excellent proxy for the causal order, and show that a strict increase of relatives along the causal order leads to a singular Markov equivalence class. We propose sampling time-series DAGs as a possible alternative and discuss implications for causal discovery algorithms and their evaluation on synthetic data.
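A quick numerical check of the sorting criterion, treating a node's 'relatives' as the nodes d-connected to it given the empty set (equivalently, nodes sharing an ancestor); the graph size and edge probability below are illustrative.

```python
import networkx as nx
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(3)
n, p = 30, 0.1

# Order-respecting Erdős-Rényi DAG: an edge i -> j is possible only for i < j.
G = nx.DiGraph()
G.add_nodes_from(range(n))
for i in range(n):
    for j in range(i + 1, n):
        if rng.uniform() < p:
            G.add_edge(i, j)

# Relatives of v: nodes u != v sharing an ancestor with v (ancestors include the node itself).
anc = {v: nx.ancestors(G, v) | {v} for v in G}
relatives = [sum(1 for u in G if u != v and anc[u] & anc[v]) for v in G]

# How monotone is the relative count along the true causal order?
tau, _ = kendalltau(np.arange(n), relatives)
print(f"Kendall tau between causal order and relative counts: {tau:.2f}")
```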
0
0
stat.ME 2026-05-08

Bayesian averaging of fractional polynomials recovers optimal doses

Bayesian Fractional Polynomials for Optimal Dosage Estimation with Fish Nutrition Applications

Simulations show lower error than benchmarks and the method gives nutrient recommendations with uncertainty for fish feeding trials.

abstract click to expand
The problem of optimal dosage estimation arises in diverse scientific domains, from pharmacology and toxicology to aquaculture and environmental studies. Statistical modeling of nonlinear dose-response relationships is essential to quantify biological effects and determine response-optimal levels. This paper introduces a flexible Bayesian fractional polynomial (BFP) framework for modeling such relationships, allowing for model uncertainty quantification and robust prediction through Bayesian model averaging. Extensive simulation results demonstrate that the proposed BFP approach yields accurate estimation of optimal dose levels, outperforming benchmarks significantly. The approach is demonstrated on real data from fish nutrient requirement experiments.
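A rough sketch of the model-averaging idea, using second-degree fractional polynomials with distinct powers and BIC weights as a cheap stand-in for full Bayesian model averaging; the data-generating curve, power set, and maximization criterion are illustrative rather than the paper's estimator.

```python
from itertools import combinations
import numpy as np

powers = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]             # conventional FP power set

def fp_term(dose, p):
    d = np.asarray(dose, dtype=float)
    return np.log(d) if p == 0 else d ** p

rng = np.random.default_rng(4)
dose = np.linspace(0.1, 4.0, 40)
y = 2.0 + 1.5 * np.log(dose) - 0.6 * dose + rng.normal(0, 0.2, dose.size)

pairs, fits, bics = list(combinations(powers, 2)), [], []
for p1, p2 in pairs:
    X = np.column_stack([np.ones(dose.size), fp_term(dose, p1), fp_term(dose, p2)])
    beta, rss = np.linalg.lstsq(X, y, rcond=None)[:2]
    fits.append(beta)
    bics.append(dose.size * np.log(float(rss[0]) / dose.size)
                + X.shape[1] * np.log(dose.size))

# BIC weights approximate posterior model probabilities.
w = np.exp(-0.5 * (np.array(bics) - min(bics)))
w /= w.sum()

# Model-averaged dose-response curve and its maximizing dose.
grid = np.linspace(0.1, 4.0, 400)
avg = sum(wi * (b[0] + b[1] * fp_term(grid, p1) + b[2] * fp_term(grid, p2))
          for wi, b, (p1, p2) in zip(w, fits, pairs))
print(f"model-averaged optimal dose: {grid[np.argmax(avg)]:.2f}")
```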
0
0
stat.ME 2026-05-08

Linked tensor model shows varying fluoride effects on paired tooth diseases

Linked-Tucker Factorized Individualized Regression for Paired Multivariate Categorical Outcomes

The factorization links caries and fluorosis models to reveal how exposures affect disease presence and severity differently by tooth location and by age.

abstract click to expand
We propose a joint individualized hurdle-ordinal regression model for paired zero-inflated ordinal outcomes with subject-specific, spatially varying, and time-varying covariate effects, motivated by the Iowa Fluoride Study (IFS). The two outcomes, dental caries and dental fluorosis, are measured repeatedly across ages at fine spatial resolution, yielding nested longitudinal data with substantial zero inflation, ordinality, and heterogeneity across individuals and locations. For each outcome, a hurdle component models disease presence, while a proportional-odds component models severity among positive observations. To parsimoniously represent the high-dimensional coefficient arrays, we introduce a linked Tucker tensor factorization. Shared subject-mode factors induce dependence between the caries and fluorosis coefficient tensors, while separate spatial factors accommodate the distinct measurement grids of tooth surfaces and tooth zones. A horseshoe prior on the core tensor elements encourages sparsity, and posterior computation is performed using the No-U-Turn Sampler in NumPyro. Population-level effect summaries are obtained by projecting individualized posterior linear predictors onto the design space, and Wasserstein barycenters aggregate these summaries across tooth locations and anatomical classes. Applied to the IFS, the model reveals spatially heterogeneous associations between early-life fluoride and dietary exposures and both outcomes. Fluoride exposure is associated with increased odds and severity of fluorosis, while soda intake consistently increases caries risk. These associations differ between presence and severity components and vary across tooth locations, ages, and subpopulations defined by prior caries status, highlighting the importance of the joint hurdle-ordinal framework for disentangling disease occurrence from disease progression in multilevel dental data.
0
0
stat.ME 2026-05-08

Framework separates effects in bundled versus independent treatment designs

Separable Effects in Four-Arm and Two-Arm Designs

Four-arm data allow direct component identification while two-arm data require testable assumptions for the same analysis.

Figure from the paper full image
abstract click to expand
Robins and Richardson (2010) reformulated mediation analysis by decomposing treatments into multiple components and examining separable effects of each component. While this approach is increasingly popular, existing work has analyzed "two-arm" data, where components are strictly bundled and manipulated simultaneously. However, in practice, four-arm data where components are assigned independently are often available. For example, testing accommodations might strictly bundle extra time with a separate session or allow them to be assigned separately. To address this distinction, we propose a general framework for analyzing separable effects in four-arm and two-arm designs. This framework provides distinct identification and estimation strategies for each design. For estimation, we utilize efficient influence function estimators coupled with machine learning and cross-fitting techniques. Additionally, we introduce two falsification tests for key identification assumptions required in the two-arm design by leveraging four-arm data. We investigate the performance of the proposed estimators via a simulation study and demonstrate their application by studying the effect of extended time accommodations using data from the National Assessment of Educational Progress. Ultimately, this separable effects analysis enables practitioners to clearly communicate underlying mechanisms and derive informative policy recommendations.
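The four-arm point is easy to illustrate: when the two components are assigned independently (a 2x2 factorial), each component's effect is identified by a plain arm contrast with no extra assumptions. The toy example below assumes an additive outcome model and illustrative variable names.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 4000
a1 = rng.integers(0, 2, n)          # component 1 (e.g., extra time)
a2 = rng.integers(0, 2, n)          # component 2 (e.g., separate session)
y = 50 + 3.0 * a1 + 1.0 * a2 + rng.normal(0, 5, n)

# With independent assignment, simple contrasts recover each component's effect.
effect_a1 = y[a1 == 1].mean() - y[a1 == 0].mean()
effect_a2 = y[a2 == 1].mean() - y[a2 == 0].mean()
print(f"component 1 effect ~ {effect_a1:.2f}, component 2 effect ~ {effect_a2:.2f}")
```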
0
0
stat.ME 2026-05-08

Sequential design gives consistent estimates from non-probability samples

Toward design-based inference for data integration

Treating the non-probability data as a fixed stratum and sampling the rest probabilistically yields unbiased population estimates without any assumption on the selection mechanism.

Figure from the paper full image
abstract click to expand
Integrating non-probability samples into finite-population inference typically requires modeling unknown selection probabilities under a missing-at-random (MAR) assumption that is difficult to verify. We propose a design-based alternative in which the non-probability sample is treated as a fully observed certainty stratum and a probability sample is drawn only from the complementary, previously unsampled units. Within this sequential framework, we develop two generalized regression estimators: one fitting the outcome model separately in the complementary stratum, the other pooling both samples; we make two distinct contributions. First, both estimators are design-consistent and admit consistent variance estimators with no assumption whatsoever on the non-probability selection mechanism, including under not-missing-at-random (NMAR) selection. Second, under a working superpopulation model that holds in both strata, the pilot non-probability sample can be used to construct second-stage inclusion probabilities that achieve Isaki-Fuller asymptotic optimality for the separate estimator; this optimality claim relies on assumptions strictly stronger than MAR, but its failure does not invalidate the consistency results above. A diagnostic test for coefficient homogeneity is proposed to guide the choice between the two estimators. Simulations confirm that the sequential estimators remain essentially unbiased under both MAR and NMAR, while propensity-adjusted competitors can be severely biased under NMAR. Two applications from Lithuanian official statistics illustrate that separate regression is preferable when the pilot stratum and its complement are strongly heterogeneous, whereas combined regression offers a modest efficiency gain when the two strata are similar.
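A small simulation of the certainty-stratum idea with illustrative population values: the non-probability sample is treated as fully observed, a simple random sample is drawn from the complementary units only, and the combined estimator remains essentially unbiased even though selection into the non-probability sample depends on the outcome.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 10_000
y = rng.gamma(shape=2.0, scale=5.0, size=N)           # finite-population values

# Non-probability sample with unknown, outcome-dependent (NMAR-like) selection.
sel_prob = 1.0 / (1.0 + np.exp(-(y - y.mean()) / y.std()))
np_sample = rng.uniform(size=N) < 0.05 * sel_prob
complement = ~np_sample

# Repeated SRS from the complement; expansion estimator for the complement total.
estimates = []
for _ in range(2_000):
    idx = rng.choice(np.flatnonzero(complement), size=500, replace=False)
    total_hat = y[np_sample].sum() + complement.sum() * y[idx].mean()
    estimates.append(total_hat / N)
print(f"true mean {y.mean():.3f}, average sequential estimate {np.mean(estimates):.3f}")
```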
0
0
stat.ME 2026-05-08

Two-step method adds biomarker variability to joint survival models

Joint modelling of time-dependent biomarker variability and time-to-event outcomes, a two-step approach

Variability in markers like white blood cell counts predicts overall survival in cancer trials, now estimable with standard software.

Figure from the paper full image
abstract click to expand
Increasing evidence suggests that variability in longitudinal biomarkers, in addition to their mean trajectory, carries prognostic information for time-to-event outcomes. However, standard joint models typically capture only the expected value of the biomarker process, assuming constant residual variability across individuals and time. Fully joint extensions that model within-subject variability exist but are computationally demanding and require dedicated software packages. We propose a flexible two-step approach for incorporating biomarker variability into joint models. First, residuals (or their transformations) from a mixed-effects model are used to derive subject- and time-specific measures of variability. Second, these variability measures are included in a standard joint model, allowing their association with survival to be estimated alongside the mean biomarker trajectory. Our approach can also accommodate multiple biomarkers simultaneously and is readily implemented using existing joint modeling software without custom extensions. Through simulations, we show that our method provides reasonable performance for variability effects across a range of scenarios. We further illustrate our approach using longitudinal data of white blood cell counts from a large phase III glioblastoma trial, demonstrating that both mean levels and variability of hematological markers carry prognostic information for overall survival.
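A minimal sketch of the two-step recipe on simulated data, with a Cox model standing in for the full joint model in step 2; the column names, the data-generating process, and the residual-SD summary are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from lifelines import CoxPHFitter

rng = np.random.default_rng(6)
n_subj, n_visits = 150, 6
subj_sd = rng.uniform(0.3, 1.5, n_subj)              # true per-subject variability
long = pd.DataFrame({
    "id": np.repeat(np.arange(n_subj), n_visits),
    "time": np.tile(np.arange(n_visits), n_subj),
})
long["biomarker"] = 5 + 0.2 * long["time"] + rng.normal(0, np.repeat(subj_sd, n_visits))

# Step 1: mixed model for the mean trajectory; per-subject residual SD as variability.
mm = smf.mixedlm("biomarker ~ time", long, groups=long["id"]).fit()
long["resid"] = mm.resid
variability = long.groupby("id")["resid"].std()

# Step 2: survival model including the variability summary (here, higher variability
# means shorter survival by construction).
event_time = rng.exponential(scale=10 / subj_sd)
censor = rng.exponential(scale=15, size=n_subj)
surv = pd.DataFrame({
    "duration": np.minimum(event_time, censor),
    "event": (event_time <= censor).astype(int),
    "biomarker_sd": variability.values,
})
CoxPHFitter().fit(surv, duration_col="duration", event_col="event").print_summary()
```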
0
0
stat.ME 2026-05-08

Stochastic interventions balance meds to isolate treatment effects

Estimation of treatment effects in presence of differential use of post-randomization concomitant medication with time-to-event outcomes

New estimands adjust for extra post-randomization medications in time-to-event trials, separating the drug-specific impact from the dilution caused by concomitant medication use.

Figure from the paper full image
abstract click to expand
In placebo-controlled randomized trials, the post-randomization use of concomitant medications may be higher in the placebo arm than in the treatment arm. This may dilute the full benefits of the randomized drug as estimated by the intention-to-treat analysis. We focus on cardiovascular outcomes trials in type-2 diabetes patients of glucose-lowering treatments where patients in the placebo arm are more likely to add other glucose-lowering agents with established cardio-protective properties. As a supplement to the intention-to-treat analysis, we propose a class of estimands within a causal framework that isolates the specific impact of the treatment being studied from that of concomitant treatment use. These estimands are defined under time-dependent treatment interventions to balance exposure to additional medications across intervention arms. We advocate for specific stochastic interventions to achieve this balance while minimizing positivity violations, which arise when certain treatment combinations or characteristics are not sufficiently represented in the data. We employ targeted minimum loss-based estimation (TMLE) to optimize the estimation procedure for our estimands while allowing for flexible adjustments for time-dependent covariates from follow-up visits. Finally, we demonstrate the application of the methods through a simulation study and a real-world example from the LEADER cardiovascular outcomes trial, which assessed cardiovascular risk for liraglutide versus placebo.
0
0
stat.ME 2026-05-08

Kernel copula embeddings detect causal dependence shifts

Detecting Changes in Causal Dependence with Kernels and Copulas

The integrated difference is provably zero without change and positive when the causal mechanism between X and Y changes, even when the mechanism and the data distribution are treated as unknown.

Figure from the paper full image
abstract click to expand
We propose a framework for determining whether the causal dependence of an outcome $Y$ on a covariate $X$ changes at a given time point, given confounders $\boldsymbol{Z}$. For instance, in financial markets, the effect of a market indicator on asset returns may causally change over time. While many existing measures of association can be used to detect changes in joint and marginal distributions, in the absence of strong assumptions on the data generating process none are suitable for detecting changes in the causal mechanism or in the strength of causal relationship. In this work we approach the problem from a fully non-parametric perspective, and treat the causal mechanism as well as the distribution of the data as unknown. We introduce a quantity based on the integrated difference between kernel mean embeddings of certain conditional copulas, which is provably equal to zero if the causal dependence does not change and strictly positive otherwise. A near-linear time estimator for the quantity is proposed, with rates of convergence explicitly spelled out. Extensive experiments demonstrate that the proposed statistic achieves high accuracy on multiple synthetic and real-world datasets. We additionally show how the proposed statistic can be used for change point detection when the goal is to detect changes in causal dependence occurring at an unknown time.
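An unconditional, back-of-the-envelope version of the statistic: pseudo-observations (empirical copula ranks) of (X, Y) before and after a candidate change point are compared through an RBF-kernel MMD. The confounders Z and the conditional-copula construction of the paper are omitted, and all settings are illustrative.

```python
import numpy as np
from scipy.stats import rankdata

def pseudo_obs(x, y):
    # Rank-transform to the unit square (empirical copula pseudo-observations).
    n = len(x)
    return np.column_stack([rankdata(x) / (n + 1), rankdata(y) / (n + 1)])

def mmd2_rbf(a, b, gamma=10.0):
    # Biased squared MMD between samples a and b with an RBF kernel.
    def k(p, q):
        sq = ((p[:, None, :] - q[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)
    return k(a, a).mean() + k(b, b).mean() - 2.0 * k(a, b).mean()

rng = np.random.default_rng(7)
n = 400
x = rng.normal(size=2 * n)
# The dependence of Y on X flips sign halfway through the series.
y = np.concatenate([0.8 * x[:n], -0.8 * x[n:]]) + rng.normal(0, 0.6, 2 * n)

before, after = pseudo_obs(x[:n], y[:n]), pseudo_obs(x[n:], y[n:])
null_a, null_b = pseudo_obs(x[:n // 2], y[:n // 2]), pseudo_obs(x[n // 2:n], y[n // 2:n])
print(f"MMD^2 across the change point: {mmd2_rbf(before, after):.4f}")
print(f"MMD^2 within the first regime:  {mmd2_rbf(null_a, null_b):.4f}")
```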
0
