Semiparametric semi-supervised learning for general targets under distribution shift and decaying overlap

2025 · math.ST · arXiv 2505.06452

2 Pith papers cite this work. Polarity classification is still indexing.


abstract

In modern scientific applications, large volumes of covariate data are readily available, while outcome labels are costly, sparse, and often subject to distribution shift. This asymmetry has spurred interest in semi-supervised (SS) learning, but most existing approaches rely on strong assumptions -- such as missing completely at random (MCAR) labeling or strict positivity -- that substantially limit their practical usefulness. In this work, we introduce a general semiparametric framework for estimation, inference, and efficiency benchmarking in SS settings where labels are missing at random (MAR) and the overlap may vanish as the sample size increases. Our framework, which we call D2S3, accommodates a wide range of smooth statistical targets -- including means, linear regression coefficients, quantiles, and causal effects -- and remains valid under high-dimensional nuisance estimation and distributional shift between labeled and unlabeled samples. We extend the theoretical guarantees of augmented inverse probability weighting estimators to preserve double robustness, asymptotic normality, and semiparametric efficiency under this challenging D2S3 regime. A key insight is that classical root-n convergence fails under vanishing overlap; we instead provide corrected asymptotic rates that capture the impact of the decay in overlap. We validate our theory through simulations and demonstrate practical utility in real-world applications in the Internet of Things and public health, where labeled data are scarce.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Optimized Labeling Resource Allocation for Prediction-Assisted Inference via OPAL

stat.ME · 2026-06-02 · unverdicted · novelty 6.0

OPAL learns optimal smooth labeling policies from ML uncertainty scores to enable low-variance prediction-assisted inference with finite-sample coverage guarantees.

Transporting treatment effects by calibrating large-scale observational outcomes

stat.ME · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs

A calibration procedure yields a weighted transported average treatment effect with asymptotically valid and efficient inference when experimental data grows slower than observational data, even without positivity or correct OLS specification.

citing papers explorer

Optimized Labeling Resource Allocation for Prediction-Assisted Inference via OPAL stat.ME · 2026-06-02 · unverdicted · none · ref 49 · internal anchor
Transporting treatment effects by calibrating large-scale observational outcomes stat.ME · 2026-05-08 · unverdicted · none · ref 26 · 2 links · internal anchor
