Machine learning methods for finite population parameter estimation in survey sampling

2026 · stat.ME · arXiv 2604.01160

1 Pith paper cites this work. Polarity classification is still indexing.


abstract

This pedagogical review examines the use of machine learning methods in finite-population inference for survey sampling, with an emphasis on design-based validity and statistical inference. While flexible prediction tools offer substantial gains in estimation accuracy, they also introduce important challenges, primarily due to the dependence between the fitted predictors and the sample. We focus on settings in which such predictions enter survey estimation through model-assisted estimation, item nonresponse imputation, and unit nonresponse adjustment. For model-assisted estimation and item nonresponse, we show how cross-fitting and Neyman-orthogonal estimating equations can adapt ideas from double/debiased machine learning to survey data, allowing the use of high-dimensional or nonparametric learners while preserving root-n consistency and asymptotic normality under suitable conditions. In contrast, for unit nonresponse, standard inverse-probability weighting remains outcome-agnostic and operationally attractive, but this same feature makes doubly robust and orthogonal constructions harder to deploy in official statistics. We also briefly discuss related developments in small area estimation and probability/nonprobability data integration. Overall, the paper highlights both the promise of machine learning and the fundamental inferential challenges it raises for survey practice.
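The cross-fitting idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's estimator. It assumes a difference-type model-assisted estimator of a population total, known first-order inclusion probabilities, an auxiliary variable observed for every population unit, and a weighted straight-line fit standing in for any machine learning learner. The names `fit_weighted_line` and `cross_fitted_total` are hypothetical.

```python
import random


def fit_weighted_line(x, y, w):
    """Weighted least-squares line y ~ a + b*x (placeholder for any learner)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    return lambda t: a + b * t


def cross_fitted_total(x_pop, sample_idx, y_sample, pi_sample, n_folds=2, seed=0):
    """Cross-fitted difference estimator of the population total of y.

    x_pop      auxiliary value for every population unit
    sample_idx population indices of the sampled units
    y_sample   outcomes of the sampled units, in the same order
    pi_sample  inclusion probabilities of the sampled units, same order
    """
    rng = random.Random(seed)
    n_pop = len(x_pop)
    sampled = set(sample_idx)

    # Shuffle sampled and non-sampled units separately, then deal out folds
    # round-robin so that sampled units are spread evenly across folds.
    s_order = list(sample_idx)
    rng.shuffle(s_order)
    r_order = [i for i in range(n_pop) if i not in sampled]
    rng.shuffle(r_order)
    fold = {i: pos % n_folds for pos, i in enumerate(s_order + r_order)}

    total = 0.0
    for k in range(n_folds):
        # Fit the predictor only on sampled units outside fold k, so that
        # predictions for fold k do not depend on fold k's own outcomes.
        train = [j for j, i in enumerate(sample_idx) if fold[i] != k]
        if not train:
            raise ValueError("fold leaves no sampled units for training")
        m = fit_weighted_line(
            [x_pop[sample_idx[j]] for j in train],
            [y_sample[j] for j in train],
            [1.0 / pi_sample[j] for j in train],
        )
        # Sum of predictions over all population units in fold k ...
        total += sum(m(x_pop[i]) for i in range(n_pop) if fold[i] == k)
        # ... plus the design-weighted residuals of sampled units in fold k.
        total += sum(
            (y_sample[j] - m(x_pop[i])) / pi_sample[j]
            for j, i in enumerate(sample_idx)
            if fold[i] == k
        )
    return total


if __name__ == "__main__":
    x_pop = [float(i) for i in range(100)]
    sample_idx = list(range(0, 100, 5))
    y_sample = [2.0 + 3.0 * x_pop[i] for i in sample_idx]
    pi_sample = [0.2] * len(sample_idx)
    print(cross_fitted_total(x_pop, sample_idx, y_sample, pi_sample))
```

The residual term is what keeps the estimator tied to the sampling design: if the learner predicts poorly, the weighted residuals correct the prediction total, and fitting out of fold is one way to weaken the dependence between the fitted predictor and the units it is applied to. Variance estimation is deliberately left out of this sketch.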

representative citing papers

Cross-Fitted Survey-Weighted TMLE with Design-Based Variance for Causal Machine Learning

stat.ME · 2026-06-29 · unverdicted · novelty 7.0

Cluster-level cross-fitting restores valid coverage for survey-weighted TMLE with flexible learners under stratified multistage designs, while single-fit and internal cross-validation versions under-cover.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Cross-Fitted Survey-Weighted TMLE with Design-Based Variance for Causal Machine Learning

stat.ME · 2026-06-29 · unverdicted · none · ref 74 · internal anchor
