Deep Neural Networks for Doubly Robust Estimation with Nonprobability Survey Samples

Shihua Luo; Wendy Lou; Xuewen Lu; Yufang Dai; Zilin Wang

arxiv: 2605.28762 · v1 · pith:R5DMEQNHnew · submitted 2026-05-27 · 🧮 math.ST · stat.AP · stat.CO · stat.ME · stat.ML · stat.TH

Pith reviewed 2026-06-29 09:20 UTC · model grok-4.3

keywords: deep neural networks · doubly robust estimation · nonprobability survey samples · sampling scores · inverse probability weighting · finite population mean · propensity score estimation

The pith

Deep neural networks model the sampling score nonparametrically to yield consistent doubly robust estimators for the finite population mean from probability and nonprobability samples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes modeling the logit sampling score for a nonprobability sample as an unknown nonparametric function and estimating it by maximizing a pseudo-likelihood that combines the nonprobability sample with a reference probability sample. The DNN parameters are fit via the ADAM algorithm, and the resulting scores are plugged into a DNN-assisted inverse-probability weighted estimator and a deep doubly robust estimator. A sympathetic reader would care because nonprobability samples often contain rich outcome data but suffer from unknown selection bias, while probability samples supply design information; the method aims to reduce sensitivity to misspecification of that bias, especially when the true mechanism is nonlinear.

Core claim

By treating the logit sampling score as an unknown nonparametric function approximable by a deep neural network and optimizing its parameters through pseudo-likelihood maximization on the combined samples, the resulting DNN-assisted IPW estimator and deep doubly robust estimator are consistent for the finite population mean, with established convergence rates under regularity conditions. Simulation studies and an application to Pew Research Center and BRFSS data indicate improved robustness to parametric propensity-score misspecification when the selection mechanism is nonlinear.

What carries the argument

DNN-estimated sampling scores obtained by maximizing a pseudo-likelihood that merges nonprobability and probability samples, then incorporated into the inverse-probability weighted estimator and the doubly robust estimator.
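
As a hedged illustration of the pseudo-likelihood step, the display below assumes the paper follows the form used by Chen, Li & Wu (2020) for combining a nonprobability sample with a reference probability sample. The notation is an assumption, not taken from the paper: S_A is the nonprobability sample, S_B the reference probability sample, d_i^B the design weight of unit i in S_B, and g the logit sampling score that the DNN approximates.

```latex
% Assumed notation: \pi(x) = P(i \in S_A \mid x_i = x), g = logit sampling score.
\begin{align*}
\operatorname{logit}\,\pi(x) &= g(x), \qquad
\pi(x) = \frac{\exp\{g(x)\}}{1+\exp\{g(x)\}},\\
\ell(g) &= \sum_{i \in S_A} \log\frac{\pi(x_i)}{1-\pi(x_i)}
          + \sum_{i \in S_B} d_i^B \log\{1-\pi(x_i)\}\\
        &= \sum_{i \in S_A} g(x_i)
          - \sum_{i \in S_B} d_i^B \log\bigl[1+\exp\{g(x_i)\}\bigr].
\end{align*}
```

The second sum is the design-weighted estimate of the population total of log{1 - π(x_i)}, which is where the reference probability sample enters.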

If this is right

The DNN-assisted inverse-probability weighted estimator is consistent for the finite population mean.
The deep doubly robust estimator remains consistent even under misspecification of one model component.
Convergence rates for both estimators follow from the DNN approximation properties and regularity conditions.
Finite-sample performance improves over standard parametric propensity-score methods when the true selection mechanism is nonlinear.
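
The two estimators named above can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: it assumes Hajek-type normalization and the doubly robust form of Chen, Li & Wu (2020), and the function names `ipw_mean` and `dr_mean` are hypothetical. Here `pi_a` holds estimated sampling scores on the nonprobability sample, `m_a` and `m_b` are outcome-model predictions on the two samples, and `d_b` holds design weights of the reference probability sample.

```python
from typing import Sequence


def ipw_mean(y_a: Sequence[float], pi_a: Sequence[float]) -> float:
    """Hajek-type IPW mean over the nonprobability sample."""
    weights = [1.0 / p for p in pi_a]
    return sum(w * y for w, y in zip(weights, y_a)) / sum(weights)


def dr_mean(
    y_a: Sequence[float],
    m_a: Sequence[float],
    pi_a: Sequence[float],
    m_b: Sequence[float],
    d_b: Sequence[float],
) -> float:
    """Doubly robust mean (assumed form).

    Score-weighted outcome residuals on the nonprobability sample plus
    design-weighted outcome predictions on the probability sample.
    """
    w_a = [1.0 / p for p in pi_a]
    residual_term = sum(w * (y - m) for w, y, m in zip(w_a, y_a, m_a)) / sum(w_a)
    prediction_term = sum(d * m for d, m in zip(d_b, m_b)) / sum(d_b)
    return residual_term + prediction_term


# Toy inputs (hypothetical numbers, for illustration only).
y_a = [1.0, 2.0, 3.0]
pi_a = [0.5, 0.25, 0.25]
print(ipw_mean(y_a, pi_a))

# With a perfect outcome model (m_a equal to y_a) the residual term vanishes,
# so the result does not depend on the sampling scores at all.
print(dr_mean(y_a, y_a, [0.9, 0.9, 0.9], [1.0, 3.0], [10.0, 30.0]))
```

The last call illustrates one half of double robustness: when the outcome model is exact, deliberately wrong sampling scores leave the estimate unchanged.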

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the nonparametric approximation holds, the same DNN pseudo-likelihood step could be applied to estimate other finite-population quantities such as totals or quantiles.
The framework suggests a route for importing flexible nonparametric propensity modeling from causal inference into classical survey sampling problems.
Performance under high-dimensional covariates or alternative DNN architectures remains an open empirical question that could be tested directly with the proposed estimators.

Load-bearing premise

The logit sampling score for the nonprobability sample can be modeled as an unknown nonparametric function that a DNN can approximate sufficiently well via pseudo-likelihood maximization, and the regularity conditions for consistency and rates hold in the survey setting.

What would settle it

A simulation or real dataset in which the selection mechanism is nonlinear yet the DNN-based estimators show persistent bias or no gain in mean squared error relative to a correctly specified parametric model, or fail to achieve the derived convergence rates.
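
A check of this kind could be organized around a small Monte Carlo harness such as the sketch below. Everything in it is a hypothetical stand-in: the population model, the nonlinear selection mechanism `true_score`, and the two toy estimators (a naive sample mean and an IPW mean using the true scores). In an actual test, the DNN-based and parametric estimators would be passed in place of these stand-ins and their bias and mean squared error compared.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def true_score(x: float) -> float:
    """Hypothetical nonlinear selection mechanism (depends on x squared)."""
    return sigmoid(-3.0 + 1.5 * x * x)


def naive_mean(y, pi):
    """Unweighted mean of the selected units; ignores the scores."""
    return sum(y) / len(y)


def oracle_ipw_mean(y, pi):
    """Hajek-type IPW mean using the true scores (a stand-in for fitted scores)."""
    w = [1.0 / p for p in pi]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def monte_carlo(estimator, n_pop=5000, n_rep=20, seed=1):
    """Return (bias, mse) of an estimator of the finite population mean."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_rep):
        # Draw a finite population with a nonlinear outcome model.
        x = [rng.uniform(-2.0, 2.0) for _ in range(n_pop)]
        y = [xi * xi + rng.gauss(0.0, 0.5) for xi in x]
        target = sum(y) / n_pop
        # Self-selection into the nonprobability sample.
        keep = [i for i in range(n_pop) if rng.random() < true_score(x[i])]
        y_a = [y[i] for i in keep]
        pi_a = [true_score(x[i]) for i in keep]
        errors.append(estimator(y_a, pi_a) - target)
    bias = sum(errors) / n_rep
    mse = sum(e * e for e in errors) / n_rep
    return bias, mse


for name, est in [("naive", naive_mean), ("oracle IPW", oracle_ipw_mean)]:
    bias, mse = monte_carlo(est)
    print(f"{name}: bias={bias:.3f}, mse={mse:.4f}")
```

Both estimators see the same simulated populations because the seed is shared, so differences in bias and MSE reflect the estimators rather than the draws.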

Figures

Figures reproduced from arXiv: 2605.28762 by Shihua Luo, Wendy Lou, Xuewen Lu, Yufang Dai, Zilin Wang.

Original abstract

Integrating probability and nonprobability survey samples is an important problem in modern survey sampling. Nonprobability samples often contain rich outcome information but may lack population representativeness, whereas probability samples provide design-based auxiliary information but may not contain the study variable. We propose a deep neural network (DNN)-assisted doubly robust framework for estimating the finite population mean from these two data sources. The proposed method models the logit sampling score for the nonprobability sample as an unknown nonparametric function and estimates it by maximizing a pseudo-likelihood that combines information from the nonprobability sample and a reference probability sample. The DNN parameters are optimized using the ADAM algorithm. The resulting DNN-estimated sampling scores are incorporated into a DNN-assisted inverse-probability weighted estimator and a deep doubly robust estimator. We establish consistency and convergence rates under regularity conditions and evaluate the finite-sample performance of the proposed estimators through simulation studies and an empirical application using Pew Research Center and Behavioral Risk Factor Surveillance System data. The results suggest that the proposed estimators can improve robustness to parametric propensity-score misspecification, especially when the true selection mechanism is nonlinear.
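
The fitting step described in the abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: a one-hidden-layer ReLU network with a scalar covariate stands in for the deeper network, the pseudo-likelihood is assumed to take the Chen, Li & Wu (2020) form, and the names `H`, `init_theta`, `neg_pseudo_loglik`, and `adam_fit` as well as the learning rate and step count are hypothetical. The update rule is the standard Adam recursion of Kingma & Ba (2014).

```python
import math
import random

H = 4  # number of hidden units (hypothetical choice)


def init_theta(seed=0):
    """Flat parameter list: w1[H], b1[H], w2[H], then the output bias b2."""
    rng = random.Random(seed)
    return [rng.uniform(-0.5, 0.5) for _ in range(3 * H + 1)]


def score(theta, x):
    """Logit sampling score g(x) from the one-hidden-layer ReLU network."""
    hidden = (max(0.0, theta[h] * x + theta[H + h]) for h in range(H))
    return theta[3 * H] + sum(theta[2 * H + h] * a for h, a in enumerate(hidden))


def add_grad(theta, x, coef, grad):
    """Accumulate coef * dg(x)/dtheta into grad (manual backpropagation)."""
    for h in range(H):
        pre = theta[h] * x + theta[H + h]
        if pre > 0.0:
            grad[h] += coef * theta[2 * H + h] * x
            grad[H + h] += coef * theta[2 * H + h]
            grad[2 * H + h] += coef * pre
    grad[3 * H] += coef


def softplus(z):
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))


def neg_pseudo_loglik(theta, x_a, x_b, d_b):
    """Negative pseudo-log-likelihood, scaled by the estimated population size."""
    fit = sum(score(theta, x) for x in x_a)
    pen = sum(d * softplus(score(theta, x)) for x, d in zip(x_b, d_b))
    return (pen - fit) / sum(d_b)


def gradient(theta, x_a, x_b, d_b):
    n_hat = sum(d_b)
    grad = [0.0] * len(theta)
    for x in x_a:
        add_grad(theta, x, -1.0 / n_hat, grad)
    for x, d in zip(x_b, d_b):
        add_grad(theta, x, d * sigmoid(score(theta, x)) / n_hat, grad)
    return grad


def adam_fit(theta0, x_a, x_b, d_b, steps=200, lr=0.01,
             beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize the negative pseudo-log-likelihood with Adam updates."""
    theta = list(theta0)
    m = [0.0] * len(theta)
    v = [0.0] * len(theta)
    for t in range(1, steps + 1):
        g = gradient(theta, x_a, x_b, d_b)
        for j in range(len(theta)):
            m[j] = beta1 * m[j] + (1.0 - beta1) * g[j]
            v[j] = beta2 * v[j] + (1.0 - beta2) * g[j] ** 2
            m_hat = m[j] / (1.0 - beta1 ** t)  # bias-corrected first moment
            v_hat = v[j] / (1.0 - beta2 ** t)  # bias-corrected second moment
            theta[j] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Toy data (hypothetical): 5 nonprobability units, 10 reference units of weight 10.
x_a = [0.1, 0.4, 0.5, 0.8, 0.9]
x_b = [i / 10 for i in range(10)]
d_b = [10.0] * 10
theta0 = init_theta()
theta = adam_fit(theta0, x_a, x_b, d_b)
print(neg_pseudo_loglik(theta0, x_a, x_b, d_b), neg_pseudo_loglik(theta, x_a, x_b, d_b))
```

The fitted scores `sigmoid(score(theta, x))` would then be the inputs to the IPW and doubly robust estimators. A practical implementation would more likely use an automatic-differentiation library, mini-batches, and some form of regularization or early stopping.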

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper uses DNNs to estimate sampling scores nonparametrically inside IPW and doubly robust estimators for combining probability and nonprobability samples, with claimed consistency and supporting simulations.

The core of this paper is fitting a deep net to the logit sampling score for a nonprobability sample by maximizing a pseudo-likelihood that also uses a reference probability sample, then feeding the fitted scores into an IPW estimator and a doubly robust one for the finite population mean.

What is new is the specific use of DNNs with ADAM for this nonparametric modeling step in the survey data-integration setting. The abstract indicates they derive consistency and convergence rates under regularity conditions, run simulations, and apply the method to Pew and BRFSS data. The simulations reportedly show gains when the true selection mechanism is nonlinear, which is the practical selling point.

The theory looks standard for nonparametric propensity estimation, but the regularity conditions are not detailed here, so it is unclear how well they fit typical survey sampling constraints like finite populations or design features. The convergence rates are asserted rather than derived in the abstract, and one would want to see whether the rates are fast enough to matter in moderate samples. The empirical example is only summarized, so its strength is hard to judge without the numbers.

This is aimed at survey statisticians who already work on probability-nonprobability integration. It has both asymptotic claims and finite-sample checks, so it is worth sending to referees even if the conditions turn out to need tightening.

Referee Report

0 major / 3 minor

Summary. The paper introduces a deep neural network (DNN) approach for estimating the logit sampling score of nonprobability samples by maximizing a pseudo-likelihood that combines data from the nonprobability sample and a reference probability sample. The estimated scores are used to construct a DNN-assisted inverse-probability weighted estimator and a deep doubly robust estimator for the finite population mean. Consistency and convergence rates are established under regularity conditions. The method is assessed through simulation studies and an empirical application using data from the Pew Research Center and the Behavioral Risk Factor Surveillance System (BRFSS). The results indicate improved robustness to misspecification of the selection mechanism when it is nonlinear.

Significance. Assuming the theoretical results are correctly derived, this manuscript makes a significant contribution to survey sampling methodology by providing a flexible, nonparametric tool for propensity score estimation that can better capture complex, nonlinear relationships compared to traditional parametric models. The doubly robust framework adds protection against misspecification, and the simulation and real-data analyses demonstrate practical advantages. The use of modern optimization techniques like ADAM for DNN training aligns the method with current machine learning practices in statistics.

minor comments (3)

The abstract refers to 'regularity conditions' without specifying them; a cross-reference to the section detailing these conditions would improve clarity.
The empirical application section would benefit from more details on how the probability sample is used as reference and any preprocessing steps applied to the data.
Consider adding results for different DNN depths or widths to assess sensitivity to architecture choices in the simulation studies.

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our manuscript on DNN-assisted doubly robust estimation for finite population means from combined probability and nonprobability samples. The recommendation for minor revision is noted with appreciation. No specific major comments appear in the report, so we provide no point-by-point responses below.

Circularity Check

0 steps flagged

No significant circularity detected

The paper introduces a DNN to nonparametrically model the logit sampling score via pseudo-likelihood maximization on combined nonprobability and probability samples, then plugs the resulting scores into IPW and doubly robust estimators while proving consistency and rates under regularity conditions. No quoted step reduces a claimed prediction or uniqueness result to a fitted parameter by construction, no load-bearing self-citation chain appears, and the derivation chain is self-contained against external statistical benchmarks for DNN approximation and survey sampling asymptotics.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the DNN being able to approximate the nonparametric logit sampling score and on unspecified regularity conditions for the consistency proofs; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

domain assumption: Regularity conditions hold that enable consistency and convergence rates of the DNN-assisted estimators.
Invoked in the abstract to establish theoretical properties of the proposed estimators.

pith-pipeline@v0.9.1-grok · 5740 in / 1195 out tokens · 25106 ms · 2026-06-29T09:20:26.817027+00:00 · methodology

Reference graph

Works this paper leans on

28 extracted references · 1 canonical work page · 1 internal anchor

[1] Anthony, M., & Bartlett, P. L. (2009). Neural network learning: Theoretical foundations. Cambridge University Press, Cambridge.

[2] Chen, Y., Li, P., & Wu, C. (2020). Doubly robust inference with nonprobability survey samples. Journal of the American Statistical Association, 115, 2011-2021.

[3] Deville, J.-C., & Särndal, C.-E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association, 87, 376-382.

[4] DiSogra, C., Cobb, C., Chan, E., & Dennis, J. M. (2011). Calibrating nonprobability internet samples with probability samples using early adopter characteristics. Joint Statistical Meetings (JSM), Survey Research Methods, pp. 4501-4515.

[5] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press, Cambridge, MA.

[6] Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (pp. 249-256).

[7] Han, S., Pool, J., Tran, J., & Dally, W. (2015). Learning both weights and connections for efficient neural network. In Advances in Neural Information Processing Systems (Vol. 28, pp. 1135-1143).

[8] Ji, Z., Li, J., & Telgarsky, M. (2021). Early-stopped neural networks are consistent. Advances in Neural Information Processing Systems, 34, 1805-1817.

[9] Keiding, N., & Louis, T. A. (2016). Perils and potentials of self-selected entry to epidemiological studies and surveys. Journal of the Royal Statistical Society: Series A, 179, 319-376.

[10] Kitchin, R. (2015). The opportunities, challenges and risks of big data for official statistics. Statistical Journal of the IAOS, 31, 471-481.

[11] Kim, J. K., Park, S., Chen, Y., & Wu, C. (2021). Combining nonprobability and probability survey samples through mass imputation. Journal of the Royal Statistical Society: Series A (Statistics in Society), 184, 941-963.

[12] Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

[13] Leshno, M., Lin, V. Y., Pinkus, A., & Schocken, S. (1993). Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks, 6, 861-867.

[14] Li, M., Soltanolkotabi, M., & Oymak, S. (2020). Gradient descent with early stopping is provably robust to label noise for overparameterized neural networks. In Advances in Neural Information Processing Systems (pp. 4313-4324).

[15] Little, R. J. A., & Rubin, D. B. (2019). Statistical analysis with missing data (3rd ed.). John Wiley & Sons.

[16] Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10) (pp. 807-814).

[17] Rivers, D. (2007). Sampling for web surveys. American Statistical Association Proceedings, 4, 1320.

[18] Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581-592.

[19] Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41-55.

[20] Schmidt-Hieber, J. (2020). Nonparametric regression using deep neural networks with ReLU activation function. The Annals of Statistics, 48, 1916-1921.

[21] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15, 1929-1958.

[22] Srinivas, S., Subramanya, A., & Venkatesh Babu, R. (2017). Training sparse neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 138-145).

[23] Valliant, R., & Dever, J. A. (2011). Estimating propensity adjustments for volunteer web surveys. Sociological Methods & Research, 40, 105-137.

[24] Sun, Y., Kang, J., Haridas, C., Mayne, N., Potter, A., Yang, C.-F., Christiani, D. C., & Li, Y. (2024). Penalized deep partially linear Cox models with application to CT scans of lung cancer patients. Biometrics, 80, ujad024.

[25] Van der Vaart, A. W. (2000). Asymptotic statistics. Cambridge University Press, Cambridge.

[26] Van der Vaart, A. W., & Wellner, J. A. (1996). Weak convergence and empirical processes: With applications to statistics. Springer, New York.

[27] Wu, C., & Sitter, R. R. (2001). A model-calibration approach to using complete auxiliary information from survey data. Journal of the American Statistical Association, 96, 185-193.

[28] Zhong, Q., Mueller, J., & Wang, J.-L. (2022). Deep learning for the partially linear Cox model. The Annals of Statistics, 50, 1348-1375.
