Deep Neural Networks for Doubly Robust Estimation with Nonprobability Survey Samples
Pith reviewed 2026-06-29 09:20 UTC · model grok-4.3
The pith
Deep neural networks model the sampling score nonparametrically to yield consistent doubly robust estimators for the finite population mean from probability and nonprobability samples.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper treats the logit of the sampling score as an unknown nonparametric function, approximates it with a deep neural network (DNN), and fits the network by maximizing a pseudo-likelihood on the combined samples. The resulting DNN-assisted inverse-probability weighted (IPW) estimator and deep doubly robust estimator are consistent for the finite population mean, with convergence rates established under regularity conditions. Simulation studies and an application to Pew Research Center and Behavioral Risk Factor Surveillance System (BRFSS) data indicate improved robustness to parametric propensity-score misspecification when the selection mechanism is nonlinear.
What carries the argument
DNN-estimated sampling scores obtained by maximizing a pseudo-likelihood that merges nonprobability and probability samples, then incorporated into the inverse-probability weighted estimator and the doubly robust estimator.
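For orientation, the pseudo-likelihood can be written out under the assumption that it takes the standard form of Chen, Li and Wu (2020) for combining a nonprobability sample with a reference probability sample. The notation here is assumed for illustration and may differ from the paper's: S_A is the nonprobability sample, S_B the probability sample with design weights d_i^B, and f_theta the DNN output for the logit sampling score.

```latex
% Assumed notation: \pi_i = \pi(x_i) is the sampling score for unit i,
% with \operatorname{logit}\pi_i = f_\theta(x_i).
\ell(\theta)
  = \sum_{i \in S_A} \log\frac{\pi_i}{1-\pi_i}
  + \sum_{i \in S_B} d_i^{B} \log\bigl(1-\pi_i\bigr)
  = \sum_{i \in S_A} f_\theta(x_i)
  - \sum_{i \in S_B} d_i^{B} \log\bigl(1 + e^{f_\theta(x_i)}\bigr).
```

The second equality uses log{pi/(1-pi)} = f and log(1-pi) = -log(1+e^f). The weighted sum over S_B stands in for the unobserved population total of log(1-pi_i), which is why the reference probability sample is needed.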
If this is right
- The DNN-assisted inverse-probability weighted estimator is consistent for the finite population mean.
- The deep doubly robust estimator remains consistent if either the sampling-score model or the outcome model is correctly specified, even when the other is misspecified.
- Convergence rates for both estimators follow from the DNN approximation properties and regularity conditions.
- Finite-sample performance improves over standard parametric propensity-score methods when the true selection mechanism is nonlinear.
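The two estimators named in the list above can be sketched as follows, assuming the Hajek-type forms commonly used in the nonprobability-sample literature (for example Chen, Li and Wu, 2020). The function names and arguments are hypothetical, and the paper's exact normalization may differ. Here `pi_A` holds estimated sampling scores on the nonprobability sample, `m_A` and `m_B` hold outcome-model predictions on the two samples, and `d_B` holds the design weights of the probability sample.

```python
import numpy as np

def ipw_mean(y_A, pi_A):
    """Hajek-type IPW estimate of the population mean (illustrative form)."""
    w = 1.0 / np.asarray(pi_A, dtype=float)   # inverse estimated sampling scores
    return np.sum(w * np.asarray(y_A, dtype=float)) / np.sum(w)

def dr_mean(y_A, pi_A, m_A, m_B, d_B):
    """Doubly robust estimate: weighted residuals on S_A plus predictions on S_B."""
    w = 1.0 / np.asarray(pi_A, dtype=float)
    resid = np.asarray(y_A, dtype=float) - np.asarray(m_A, dtype=float)
    bias_correction = np.sum(w * resid) / np.sum(w)          # uses the nonprobability sample
    d_B = np.asarray(d_B, dtype=float)
    prediction = np.sum(d_B * np.asarray(m_B, dtype=float)) / np.sum(d_B)  # uses the probability sample
    return bias_correction + prediction

# Toy inputs, chosen only to make the arithmetic easy to follow.
y_A = np.array([1.0, 2.0, 3.0])
pi_A = np.array([0.5, 0.25, 0.25])
ipw = ipw_mean(y_A, pi_A)
dr_zero_model = dr_mean(y_A, pi_A, np.zeros(3), np.zeros(2), np.array([10.0, 30.0]))
dr_perfect_model = dr_mean(y_A, pi_A, y_A, np.array([1.0, 3.0]), np.array([10.0, 30.0]))
```

With a zero outcome model the doubly robust estimate collapses to the IPW estimate, and with an outcome model that fits the nonprobability sample exactly it collapses to the design-weighted mean of the predictions, which illustrates how each component can carry the estimator on its own.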
Where Pith is reading between the lines
- If the nonparametric approximation holds, the same DNN pseudo-likelihood step could be applied to estimate other finite-population quantities such as totals or quantiles.
- The framework suggests a route for importing flexible nonparametric propensity modeling from causal inference into classical survey sampling problems.
- Performance under high-dimensional covariates or alternative DNN architectures remains an open empirical question that could be tested directly with the proposed estimators.
Load-bearing premise
The logit sampling score for the nonprobability sample can be modeled as an unknown nonparametric function that a DNN can approximate sufficiently well via pseudo-likelihood maximization, and the regularity conditions for consistency and rates hold in the survey setting.
What would settle it
A simulation or real dataset in which the selection mechanism is nonlinear yet the DNN-based estimators show persistent bias or no gain in mean squared error relative to a correctly specified parametric model, or fail to achieve the derived convergence rates.
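A check of this kind usually comes down to comparing bias and mean squared error across Monte Carlo replicates. The helper below is a minimal sketch of that bookkeeping; the function name and returned keys are hypothetical and not taken from the paper.

```python
import numpy as np

def mc_summary(estimates, truth):
    """Summarize Monte Carlo replicates of an estimator against a known target."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - truth
    return {
        "rel_bias_pct": 100.0 * bias / truth,      # relative bias in percent
        "variance": est.var(),                     # Monte Carlo variance (ddof=0)
        "mse": float(np.mean((est - truth) ** 2)), # equals bias**2 + variance
    }

# Toy replicates, used only to show the call.
summary = mc_summary([1.0, 3.0], truth=2.0)
```

Running the same summary for a DNN-based estimator and for a correctly specified parametric competitor, on the same replicates, would give the side-by-side comparison described above.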
Original abstract
Integrating probability and nonprobability survey samples is an important problem in modern survey sampling. Nonprobability samples often contain rich outcome information but may lack population representativeness, whereas probability samples provide design-based auxiliary information but may not contain the study variable. We propose a deep neural network (DNN)-assisted doubly robust framework for estimating the finite population mean from these two data sources. The proposed method models the logit sampling score for the nonprobability sample as an unknown nonparametric function and estimates it by maximizing a pseudo-likelihood that combines information from the nonprobability sample and a reference probability sample. The DNN parameters are optimized using the ADAM algorithm. The resulting DNN-estimated sampling scores are incorporated into a DNN-assisted inverse-probability weighted estimator and a deep doubly robust estimator. We establish consistency and convergence rates under regularity conditions and evaluate the finite-sample performance of the proposed estimators through simulation studies and an empirical application using Pew Research Center and Behavioral Risk Factor Surveillance System data. The results suggest that the proposed estimators can improve robustness to parametric propensity-score misspecification, especially when the true selection mechanism is nonlinear.
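To make the estimation step concrete, here is a minimal NumPy sketch of fitting a logit sampling score by maximizing a pseudo-likelihood with Adam updates. It is an illustration under stated assumptions, not the authors' implementation: the pseudo-likelihood is assumed to follow the Chen, Li and Wu (2020) form, the network is a single hidden ReLU layer, training is full-batch, and every name (`fit_sampling_score`, `sampling_score`, the hyperparameter defaults) is hypothetical.

```python
import numpy as np

def forward(params, X):
    """One-hidden-layer ReLU network returning the logit score f(x)."""
    pre = X @ params["W1"] + params["b1"]
    h = np.maximum(pre, 0.0)
    f = h @ params["w2"] + params["b2"][0]
    return f, h, pre

def sigmoid(f):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * f))

def fit_sampling_score(X_A, X_B, d_B, hidden=8, steps=500, lr=0.01, seed=0):
    """Minimize the negative pseudo-log-likelihood with full-batch Adam."""
    rng = np.random.default_rng(seed)
    p = X_A.shape[1]
    params = {
        "W1": rng.normal(0.0, 0.1, size=(p, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.1, size=hidden),
        "b2": np.zeros(1),
    }
    X = np.vstack([X_A, X_B])
    a = np.concatenate([np.ones(len(X_A)), np.zeros(len(X_B))])  # 1 on S_A, 0 on S_B
    w = np.concatenate([np.zeros(len(X_A)), d_B])                # design weights on S_B
    N_hat = d_B.sum()                                            # estimated population size
    m1 = {k: np.zeros_like(val) for k, val in params.items()}    # Adam first moments
    m2 = {k: np.zeros_like(val) for k, val in params.items()}    # Adam second moments
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    history = []
    for t in range(1, steps + 1):
        f, h, pre = forward(params, X)
        # Negative pseudo-log-likelihood: -(sum_A f - sum_B d * log(1 + e^f)) / N_hat
        loss = -(np.sum(a * f) - np.sum(w * np.logaddexp(0.0, f))) / N_hat
        history.append(float(loss))
        # Backpropagation through the two layers.
        df = (-a + w * sigmoid(f)) / N_hat
        dh = np.outer(df, params["w2"]) * (pre > 0)
        grads = {
            "W1": X.T @ dh,
            "b1": dh.sum(axis=0),
            "w2": h.T @ df,
            "b2": np.array([df.sum()]),
        }
        # Adam update with bias-corrected moment estimates.
        for k in params:
            m1[k] = beta1 * m1[k] + (1 - beta1) * grads[k]
            m2[k] = beta2 * m2[k] + (1 - beta2) * grads[k] ** 2
            m_hat = m1[k] / (1 - beta1 ** t)
            v_hat = m2[k] / (1 - beta2 ** t)
            params[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, history

def sampling_score(params, X):
    """Estimated sampling score pi(x) = expit(f(x))."""
    return sigmoid(forward(params, X)[0])

# Synthetic inputs, used only to show the call pattern.
rng = np.random.default_rng(1)
X_A = rng.normal(0.5, 1.0, size=(200, 2))   # covariates in the nonprobability sample
X_B = rng.normal(0.0, 1.0, size=(100, 2))   # covariates in the probability sample
d_B = np.full(100, 20.0)                    # design weights, so N_hat = 2000
params, history = fit_sampling_score(X_A, X_B, d_B)
pi_A = sampling_score(params, X_A)
```

The estimated scores `pi_A` are what would then be plugged into the IPW and doubly robust estimators. A deeper network, mini-batching, or early stopping would change the training loop but not the objective.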
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a deep neural network (DNN) approach for estimating the logit sampling score of nonprobability samples by maximizing a pseudo-likelihood that combines data from the nonprobability sample and a reference probability sample. The estimated scores are used to construct a DNN-assisted inverse-probability weighted estimator and a deep doubly robust estimator for the finite population mean. Consistency and convergence rates are established under regularity conditions. The method is assessed through simulation studies and an empirical application using data from the Pew Research Center and the Behavioral Risk Factor Surveillance System (BRFSS). The results indicate improved robustness to misspecification of the selection mechanism when it is nonlinear.
Significance. Assuming the theoretical results are correctly derived, this manuscript makes a significant contribution to survey sampling methodology by providing a flexible, nonparametric tool for propensity score estimation that can better capture complex, nonlinear relationships compared to traditional parametric models. The doubly robust framework adds protection against misspecification, and the simulation and real-data analyses demonstrate practical advantages. The use of modern optimization techniques like ADAM for DNN training aligns the method with current machine learning practices in statistics.
minor comments (3)
- The abstract refers to 'regularity conditions' without specifying them; a cross-reference to the section detailing these conditions would improve clarity.
- The empirical application section would benefit from more details on how the probability sample is used as reference and any preprocessing steps applied to the data.
- Consider adding results for different DNN depths or widths to assess sensitivity to architecture choices in the simulation studies.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our manuscript on DNN-assisted doubly robust estimation for finite population means from combined probability and nonprobability samples. The recommendation for minor revision is noted with appreciation. No specific major comments appear in the report, so we provide no point-by-point responses below.
Circularity Check
No significant circularity detected
The paper introduces a DNN to nonparametrically model the logit sampling score via pseudo-likelihood maximization on combined nonprobability and probability samples, then plugs the resulting scores into IPW and doubly robust estimators while proving consistency and rates under regularity conditions. No quoted step reduces a claimed prediction or uniqueness result to a fitted parameter by construction, no load-bearing self-citation chain appears, and the derivation chain is self-contained against external statistical benchmarks for DNN approximation and survey sampling asymptotics.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: regularity conditions hold that enable consistency and convergence rates of the DNN-assisted estimators