Closed-form estimation and inference for panels with attrition and refreshment samples

Grigory Franguridi; Lidia Kosenkova

arxiv: 2410.11263 · v4 · pith:RJNKPUWLnew · submitted 2024-10-15 · 💰 econ.EM

Pith reviewed 2026-05-23 19:10 UTC · model grok-4.3

classification 💰 econ.EM

keywords attrition · panel data · refreshment samples · closed-form estimation · empirical CDF · identification · asymptotic normality

The pith

An alternative identifying assumption permits closed-form consistent estimation for panels with attrition via empirical CDF transformation using refreshment samples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses attrition in panel data by leveraging refreshment samples under a new nonparametric identifying assumption distinct from prior work. This assumption supports a direct transformation of the empirical cumulative distribution function to produce estimates without tuning parameters or first-step optimization. The resulting estimator is proven consistent and asymptotically normal. Such a procedure matters for applied work because existing approaches either rely on stronger assumptions about attrition or require complex computation. Simulations show reliable finite-sample behavior and the method is illustrated on income data.

Core claim

Under the proposed alternative identifying assumption, the estimator obtained by a transformation of the empirical cumulative distribution function is consistent and asymptotically normal and requires neither tuning parameters nor optimization in the first step.

What carries the argument

The transformation of the empirical cumulative distribution function, justified by the alternative identifying assumption that restores identification while permitting nontrivial attrition.
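
The basic building block here, the empirical CDF, can be sketched in a few lines. This is a minimal illustration only: the helper name `empirical_cdf` is hypothetical, and the paper's actual transformation of the empirical CDF is not described on this page, so it is not reproduced.

```python
import numpy as np

def empirical_cdf(sample):
    """Return a function y -> share of observations <= y (hypothetical helper)."""
    sorted_sample = np.sort(np.asarray(sample, dtype=float))
    n = sorted_sample.size

    def cdf(y):
        # searchsorted with side="right" counts the observations <= y
        return np.searchsorted(sorted_sample, y, side="right") / n

    return cdf

F_hat = empirical_cdf([3.0, 1.0, 2.0, 2.0])
print(F_hat(2.0))  # 0.75: three of the four observations are <= 2
print(F_hat(0.5))  # 0.0: no observation is <= 0.5
```

No bandwidth, grid, or optimizer appears anywhere in this step, which is the sense in which an estimator built directly on the empirical CDF can avoid tuning parameters.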

If this is right

The estimator is consistent and asymptotically normal under the maintained assumption.
Estimation avoids tuning parameters and numerical optimization in the initial step.
Finite-sample performance is reliable according to the reported Monte Carlo experiments.
The procedure can be applied directly to empirical panel data such as income observations from refreshment-augmented surveys.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The closed-form nature may reduce barriers to routine use of refreshment samples in applied longitudinal analysis.
Because the first step is non-iterative, the method could be combined with standard second-step estimators for additional parameters without compounding computational cost.
The same CDF transformation idea might be examined for related missing-data problems such as item nonresponse in cross-sections.

Load-bearing premise

The alternative identifying assumption restores full identification while still allowing nontrivial attrition mechanisms.

What would settle it

Apply the estimator to a generated panel dataset in which the alternative identifying assumption is deliberately violated and check whether the estimates fail to converge to the true parameters.
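
A check of this kind starts from a data generator. The sketch below assumes a hypothetical two-period design with logistic attrition that depends on the second-period outcome; the function name, parameter values, and attrition rule are all illustrative assumptions, not taken from the paper. It only shows why attrition biases a complete-case summary while a refreshment sample still recovers the second-period marginal; the paper's estimator would have to be plugged in separately.

```python
import numpy as np

def simulate_panel(n=20_000, n_refresh=5_000, seed=0):
    """Hypothetical two-period panel with outcome-dependent attrition."""
    rng = np.random.default_rng(seed)
    # Correlated outcomes; both periods have a N(0, 1) marginal by construction.
    y1 = rng.normal(size=n)
    y2 = 0.7 * y1 + rng.normal(scale=np.sqrt(1 - 0.7**2), size=n)
    # Assumed attrition rule: units with higher y2 are more likely to stay.
    p_stay = 1.0 / (1.0 + np.exp(-(0.5 + y2)))
    stay = rng.random(n) < p_stay
    # Refreshment sample: an independent draw from the period-2 marginal.
    y2_refresh = rng.normal(size=n_refresh)
    return y1, y2, stay, y2_refresh

y1, y2, stay, y2_refresh = simulate_panel()
complete_case_mean = y2[stay].mean()  # uses only units that did not attrit
refresh_mean = y2_refresh.mean()      # unaffected by attrition

print(f"retention rate:          {stay.mean():.3f}")
print(f"complete-case mean of y2: {complete_case_mean:.3f}")
print(f"refreshment mean of y2:   {refresh_mean:.3f}")
```

Because the true period-2 mean is zero in this design, a complete-case mean well above zero makes the attrition bias visible, and swapping in a different attrition rule is the natural way to probe a violated identifying assumption.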

Original abstract

It has long been established that, if a panel dataset suffers from attrition, auxiliary (refreshment) sampling restores full identification under additional assumptions that still allow for nontrivial attrition mechanisms. Such identification results rely on implausible assumptions about the attrition process or lead to theoretically and computationally challenging estimation procedures. We propose an alternative identifying assumption that, despite its nonparametric nature, suggests a simple estimation algorithm based on a transformation of the empirical cumulative distribution function of the data. This estimation procedure requires neither tuning parameters nor optimization in the first step, i.e., it has a closed form. We prove that our estimator is consistent and asymptotically normal and demonstrate its good performance in simulations. We provide an empirical illustration with income data from the Understanding America Study.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

This paper gives a closed-form CDF-based estimator for panel attrition under a new nonparametric assumption that skips tuning and optimization.

The one thing to take away is that this paper gives a closed-form estimator for panel attrition using refreshment samples. They come up with a nonparametric assumption that lets you get the estimator straight from transforming the empirical CDF, no tuning or optimization needed, and they prove consistency and asymptotic normality.

What they do well is keep it simple. Earlier stuff either used assumptions that were hard to swallow or ended up with tricky estimation. This one avoids that and has simulations plus an application to income data from the Understanding America Study.

The potential issue is whether that identifying assumption holds up in practice. It's nonparametric and meant to allow real attrition, but it still needs to be believable for the data at hand. The abstract doesn't show the full proof details, so the regularity conditions might need a close look in the paper.

This is for people doing applied work with longitudinal surveys that have dropout. If the assumption fits their setting, the method could be handy because it's easy to implement. It looks like solid enough thinking to send to referees.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an alternative nonparametric identifying assumption for panel data subject to attrition when refreshment samples are available. This assumption restores identification while permitting nontrivial attrition and yields a closed-form estimator obtained by a direct transformation of the empirical CDF; the authors establish consistency and asymptotic normality, report simulation evidence of good finite-sample performance, and illustrate the method with income data from the Understanding America Study.

Significance. If the identifying assumption is maintained and the consistency proof holds, the closed-form estimator constitutes a computationally attractive alternative to existing procedures that require optimization or tuning parameters. The absence of first-step estimation and tuning parameters, together with the explicit asymptotic normality result, would be a practical contribution for applied work in econometrics.

major comments (2)

[§3] §3, Assumption 3: the paper invokes a new nonparametric restriction on the joint distribution of (Y,T,R) that is claimed to be distinct from prior attrition literature; however, the text does not provide a formal proof that this restriction is strictly weaker than the assumptions in the cited refreshment-sample papers while still delivering point identification of the target parameters.
[Theorem 1] Theorem 1 (consistency): the derivation relies on the empirical CDF converging uniformly to the population CDF under the new assumption, but the argument does not explicitly address whether the refreshment-sample size must grow at the same rate as the main panel or whether additional regularity conditions on the support of the outcome are required.

minor comments (2)

[Table 1] Table 1: the simulation design reports bias and RMSE but does not include coverage probabilities for the asymptotic confidence intervals whose validity is claimed in Theorem 2.
[§5] The empirical illustration in §5 would benefit from a brief comparison of point estimates and standard errors obtained under the new assumption versus a standard complete-case analysis.
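
The coverage check requested in the first minor comment is a standard Monte Carlo exercise. A minimal sketch follows, assuming a plain sample mean with a normal-approximation interval as a stand-in; the function name and design values are hypothetical, and the paper's estimator and variance formula would replace the two lines computing `est` and `se`.

```python
import numpy as np

def ci_coverage(n=200, n_rep=2_000, true_mean=1.0, seed=0):
    """Share of replications whose nominal 95% CI contains the true mean."""
    rng = np.random.default_rng(seed)
    z = 1.959963984540054  # 97.5% quantile of the standard normal
    # One row per Monte Carlo replication; skewed data on purpose.
    draws = rng.exponential(scale=true_mean, size=(n_rep, n))
    est = draws.mean(axis=1)                     # stand-in estimator
    se = draws.std(axis=1, ddof=1) / np.sqrt(n)  # its estimated standard error
    covered = np.abs(est - true_mean) <= z * se
    return covered.mean()

coverage = ci_coverage()
print(f"empirical coverage of the nominal 95% CI: {coverage:.3f}")
```

Reporting this number next to bias and RMSE shows directly whether the asymptotic approximation is adequate at the sample sizes used in the simulations.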

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the insightful comments, which have helped us improve the clarity of our manuscript. We address each major comment below and will revise the paper accordingly.

Referee: §3, Assumption 3: the paper invokes a new nonparametric restriction on the joint distribution of (Y,T,R) that is claimed to be distinct from prior attrition literature; however, the text does not provide a formal proof that this restriction is strictly weaker than the assumptions in the cited refreshment-sample papers while still delivering point identification of the target parameters.

Authors: We agree that a more formal comparison would strengthen the paper. Assumption 3 is designed to be a distinct nonparametric condition that enables closed-form estimation while maintaining point identification. In the revised manuscript, we will include an additional proposition that formally relates Assumption 3 to the assumptions in the cited refreshment-sample papers, demonstrating that it is strictly weaker in relevant cases and still yields point identification of the parameters of interest. This will be added to Section 3. revision: yes

Referee: Theorem 1 (consistency): the derivation relies on the empirical CDF converging uniformly to the population CDF under the new assumption, but the argument does not explicitly address whether the refreshment-sample size must grow at the same rate as the main panel or whether additional regularity conditions on the support of the outcome are required.

Authors: Thank you for pointing this out. The proof of Theorem 1 relies on the Glivenko-Cantelli theorem for uniform convergence, which holds under standard conditions. However, to make the asymptotic framework explicit, we will revise the statement of Theorem 1 to specify that the refreshment sample size grows at the same rate as the main panel (i.e., n_r / n -> c > 0) and add regularity conditions on the support of Y being compact or satisfying appropriate moment conditions. These clarifications will be incorporated into the revised version without altering the main results. revision: yes
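
For reference, the uniform convergence and the rate condition mentioned in this response can be written out as follows. The notation ($\hat F_n$, $F$, $Y_i$, $n_r$, $c$) is assumed here for illustration and may differ from the paper's; this is the textbook Glivenko-Cantelli statement for an i.i.d. sample, not a reconstruction of the paper's Theorem 1.

```latex
% Empirical CDF of an i.i.d. sample Y_1, ..., Y_n with CDF F (assumed notation)
\[
  \hat F_n(y) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{Y_i \le y\},
  \qquad
  \sup_{y \in \mathbb{R}} \bigl| \hat F_n(y) - F(y) \bigr|
  \xrightarrow{\text{a.s.}} 0
  \quad \text{as } n \to \infty .
\]
% Rate condition from the response: refreshment sample size n_r
% relative to the main panel size n
\[
  \frac{n_r}{n} \to c > 0 .
\]
```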

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained under new assumption

The paper introduces an alternative nonparametric identifying assumption distinct from prior attrition literature. This assumption directly motivates a closed-form estimator obtained by transforming the empirical CDF, with consistency and asymptotic normality proved thereafter. No step reduces the estimator to a fitted quantity defined by the assumption itself, no self-citation chain bears the central load, and the procedure requires no tuning parameters or optimization. The structure is internally consistent without the estimator being equivalent to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the validity of one domain-specific identifying assumption that enables the CDF transformation; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption: Alternative nonparametric identifying assumption that restores full identification with refreshment samples while permitting nontrivial attrition.
Invoked to justify the closed-form estimator and its consistency.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages

[1] Bhattacharya, D. (2008). Inference in panel data models under attrition caused by unobservables. Journal of Econometrics, 144(2):430–446.

[2] Callaway, B. and Li, T. (2019). Quantile treatment effects in difference in differences models with panel data. Quantitative Economics, 10(4):1579–1618.

[3] Davidson, R. and MacKinnon, J. G. (2000). Improving the reliability of bootstrap tests. Technical report, Queen's Economics Department Working Paper.

[4] Deng, Y., Hillygus, D. S., Reiter, J. P., Si, Y., and Zheng, S. (2013). Handling attrition in longitudinal studies: The case for refreshment samples. Statistical Science, 28(2):238–256.

[5] d’Haultfoeuille, X. (2010). A new instrumental method for dealing with endogenous selection. Journal of Econometrics, 154(1):1–15.

[6] Franguridi, G., Hahn, J., Hoonhout, P., Kapteyn, A., and Ridder, G. (2024a). Raking for panels with nonignorable attrition and refreshment. Working paper.

[7] Franguridi, G., Hahn, J., and Ridder, G. (2024b). Robust estimation and inference for panels with nonignorable attrition and refreshment. Working paper.

[8] Giacomini, R., Politis, D. N., and White, H. (2013). A warp-speed method for conducting Monte Carlo experiments involving bootstrap estimators. Econometric Theory, 29(3):567–589.

[9] Hellerstein, J. K. and Imbens, G. W. (1999). Imposing moment restrictions from auxiliary data by weighting. Review of Economics and Statistics, 81(1):1–14.

[10] Hirano, K., Imbens, G. W., Ridder, G., and Rubin, D. B. (2001). Combining panel data sets with attrition and refreshment samples. Econometrica, 69(6):1645–1659.

[11] Hoonhout, P. and Ridder, G. (2019). Nonignorable attrition in multi-period panels with refreshment samples. Journal of Business & Economic Statistics, 37(3):377–390.

[12] Lang, S. (2012). Fundamentals of Differential Geometry, volume 191. Springer Science & Business Media.

[13] Nevo, A. (2003). Using weights to adjust for sample selection when auxiliary information is available. Journal of Business & Economic Statistics, 21(1):43–52.

[14] Newey, W. K. and McFadden, D. (1994). Large sample estimation and hypothesis testing. Handbook of Econometrics, 4:2111–2245.

[15] Sadinle, M. and Reiter, J. P. (2019). Sequentially additive nonignorable missing data modelling using auxiliary marginal information. Biometrika, 106(4):889–911.

[16] Si, Y., Reiter, J. P., and Hillygus, D. S. (2015). Semi-parametric selection models for potentially non-ignorable attrition in panel studies with refreshment samples. Political Analysis, 23(1):92–112.

[17] Tauchen, G. (1985). Diagnostic testing and evaluation of maximum likelihood models. Journal of Econometrics, 30(1-2):415–443.

[18] Taylor, L. K., Tong, X., and Maxwell, S. E. (2020). Evaluating supplemental samples in longitudinal research: Replacement and refreshment approaches. Multivariate Behavioral Research, 55(2):277–299.

[19] van der Vaart, A. and Wellner, J. (2023). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer.

[20] Villani, C. (2009). Optimal Transport: Old and New, volume 338. Springer.

[21] Watson, N. and Lynn, P. (2021). Refreshment sampling for longitudinal surveys. Advances in Longitudinal Survey Methodology, pages 1–25.

[22] White, H. (2000). A reality check for data snooping. Econometrica, 68(5):1097–1126.

[23] Young, W. (1917). On multiple integration by parts and the second theorem of the mean. Proceedings of the London Mathematical Society, 2(1):273–293.
