
arxiv: 2410.11263 · v4 · pith:RJNKPUWLnew · submitted 2024-10-15 · 💰 econ.EM

Closed-form estimation and inference for panels with attrition and refreshment samples

Pith reviewed 2026-05-23 19:10 UTC · model grok-4.3

classification 💰 econ.EM
keywords attrition · panel data · refreshment samples · closed-form estimation · empirical CDF · identification · asymptotic normality

The pith

An alternative identifying assumption permits closed-form, consistent estimation for panels with attrition and refreshment samples through a transformation of the empirical CDF.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses attrition in panel data by leveraging refreshment samples under a new nonparametric identifying assumption distinct from prior work. This assumption supports a direct transformation of the empirical cumulative distribution function to produce estimates without tuning parameters or first-step optimization. The resulting estimator is proven consistent and asymptotically normal. Such a procedure matters for applied work because existing approaches either rely on stronger assumptions about attrition or require complex computation. Simulations show reliable finite-sample behavior and the method is illustrated on income data.

Core claim

Under the proposed alternative identifying assumption, the estimator obtained by a transformation of the empirical cumulative distribution function is consistent and asymptotically normal and requires neither tuning parameters nor optimization in the first step.

What carries the argument

The transformation of the empirical cumulative distribution function, justified by the alternative identifying assumption that restores identification while permitting nontrivial attrition.
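For readers who want the basic object in hand, here is a minimal sketch of an empirical CDF in plain Python. It shows only the input to the procedure; the paper's specific transformation is not reproduced on this page and is not implemented here. The function name `make_ecdf` and the `incomes` values are hypothetical, chosen for illustration.

```python
from bisect import bisect_right


def make_ecdf(sample):
    """Return F_hat, where F_hat(x) is the share of observations <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must be non-empty")

    def ecdf(x):
        # bisect_right returns the number of sorted values that are <= x
        return bisect_right(ordered, x) / n

    return ecdf


# Hypothetical income observations, for illustration only
incomes = [12.0, 30.0, 18.0, 45.0, 30.0]
F_hat = make_ecdf(incomes)
```

Sorting once and using binary search keeps each evaluation at logarithmic cost, which matters when the function is evaluated at every sample point.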

If this is right

  • The estimator is consistent and asymptotically normal under the maintained assumption.
  • Estimation avoids tuning parameters and numerical optimization in the initial step.
  • Finite-sample performance is reliable according to the reported Monte Carlo experiments.
  • The procedure can be applied directly to empirical panel data such as income observations from refreshment-augmented surveys.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The closed-form nature may reduce barriers to routine use of refreshment samples in applied longitudinal analysis.
  • Because the first step is non-iterative, the method could be combined with standard second-step estimators for additional parameters without compounding computational cost.
  • The same CDF transformation idea might be examined for related missing-data problems such as item nonresponse in cross-sections.

Load-bearing premise

The alternative identifying assumption restores full identification while still allowing nontrivial attrition mechanisms.

What would settle it

Apply the estimator to a generated panel dataset in which the alternative identifying assumption is deliberately violated and check whether the estimates fail to converge to the true parameters.
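A test harness for such an experiment might start from a sketch like the one below. It generates a two-period panel with an attrition rule chosen by us, plus a refreshment sample drawn from the same period-2 population. Everything in it is an assumption made for illustration: the data-generating process, the retention probabilities, and the names `simulate_panel` and `mean`. It does not implement the paper's estimator; it only shows that the period-2 outcomes of stayers are distorted while the refreshment sample is not.

```python
import random


def simulate_panel(n, n_refresh, seed=0):
    """Simulate period-2 outcomes for panel stayers and a refreshment sample."""
    rng = random.Random(seed)

    stayers_y2 = []
    for _ in range(n):
        y1 = rng.gauss(0.0, 1.0)
        y2 = 0.5 * y1 + rng.gauss(0.0, 1.0)  # true mean of y2 is 0
        # Hypothetical nonignorable attrition: units with high y2 drop out more often
        p_stay = 0.8 if y2 < 0 else 0.4
        if rng.random() < p_stay:
            stayers_y2.append(y2)

    # Refreshment sample: fresh draws from the period-2 population, no attrition
    refresh_y2 = []
    for _ in range(n_refresh):
        y1 = rng.gauss(0.0, 1.0)
        refresh_y2.append(0.5 * y1 + rng.gauss(0.0, 1.0))

    return stayers_y2, refresh_y2


def mean(values):
    return sum(values) / len(values)


stayers, refresh = simulate_panel(n=20000, n_refresh=20000)
complete_case_mean = mean(stayers)  # biased downward by the attrition rule
refresh_mean = mean(refresh)        # close to the true mean of 0
```

A candidate estimator would then be run on many such draws, once under an attrition rule that satisfies the identifying assumption and once under one that violates it.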

Original abstract

It has long been established that, if a panel dataset suffers from attrition, auxiliary (refreshment) sampling restores full identification under additional assumptions that still allow for nontrivial attrition mechanisms. Such identification results rely on implausible assumptions about the attrition process or lead to theoretically and computationally challenging estimation procedures. We propose an alternative identifying assumption that, despite its nonparametric nature, suggests a simple estimation algorithm based on a transformation of the empirical cumulative distribution function of the data. This estimation procedure requires neither tuning parameters nor optimization in the first step, i.e., it has a closed form. We prove that our estimator is consistent and asymptotically normal and demonstrate its good performance in simulations. We provide an empirical illustration with income data from the Understanding America Study.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an alternative nonparametric identifying assumption for panel data subject to attrition when refreshment samples are available. This assumption restores identification while permitting nontrivial attrition and yields a closed-form estimator obtained by a direct transformation of the empirical CDF; the authors establish consistency and asymptotic normality, report simulation evidence of good finite-sample performance, and illustrate the method with income data from the Understanding America Study.

Significance. If the identifying assumption is maintained and the consistency proof holds, the closed-form estimator constitutes a computationally attractive alternative to existing procedures that require optimization or tuning parameters. The absence of first-step optimization and tuning parameters, together with the explicit asymptotic normality result, would be a practical contribution for applied work in econometrics.

major comments (2)
  1. [§3] §3, Assumption 3: the paper invokes a new nonparametric restriction on the joint distribution of (Y,T,R) that is claimed to be distinct from prior attrition literature; however, the text does not provide a formal proof that this restriction is strictly weaker than the assumptions in the cited refreshment-sample papers while still delivering point identification of the target parameters.
  2. [Theorem 1] Theorem 1 (consistency): the derivation relies on the empirical CDF converging uniformly to the population CDF under the new assumption, but the argument does not explicitly address whether the refreshment-sample size must grow at the same rate as the main panel or whether additional regularity conditions on the support of the outcome are required.
minor comments (2)
  1. [Table 1] Table 1: the simulation design reports bias and RMSE but does not include coverage probabilities for the asymptotic confidence intervals whose validity is claimed in Theorem 2.
  2. [§5] The empirical illustration in §5 would benefit from a brief comparison of point estimates and standard errors obtained under the new assumption versus a standard complete-case analysis.
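The coverage check requested in the first minor comment follows a standard Monte Carlo pattern, sketched here under loud assumptions: the estimator is a plain sample mean with a normal-approximation interval, standing in for the paper's estimator, which is not reproduced on this page. The function name `coverage` and all parameter values are hypothetical.

```python
import math
import random


def coverage(n, reps, true_mean=0.0, z=1.96, seed=1):
    """Share of replications whose nominal 95% interval contains true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        draws = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        estimate = sum(draws) / n
        # Unbiased sample variance, then the standard error of the mean
        variance = sum((d - estimate) ** 2 for d in draws) / (n - 1)
        std_err = math.sqrt(variance / n)
        if estimate - z * std_err <= true_mean <= estimate + z * std_err:
            hits += 1
    return hits / reps


empirical_coverage = coverage(n=200, reps=2000)
```

In a real replication, the sample mean and its standard error would be replaced by the paper's estimator and its asymptotic variance estimate, and the resulting coverage rate would be reported alongside bias and RMSE.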

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the insightful comments, which have helped us improve the clarity of our manuscript. We address each major comment below and will revise the paper accordingly.

Point-by-point responses
  1. Referee: §3, Assumption 3: the paper invokes a new nonparametric restriction on the joint distribution of (Y,T,R) that is claimed to be distinct from prior attrition literature; however, the text does not provide a formal proof that this restriction is strictly weaker than the assumptions in the cited refreshment-sample papers while still delivering point identification of the target parameters.

    Authors: We agree that a more formal comparison would strengthen the paper. Assumption 3 is designed to be a distinct nonparametric condition that enables closed-form estimation while maintaining point identification. In the revised manuscript, we will include an additional proposition that formally relates Assumption 3 to the assumptions in the cited refreshment-sample papers, demonstrating that it is strictly weaker in relevant cases and still yields point identification of the parameters of interest. This will be added to Section 3.
    Revision: yes

  2. Referee: Theorem 1 (consistency): the derivation relies on the empirical CDF converging uniformly to the population CDF under the new assumption, but the argument does not explicitly address whether the refreshment-sample size must grow at the same rate as the main panel or whether additional regularity conditions on the support of the outcome are required.

    Authors: Thank you for pointing this out. The proof of Theorem 1 relies on the Glivenko-Cantelli theorem for uniform convergence, which holds under standard conditions. However, to make the asymptotic framework explicit, we will revise the statement of Theorem 1 to specify that the refreshment sample size grows at the same rate as the main panel (i.e., n_r / n -> c > 0) and add regularity conditions requiring that the support of Y be compact or that Y satisfy appropriate moment conditions. These clarifications will be incorporated into the revised version without altering the main results.
    Revision: yes
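For reference, the classical result invoked in this response and the proposed rate condition can be written as follows. The notation (\hat F_n for the empirical CDF, F for the population CDF, n_r for the refreshment-sample size) is our assumption and may differ from the paper's; the statement covers i.i.d. sampling only and says nothing about how attrition enters.

```latex
% Empirical CDF of an i.i.d. sample Y_1, ..., Y_n with distribution function F
\hat F_n(y) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{Y_i \le y\}

% Glivenko-Cantelli theorem: uniform almost-sure convergence
\sup_{y \in \mathbb{R}} \bigl| \hat F_n(y) - F(y) \bigr|
  \xrightarrow{\text{a.s.}} 0 \quad \text{as } n \to \infty

% Rate condition proposed in the rebuttal
\frac{n_r}{n} \to c > 0
```

The classical theorem holds for any distribution function F under i.i.d. sampling, without compactness or moment conditions. Whether the transformed estimator needs more than that depends on the transformation itself, which this page does not reproduce.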

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained under new assumption

Full rationale

The paper introduces an alternative nonparametric identifying assumption distinct from prior attrition literature. This assumption directly motivates a closed-form estimator obtained by transforming the empirical CDF, with consistency and asymptotic normality proved thereafter. No step reduces the estimator to a fitted quantity defined by the assumption itself, no self-citation chain bears the central load, and the procedure requires no tuning parameters or optimization. The structure is internally consistent without the estimator being equivalent to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the validity of one domain-specific identifying assumption that enables the CDF transformation; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: Alternative nonparametric identifying assumption that restores full identification with refreshment samples while permitting nontrivial attrition.
    Invoked to justify the closed-form estimator and its consistency.


