arXiv: 2604.03171 · v2 · submitted 2026-04-03 · econ.EM

Flexible Imputation of Incomplete Network Data

Ge Sun, Weisheng Zhang

Pith reviewed 2026-05-13 18:37 UTC · model grok-4.3

classification econ.EM

keywords: network imputation, incomplete networks, GMM estimation, peer effects, nonparametric imputation, fixed effects regression, sampled network data

The pith

A nonparametric imputation combines covariate projection with local two-way fixed-effects regression to recover missing network links and deliver consistent GMM estimators.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes filling in missing connections from sampled network data by first projecting the observed links onto available covariates and then running a local two-way fixed-effects regression. This procedure operates without parametric forms or low-rank assumptions and still accounts for both observed covariates and unobserved heterogeneity. The authors establish entrywise convergence rates for the completed matrix and prove that generalized method of moments estimators remain consistent when applied to the imputed network. They also obtain explicit convergence rates for the estimator in the linear-in-means peer-effects model. Simulations and an empirical illustration confirm that the approach produces accurate imputations and reliable downstream estimates.

Core claim

By projecting sampled network observations onto covariates and then applying a local two-way fixed-effects regression, the method nonparametrically recovers the missing links, achieves entrywise convergence of the imputed matrix, and ensures consistency of GMM estimators constructed from the completed data without requiring parametric assumptions or low-rank restrictions.

What carries the argument

The imputation step that projects the observed network onto covariates and follows with a local two-way fixed-effects regression to recover unobserved entries while absorbing heterogeneity.
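
A minimal sketch of the second step, the local two-way fixed-effects fit, may make the mechanics concrete. It rests on assumptions that are not taken from this summary: links are observed whenever at least one endpoint is sampled, closeness is measured on a single observed scalar covariate `x`, and the kernel is uniform with bandwidth `h`. The covariate-projection step is omitted, and the function name `local_twfe_impute` and the toy data are hypothetical. With product kernel weights, the weighted least-squares fit of `a_i + b_j` reduces to "local row mean + local column mean - local block mean".

```python
import numpy as np

def local_twfe_impute(A, x, sampled, i, j, h):
    """Impute entry (i, j) from a kernel-weighted additive (two-way) fit.

    Assumes (hypothetically) that A[i, S] and A[S, j] are observed for the
    sampled index set S, and that nodes with similar x behave similarly.
    """
    S = np.asarray(sampled)
    # Uniform kernel: sampled nodes whose covariate lies within h of x[i] / x[j].
    k_i = (np.abs(x[S] - x[i]) <= h).astype(float)
    k_j = (np.abs(x[S] - x[j]) <= h).astype(float)
    if k_i.sum() == 0 or k_j.sum() == 0:
        raise ValueError("bandwidth h leaves no sampled neighbors")
    # Local mean of row i over sampled columns that resemble j.
    row_mean = A[i, S] @ k_j / k_j.sum()
    # Local mean of column j over sampled rows that resemble i.
    col_mean = k_i @ A[S, j] / k_i.sum()
    # Local mean of the sampled block, weighted in both directions.
    block_mean = k_i @ A[np.ix_(S, S)] @ k_j / (k_i.sum() * k_j.sum())
    return row_mean + col_mean - block_mean

# Toy check on a noise-free additive "link intensity" matrix (not binary links).
rng = np.random.default_rng(0)
n = 60
x = rng.random(n)
a = rng.normal(size=n)
b = rng.normal(size=n)
A = a[:, None] + b[None, :]
sampled = np.arange(40)            # nodes 40..59 are treated as unsampled
est = local_twfe_impute(A, x, sampled, i=50, j=55, h=0.5)
print(est, a[50] + b[55])          # the two numbers agree up to rounding
```

In this noise-free additive design the fit is exact for any bandwidth that leaves neighbors, which is why the toy check says nothing about the bias-variance behavior that the paper's rates describe.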

If this is right

The imputed matrix converges entrywise at a rate established by the paper.
GMM estimators that use the imputed network remain consistent.
The estimator in the linear-in-means peer-effects model attains the derived convergence rate.
Simulations show accurate imputation and reliable performance in downstream analysis.
An application to the microfinance network data of Banerjee et al. (2013) illustrates the method on real sampled networks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the same imputation logic extends to directed or weighted networks, it could broaden the set of empirical studies that can use sampled data without bias.
The approach might be adapted to panel or dynamic network settings where missingness occurs over time.
Combining the imputed networks with other semiparametric estimators could further relax assumptions in peer-effects research.

Load-bearing premise

The sampling mechanism and covariate structure must permit the projection-plus-local-fixed-effects combination to recover missing links without creating asymptotic bias.

What would settle it

Finding a data-generating process or simulation design where the imputed matrix produces GMM estimates that systematically differ from the estimates obtained with the true complete network would falsify the consistency result.
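
One way such a check could be set up is sketched below, under stated assumptions. The linear-in-means model is written as `y = alpha + beta * G y + gamma * x + eps` with `G` the row-normalized adjacency matrix, and it is estimated by two-stage least squares using `G x` and `G^2 x` as instruments, which is one standard GMM estimator for this model and not necessarily the moment function used in the paper. The helper names, the Erdős–Rényi design, and the "thinned" network standing in for a poor imputation are all hypothetical.

```python
import numpy as np

def row_normalize(A):
    # Divide each row by its degree; isolated nodes keep an all-zero row.
    deg = A.sum(axis=1, keepdims=True)
    return A / np.maximum(deg, 1.0)

def simulate_outcomes(A, x, theta, noise_sd, rng):
    # Solve the reduced form y = (I - beta G)^{-1} (alpha + gamma x + eps).
    alpha, beta, gamma = theta
    G = row_normalize(A)
    eps = noise_sd * rng.normal(size=len(x))
    return np.linalg.solve(np.eye(len(x)) - beta * G, alpha + gamma * x + eps)

def lim_2sls(A, y, x):
    # 2SLS for (alpha, beta, gamma) with instruments [1, x, Gx, G^2 x].
    G = row_normalize(A)
    ones = np.ones(len(x))
    Z = np.column_stack([ones, G @ y, x])               # regressors
    H = np.column_stack([ones, x, G @ x, G @ (G @ x)])  # instruments
    Z_hat = H @ np.linalg.lstsq(H, Z, rcond=None)[0]    # first-stage fit
    return np.linalg.solve(Z_hat.T @ Z, Z_hat.T @ y)

rng = np.random.default_rng(1)
n = 300
upper = np.triu(rng.random((n, n)) < 0.05, k=1).astype(float)
A_true = upper + upper.T                                # undirected random graph
x = rng.normal(size=n)
theta0 = np.array([1.0, 0.4, 2.0])
y = simulate_outcomes(A_true, x, theta0, noise_sd=1.0, rng=rng)

# Stand-in for a badly completed network: randomly drop about half the true links.
keep = np.triu(rng.random((n, n)) < 0.5, k=1)
A_thinned = A_true * (keep + keep.T)

print("true network:   ", lim_2sls(A_true, y, x))
print("thinned network:", lim_2sls(A_thinned, y, x))
```

Replacing `A_thinned` with the output of an imputation routine, and repeating over many draws and sample sizes, would give the comparison described above: a gap between the two sets of estimates that does not shrink as the sample grows would count against the consistency claim.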

Figures

Figures reproduced from arXiv: 2604.03171 by Ge Sun, Weisheng Zhang.

Original abstract

Sampled network data are widely used in empirical research because collecting complete network information is costly. However, empirical analyses based on sampled networks may lead to biased estimators. We propose a nonparametric imputation method for sampled networks and show that empirical analyses based on imputed networks yield consistent estimates. Our approach imputes missing network links by combining a projection onto covariates with a local two-way fixed-effects regression. The method avoids parametric assumptions, does not rely on low-rank restrictions, and flexibly accommodates both observed covariates and unobserved heterogeneity. We establish entrywise convergence rates for the imputed matrix and prove the consistency of generalized method of moments (GMM) estimators based on imputed networks. We further derive the convergence rate of the corresponding estimator in the linear-in-means peer-effects model. Simulations show strong performance of our method both in terms of imputation accuracy and in downstream empirical analysis. We illustrate our method with an application to the microfinance network data of Banerjee et al. (2013).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper offers a nonparametric imputation method for incomplete networks via covariate projection plus local two-way fixed effects, claiming entrywise rates and GMM consistency, but the step from per-entry error to aggregate moments needs closer scrutiny.

The main takeaway is that the authors propose imputing missing links in sampled networks by first projecting onto observed covariates and then fitting a local two-way fixed-effects regression. This avoids both parametric link functions and global low-rank assumptions, which is useful for economic networks where heterogeneity matters. They report entrywise convergence rates for the imputed matrix and consistency for GMM estimators that treat the completed network as observed, plus a rate result for the linear-in-means peer-effects model. Simulations indicate reasonable imputation accuracy and improved downstream performance, and the Banerjee et al. microfinance application shows the procedure can be run on real data without obvious breakdowns.

Referee Report

2 major / 2 minor

Summary. The paper proposes a nonparametric imputation method for sampled/incomplete network data that combines projection onto observed covariates with a local two-way fixed-effects regression. It claims to establish entrywise convergence rates for the imputed adjacency matrix, prove consistency of GMM estimators that use the imputed networks, and derive the convergence rate for the linear-in-means peer-effects estimator. The approach avoids parametric assumptions and low-rank restrictions; performance is illustrated via simulations and an application to the Banerjee et al. (2013) microfinance network.

Significance. If the theoretical claims hold, the method supplies a flexible, assumption-light tool for correcting bias in empirical network analyses that rely on sampled data. This is relevant for peer-effects, diffusion, and other network models in economics where complete network observation is costly. The combination of covariate projection and local FE is a practical innovation, though its asymptotic properties require careful verification.

major comments (2)

[Theoretical results (consistency proofs)] The abstract asserts entrywise convergence rates for the imputed matrix and consistency of GMM estimators based on imputed networks, but entrywise rates alone do not automatically deliver the uniform control needed for network aggregates (sums over neighbors, quadratic forms in the adjacency matrix) that enter typical GMM moment conditions. Additional arguments establishing o_p(1) convergence of these aggregates under the local two-way FE imputation are required.
[Assumptions and identification] The weakest assumption—that the sampling process and covariate structure permit recovery of missing links without asymptotic bias via the local two-way FE step—needs explicit conditions on bandwidth shrinkage, the correlation between missingness and unobserved heterogeneity outside covariate neighborhoods, and the locality of the FE regression. Without these, non-vanishing bias can remain in the imputed matrix and propagate into the GMM objective.
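
The gap raised in the first major comment can be seen from a crude bound. The notation is an assumption made here for illustration: $P$ denotes whatever matrix the imputation targets (this summary does not say whether that is the realized links or their conditional probabilities), $r_N$ is the worst entrywise error, and $y$ is any bounded outcome vector.

```latex
\[
r_N := \max_{i,j} \bigl|\hat A_{ij} - P_{ij}\bigr| ,
\qquad
\Bigl|\sum_{j=1}^{N} (\hat A_{ij} - P_{ij})\, y_j\Bigr|
\;\le\; N \, r_N \, \max_{j} |y_j| .
\]
\[
\Bigl|\frac{1}{N}\sum_{j=1}^{N} (\hat A_{ij} - P_{ij})\, y_j\Bigr|
\;\le\; r_N \, \max_{j} |y_j| ,
\qquad
\bigl| y^{\top} (\hat A - P)\, y \bigr|
\;\le\; N^{2} \, r_N \, \max_{j} |y_j|^{2} .
\]
```

Under these bounds alone, $r_N \to 0$ controls averages over neighbors, while unnormalized neighbor sums need $N r_N \to 0$ and unnormalized quadratic forms need $N^{2} r_N \to 0$. Sharper arguments that exploit cancellation across entries could relax this, which is presumably what the requested additional lemma would supply.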

minor comments (2)

[Abstract] The abstract and introduction should state the precise convergence rates (e.g., the order in n and the number of observed links) rather than referring only to “entrywise convergence rates.”
[Simulations] Simulation designs should report the exact missingness mechanism and the dimension of the covariate space to allow readers to assess how well they match the maintained assumptions.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address the two major points below and will revise the manuscript to incorporate additional theoretical arguments and explicit assumptions as suggested.

Referee: [Theoretical results (consistency proofs)] The abstract asserts entrywise convergence rates for the imputed matrix and consistency of GMM estimators based on imputed networks, but entrywise rates alone do not automatically deliver the uniform control needed for network aggregates (sums over neighbors, quadratic forms in the adjacency matrix) that enter typical GMM moment conditions. Additional arguments establishing o_p(1) convergence of these aggregates under the local two-way FE imputation are required.

Authors: We agree that entrywise rates require supplementary arguments to control network aggregates in GMM moments. In the revised version we will add a dedicated lemma establishing o_p(1) convergence of neighbor sums and quadratic forms in the imputed adjacency matrix. The proof will combine the entrywise rate with network-specific concentration bounds that exploit the local two-way fixed-effects structure and the assumed sparsity of the network.

Revision: yes

Referee: [Assumptions and identification] The weakest assumption—that the sampling process and covariate structure permit recovery of missing links without asymptotic bias via the local two-way FE step—needs explicit conditions on bandwidth shrinkage, the correlation between missingness and unobserved heterogeneity outside covariate neighborhoods, and the locality of the FE regression. Without these, non-vanishing bias can remain in the imputed matrix and propagate into the GMM objective.

Authors: We will strengthen the assumption section by adding explicit conditions: (i) bandwidth shrinkage rates that balance bias and variance in the local regression, (ii) conditional independence of missingness from unobserved heterogeneity given covariates within local neighborhoods, and (iii) a precise definition of locality for the fixed-effects step. These additions will ensure that the imputation bias vanishes asymptotically, so that no non-vanishing bias propagates into the GMM objective.

Revision: yes
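
For orientation, the bias-variance balance mentioned in condition (i) takes the following textbook form for a kernel regression of a twice-differentiable function with a second-order kernel in $d$ dimensions and $n$ observations. This is a generic illustration, not the paper's rate: the constants $C_1, C_2$ are placeholders, and the local two-way fixed-effects step may have a different effective sample size and dimension.

```latex
\[
\mathrm{MSE}(h) \;\approx\; C_1 h^{4} + \frac{C_2}{n h^{d}},
\qquad
\frac{\partial\,\mathrm{MSE}}{\partial h} = 4 C_1 h^{3} - \frac{d\, C_2}{n h^{d+1}} = 0
\;\Longrightarrow\;
h^{*} \propto n^{-1/(d+4)},
\qquad
\mathrm{MSE}(h^{*}) \propto n^{-4/(d+4)} .
\]
```

A bandwidth shrinking faster than $h^{*}$ trades extra variance for less bias, which is the usual route to keeping first-stage bias out of a downstream estimator.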

Circularity Check

0 steps flagged

No circularity: convergence rates and GMM consistency derived from nonparametric assumptions

The paper's core claims rest on establishing entrywise convergence rates for the imputed matrix via a nonparametric combination of covariate projection and local two-way fixed-effects regression, followed by standard GMM consistency arguments under the stated sampling and covariate assumptions. No step reduces by construction to a fitted input renamed as prediction, a self-definitional equivalence, or a load-bearing self-citation chain; the derivations are self-contained asymptotic results that do not invoke prior author work to force uniqueness or smuggle ansatzes. External benchmarks (simulations and the Banerjee et al. application) are used only for illustration, not to close the theoretical loop.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on standard nonparametric regression and matrix estimation assumptions rather than new free parameters or invented entities.

axioms (1)

[standard math] Standard regularity conditions for nonparametric regression and entrywise matrix convergence rates hold.
Required to establish the stated convergence rates for the imputed matrix.
