Linear Regression for Panel With Unknown Number of Factors as Interactive Fixed Effects
Pith reviewed 2026-05-09 18:34 UTC · model grok-4.3
The pith
In panel regressions with interactive fixed effects, the limiting distribution of the least squares estimator stays the same as long as the number of factors included meets or exceeds the true number.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Assuming the number of factors used in estimation is larger than the true number, the limiting distribution of the LS estimator for the regression coefficients is independent of the number of factors used in the estimation, as the number of time periods and the number of cross-sectional units jointly go to infinity.
What carries the argument
The least squares estimator in a linear panel regression with interactive fixed effects, analyzed under over-specification of the factor dimension.
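For orientation, the setup can be written out as below. This is a sketch in notation common to the interactive-fixed-effects literature, not a transcription of the paper's own equations. The symbols $\beta^0$, $\lambda^0_{ik}$, $f^0_{tk}$, $e_{it}$ and the subscript on $\hat\beta_{\hat r}$ are assumptions introduced here; $r$ is the true number of factors and $\hat r$ the number used in estimation, as in the referee report further down.

```latex
\begin{align}
  % Model: regressors plus r true factors entering as interactive fixed effects
  Y_{it} &= X_{it}'\beta^0 + \sum_{k=1}^{r} \lambda^0_{ik} f^0_{tk} + e_{it},
    \qquad i = 1,\dots,N,\quad t = 1,\dots,T, \\
  % LS estimator computed with \hat r factors
  \hat\beta_{\hat r} &= \arg\min_{\beta}\;
    \min_{\Lambda \in \mathbb{R}^{N \times \hat r},\; F \in \mathbb{R}^{T \times \hat r}}
    \sum_{i=1}^{N} \sum_{t=1}^{T}
    \bigl( Y_{it} - X_{it}'\beta - \lambda_i' f_t \bigr)^2, \\
  % Core claim, stated informally
  \sqrt{NT}\,\bigl(\hat\beta_{\hat r} - \beta^0\bigr)
    &\xrightarrow{\;d\;} \text{the same limit for every fixed } \hat r \ge r,
    \qquad N, T \to \infty .
\end{align}
```

For a given $\beta$, the inner minimization over $\Lambda$ and $F$ is a best rank-$\hat r$ approximation problem. Its minimum equals the sum of the squared singular values, beyond the $\hat r$ largest, of the $N \times T$ matrix with entries $Y_{it} - X_{it}'\beta$.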
If this is right
- Inference on the slope parameters remains valid without consistent estimation of the number of factors.
- A researcher can safely fix a conservatively large number of factors and still obtain correct asymptotic standard errors.
- The usual bias-correction or bias-robust procedures for interactive fixed effects continue to apply when the factor count is over-specified.
- Consistent model-selection criteria for the number of factors are not strictly required for coefficient inference, provided the number chosen is not too small.
Where Pith is reading between the lines
- The same invariance may hold for other estimators such as IV or quantile regression in similar factor-augmented panels.
- Over-specifying the factors could serve as a default robust strategy when the true dimension is uncertain.
- Monte Carlo experiments with varying over-specification levels would directly test the practical relevance of the invariance result.
- The finding suggests that factor-augmented panel methods are more forgiving of upward misspecification than downward misspecification.
Load-bearing premise
The number of factors included in estimation is at least as large as the true number present in the data.
What would settle it
Simulate panel data from a model with a known fixed number of factors, estimate the regression using several larger numbers of factors, and check whether the finite-sample distribution of the coefficient estimator converges to the same limiting normal law in each case as both dimensions grow.
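The experiment described above can be sketched as a small Monte Carlo. This is a minimal illustration under stated assumptions, not the paper's procedure. The data-generating process (one true factor, one regressor, i.i.d. normal errors) is made up for the example. The estimator alternates between a truncated SVD and pooled OLS, in the spirit of the iterative schemes used for interactive fixed effects, and it may stop at a local minimum. The function names `ls_interactive_fe`, `simulate_panel` and `monte_carlo` are hypothetical.

```python
import numpy as np


def ls_interactive_fe(Y, X, n_factors, n_iter=500, tol=1e-9):
    """Alternating least squares for Y = beta * X + Lambda F' + e (single regressor)."""
    beta = np.sum(X * Y) / np.sum(X * X)  # pooled OLS as the starting value
    for _ in range(n_iter):
        # Given beta: best rank-n_factors fit to the residual matrix (truncated SVD).
        U, s, Vt = np.linalg.svd(Y - beta * X, full_matrices=False)
        low_rank = (U[:, :n_factors] * s[:n_factors]) @ Vt[:n_factors]
        # Given the fitted factor part: update beta by pooled OLS.
        beta_new = np.sum(X * (Y - low_rank)) / np.sum(X * X)
        if abs(beta_new - beta) < tol:
            return float(beta_new)
        beta = beta_new
    return float(beta)


def simulate_panel(rng, N, T, beta0=1.0):
    """Illustrative DGP (an assumption): one true factor that also drives the regressor."""
    lam = rng.normal(size=(N, 1))
    f = rng.normal(size=(T, 1))
    common = lam @ f.T                        # interactive fixed effect lambda_i * f_t
    X = common + rng.normal(size=(N, T))      # regressor correlated with the factor part
    Y = beta0 * X + common + rng.normal(size=(N, T))
    return Y, X


def monte_carlo(n_reps=50, N=50, T=50, factor_counts=(0, 1, 2, 4), seed=0):
    """Return {number of factors used: (mean, std) of the slope estimates}."""
    rng = np.random.default_rng(seed)
    draws = {R: [] for R in factor_counts}
    for _ in range(n_reps):
        Y, X = simulate_panel(rng, N, T)
        for R in factor_counts:
            draws[R].append(ls_interactive_fe(Y, X, R))
    return {R: (float(np.mean(b)), float(np.std(b))) for R, b in draws.items()}


if __name__ == "__main__":
    for R, (mean, sd) in monte_carlo().items():
        print(f"factors used = {R}: mean = {mean:.3f}, sd = {sd:.3f}")
```

If the invariance result holds in this design, the rows for factor counts of one or more should look alike as N and T grow, while the row for zero factors should show omitted-factor bias. Repeating the run over a grid of (N, T) values and comparing the standardized estimates would give a more direct check of the limiting distribution.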
Original abstract
In this paper we study the least squares (LS) estimator in a linear panel regression model with unknown number of factors appearing as interactive fixed effects. Assuming that the number of factors used in estimation is larger than the true number of factors in the data, we establish the limiting distribution of the LS estimator for the regression coefficients as the number of time periods and the number of cross-sectional units jointly go to infinity. The main result of the paper is that under certain assumptions the limiting distribution of the LS estimator is independent of the number of factors used in the estimation, as long as this number is not underestimated. The important practical implication of this result is that for inference on the regression coefficients one does not necessarily need to estimate the number of interactive fixed effects consistently.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies the least squares estimator in a linear panel regression with interactive fixed effects when the number of factors is unknown but over-specified in estimation. Under joint asymptotics (N,T → ∞) and stated assumptions, it establishes that the limiting distribution of the estimator for the regression coefficients is the same for any fixed r̂ > r (true number of factors) and does not depend on the specific over-specified value of r̂. The practical implication is that consistent estimation of the number of factors is not required for valid inference on the slopes.
Significance. If the central result holds, it is significant for applied panel-data work with interactive fixed effects: researchers can avoid the often-difficult step of consistently selecting the number of factors while still obtaining asymptotically valid inference on β. This relaxes a common practical constraint and builds directly on the Bai (2009) and related interactive fixed-effects literature by addressing over-specification explicitly.
major comments (2)
- [Main theorem / asymptotic expansion] Main theorem (asymptotic distribution result): the claim that extra-factor estimation error is asymptotically negligible for the β estimator requires an explicit bound showing that the cross term between the (r̂ − r) superfluous factor estimates and the regressors (or idiosyncratic errors) is o_p(1/√(NT)). Standard regularity conditions alone do not automatically deliver this when the population factor covariance is rank-deficient in the extra dimensions; the paper should add or verify this step in the expansion.
- [Assumptions / regularity conditions] Assumptions on factor loadings and regressors (over-specification case): the joint convergence rate for the superfluous factors may be slower than the usual min(√N, √T) rate. The manuscript should state any additional conditions (e.g., on the minimal eigenvalue gap or moment bounds) that ensure the extra estimation error does not enter the leading term of the β expansion; without them the independence from r̂ is not guaranteed under the listed regularity conditions.
minor comments (2)
- Notation: consistently distinguish r (true) from r̂ (estimated/used) throughout the text and theorems to avoid reader confusion.
- [Abstract / Introduction] The abstract and introduction could briefly note that the result applies only to the slope coefficients and not necessarily to the factor or loading estimates themselves.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive suggestions. The comments highlight important aspects of the asymptotic expansion and regularity conditions in the over-specification case. We address each point below and will revise the manuscript to strengthen the presentation of the proofs.
Point-by-point responses
- Referee: [Main theorem / asymptotic expansion] Main theorem (asymptotic distribution result): the claim that extra-factor estimation error is asymptotically negligible for the β estimator requires an explicit bound showing that the cross term between the (r̂ − r) superfluous factor estimates and the regressors (or idiosyncratic errors) is o_p(1/√(NT)). Standard regularity conditions alone do not automatically deliver this when the population factor covariance is rank-deficient in the extra dimensions; the paper should add or verify this step in the expansion.
  Authors: We agree that making the bound explicit improves clarity. The proof of Theorem 3.1 proceeds via the expansion in equation (A.12) of the appendix, where the cross term is controlled by the orthogonality of the estimated superfluous factors to both the regressors and the true factors (using the projection onto the space spanned by the estimated factors). Under the fixed r̂ and the weak dependence and moment conditions in Assumptions 2.1–2.3, this term is shown to be o_p(1/√(NT)). To address the rank-deficiency concern directly, we will insert a new supporting lemma (Lemma A.4) that derives the required bound without relying on a positive eigenvalue gap in the extra dimensions, since those dimensions have zero population variance by the model definition. The revised manuscript will include this lemma.
  Revision: yes
- Referee: [Assumptions / regularity conditions] Assumptions on factor loadings and regressors (over-specification case): the joint convergence rate for the superfluous factors may be slower than the usual min(√N, √T) rate. The manuscript should state any additional conditions (e.g., on the minimal eigenvalue gap or moment bounds) that ensure the extra estimation error does not enter the leading term of the β expansion; without them the independence from r̂ is not guaranteed under the listed regularity conditions.
  Authors: The setup states that r̂ is a fixed integer greater than the true r (see the paragraph preceding Theorem 3.1). Because r̂ is fixed, the slower rate in the superfluous directions does not affect the leading term for β̂; the extra estimation error is annihilated by the orthogonality built into the least-squares normal equations. The full-rank condition on the true factor loadings (Assumption 2.2) together with the moment bounds already ensures the cross term vanishes at the required rate. We will add a short remark immediately after Assumption 3.1 to spell out why no further eigenvalue-gap condition on the superfluous dimensions is needed. If the referee can point to a specific counter-example under our listed assumptions, we would be happy to examine it, but we believe the stated conditions suffice for the claimed independence from r̂.
  Revision: partial
Circularity Check
Asymptotic derivation self-contained under stated assumptions
The paper derives the limiting distribution of the LS estimator for regression coefficients when the number of interactive fixed effects is over-specified (r̂ > r). This is established via direct asymptotic expansion under joint N,T asymptotics and standard regularity conditions (bounded moments, no perfect multicollinearity). No step reduces by construction to a fitted parameter, self-defined quantity, or load-bearing self-citation chain; the independence from r̂ follows from showing superfluous factor estimation errors are asymptotically negligible in the leading term. The result is externally falsifiable via the stated assumptions and does not rename or smuggle in prior results as new derivations.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: the number of factors used in estimation exceeds the true number.
- Domain assumption: joint asymptotics, with N and T both diverging to infinity.
Reference graph
Works this paper leans on
- [1] Ahn, S. C. and Horenstein, A. R. (2013). Eigenvalue ratio test for the number of factors. Econometrica, 81(3):1203–1227.
- [2] Ahn, S. C., Lee, Y. H., and Schmidt, P. (2001). GMM estimation of linear panel data models with time-varying individual effects. Journal of Econometrics, 101(2):219–255.
- [3] Ahn, S. C., Lee, Y. H., and Schmidt, P. (2013). Panel data models with multiple time-varying individual effects. Journal of Econometrics, 174(1):1–14.
- [4] Allen, D. W. (1992). Marriage and divorce: Comment. American Economic Review, pages 679–685.
- [5] Andrews, D. W. K. (1999). Estimation when a parameter is on a boundary. Econometrica, 67(6):1341–1384.
- [6] Bai, J. (2009a). Panel data models with interactive fixed effects. Econometrica, 77(4):1229–1279.
- [7] Bai, J. (2009b). Supplement to "Panel data models with interactive fixed effects": technical details and proofs. Econometrica Supplemental Material, 77(4).
- [8] Bai, J. (2013). Likelihood approach to small T dynamic panel models with interactive effects. Manuscript.
- [9] Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70(1):191–221.
- [10] Bai, Z. (1993). Convergence rate of expected spectral distributions of large random matrices. Part II. Sample covariance matrices. The Annals of Probability, 21(2):649–672.
- [11] Bai, Z. (1999). Methodologies in spectral analysis of large dimensional random matrices, a review. Statistica Sinica, 9:611–677.
- [12] Bai, Z., Miao, B., and Yao, J. (2004). Convergence rates of spectral distributions of large sample covariance matrices. SIAM Journal on Matrix Analysis and Applications, 25(1):105–127.
- [13] Bai, Z. D., Silverstein, J. W., and Yin, Y. Q. (1988). A note on the largest eigenvalue of a large dimensional sample covariance matrix. J. Multivar. Anal., 26(2):166–168.
- [14] Chudik, A., Pesaran, M. H., and Tosetti, E. (2011). Weak and strong cross-section dependence and estimation of large panels. The Econometrics Journal, 14(1):C45–C90.
- [15] Cox, D. D. and Kim, T. Y. (1995). Moment bounds for mixing random variables useful in nonparametric function estimation. Stochastic Processes and their Applications, 56(1):151–158.
- [16] Dhaene, G. and Jochmans, K. (2010). Split-panel jackknife estimation of fixed-effect models. Unpublished manuscript.
- [17] Friedberg, L. (1998). Did unilateral divorce raise divorce rates? Evidence from panel data. Technical report, JSTOR.
- [18] Geman, S. (1980). A limit theorem for the norm of random matrices. Annals of Probability, 8(2):252–261.
- [19] Götze, F. and Tikhomirov, A. (2010). The rate of convergence of spectra of sample covariance matrices. Theory of Probability and its Applications, 54:129.
- [20] Gray, J. S. (1998). Divorce-law changes, household bargaining, and married women's labor supply. American Economic Review, pages 628–642.
- [21] Holtz-Eakin, D., Newey, W., and Rosen, H. S. (1988). Estimating vector autoregressions with panel data. Econometrica, 56(6):1371–95.
- [22] Johnstone, I. (2001). On the distribution of the largest eigenvalue in principal components analysis. Annals of Statistics, 29(2):295–327.
- [23] Kato, T. (1980). Perturbation Theory for Linear Operators. Springer-Verlag.
- [24] Kiefer, N. (1980). A time series-cross section model with fixed effects with an intertemporal factor structure. Unpublished manuscript, Department of Economics, Cornell University.
- [25] Kim, D. and Oka, T. (2014). Divorce law reforms and divorce rates in the USA: An interactive fixed-effects approach. Journal of Applied Econometrics.
- [26] Latala, R. (2005). Some estimates of norms of random matrices. Proc. Amer. Math. Soc., 133:1273–1282.
- [27] Lu, X. and Su, L. (2013). Shrinkage estimation of dynamic panel data models with interactive fixed effects. Technical report, Working paper, Hong Kong University of Science & Technology.
- [28] Marčenko, V. and Pastur, L. (1967). Distribution of eigenvalues for some sets of random matrices. Sbornik: Mathematics, 1(4):457–483.
- [29] Moon, H., Shum, M., and Weidner, M. (2014). Interactive fixed effects in the BLP random coefficients demand model. CeMMAP working paper series.
- [30] Moon, H. and Weidner, M. (2013). Dynamic linear panel regression models with interactive fixed effects. CeMMAP working paper series.
- [31] Nickell, S. (1981). Biases in dynamic models with fixed effects. Econometrica, 49(6):1417–1426.
- [32] Onatski, A. (2010). Determining the number of factors from empirical distribution of eigenvalues. The Review of Economics and Statistics, 92(4):1004–1016.
- [33] Onatski, A. (2012). Asymptotics of the principal components estimator of large factor models with weakly influential factors. Journal of Econometrics, 168(2):244–258.
- [34] Onatski, A. (2013). Asymptotic analysis of the squared estimation error in misspecified factor models. Manuscript.
- [35] Pesaran, M. H. (2006). Estimation and inference in large heterogeneous panels with a multifactor error structure. Econometrica, 74(4):967–1012.
- [36] Peters, H. E. (1986). Marriage and divorce: Informational constraints and private contracting. American Economic Review, 76(3):437–54.
- [37] Peters, H. E. (1992). Marriage and divorce: Reply. American Economic Review, pages 686–693.
- [38] Silverstein, J. (1990). Weak convergence of random functions defined by the eigenvectors of sample covariance matrices. The Annals of Probability, 18(3):1174–1194.
- [39] Silverstein, J. W. (1989). On the eigenvectors of large dimensional sample covariance matrices. J. Multivar. Anal., 30(1):1–16.
- [40] Soshnikov, A. (2002). A note on universality of the distribution of the largest eigenvalues in certain sample covariance matrices. Journal of Statistical Physics, 108(5):1033–1056.
- [41] Stock, J. H. and Watson, M. W. (2002). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association, 97:1167–1179.
- [42]
- [43] Wolfers, J. (2006). Did unilateral divorce laws raise divorce rates? A reconciliation and new results. American Economic Review, pages 1802–1820.
- [44] Yin, Y. Q., Bai, Z. D., and Krishnaiah, P. (1988). On the limit of the largest eigenvalue of the large-dimensional sample covariance matrix. Probability Theory Related Fields, 78:509–521.
- [45] Zaffaroni, P. (2009). Generalized least squares estimation of panel with common shocks. Manuscript.