Endogenous Quantile Regression with Measurement Error in Dependent Variable
Pith reviewed 2026-05-21 02:42 UTC · model grok-4.3
The pith
A control function approach makes quantile regression coefficients identifiable despite endogenous regressors and measurement error in the dependent variable.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The conditional quantile coefficient functions, together with all other distributional parameters, are nonparametrically identifiable under a control function approach in a triangular system, despite additive measurement error in the dependent variable. This leads to a two-step sieve ML estimator that is consistent and asymptotically normal, provided the number of quantile grid knots grows at an appropriate rate with the sample size.
What carries the argument
A control function in the triangular system captures the endogeneity. The generated control variable then enters a sieve likelihood maximization through copula weights.
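To make the role of the copula weights concrete, here is a minimal sketch of the mixture representation F_{Y|X,V}(y | x, v) = \int F_{\varepsilon}(y - x'\beta_0(u)) F_{U|V}(du | v) evaluated on a grid of quantile knots. Everything specific in it is an assumption made for illustration and is not taken from the paper: the Gaussian copula with correlation rho linking U and V, the normal measurement-error distribution with standard deviation sigma_eps, the midpoint rule, and the function names copula_weights and observed_cdf.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def copula_weights(v, n_knots, rho):
    """Mass that the law of U given V = v puts on each of n_knots equal cells.

    Assumption: U and V are linked by a Gaussian copula with correlation rho,
    with -1 < rho < 1 and 0 < v < 1.
    """
    a = STD_NORMAL.inv_cdf(v)
    scale = (1.0 - rho ** 2) ** 0.5

    def cond_cdf(u):
        # Conditional CDF of U given V = v; the endpoints are handled
        # explicitly because inv_cdf is undefined at 0 and 1.
        if u <= 0.0:
            return 0.0
        if u >= 1.0:
            return 1.0
        return STD_NORMAL.cdf((STD_NORMAL.inv_cdf(u) - rho * a) / scale)

    edges = [k / n_knots for k in range(n_knots + 1)]
    return [cond_cdf(edges[k + 1]) - cond_cdf(edges[k]) for k in range(n_knots)]


def observed_cdf(y, x, v, beta0, beta1, sigma_eps, n_knots=50, rho=0.5):
    """Approximate F_{Y|X,V}(y | x, v) by a weighted sum over quantile knots.

    Assumptions: scalar regressor x, normal measurement error with standard
    deviation sigma_eps, and beta0, beta1 callables on (0, 1).
    """
    weights = copula_weights(v, n_knots, rho)
    total = 0.0
    for k, w in enumerate(weights):
        u_k = (k + 0.5) / n_knots  # midpoint of the k-th cell
        latent_quantile = beta0(u_k) + beta1(u_k) * x
        total += w * STD_NORMAL.cdf((y - latent_quantile) / sigma_eps)
    return total
```

A likelihood built on this idea would differentiate the expression in y and sum the log densities over observations; the sketch stops at the CDF because that is the object in the quoted representation. With rho = 0 the weights collapse to the uniform 1 / n_knots, which is the case of no endogeneity.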
If this is right
- The method corrects for bias in quantile estimates caused by both endogeneity and measurement error.
- Inference can be conducted using bootstrap methods.
- The approach applies to a wide range of econometric settings with these data issues.
- Nonparametric identification holds for the full distribution, not just specific quantiles.
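The bias from measurement error alone can be seen in a stylized case. The sketch below assumes a normal latent outcome and independent normal additive error, with no covariates and no endogeneity; these distributional choices and the function name quantile_distortion are illustrative assumptions, not the paper's design.

```python
from statistics import NormalDist


def quantile_distortion(tau, sd_latent=1.0, sd_error=0.5):
    """Return the tau-quantile of the latent and of the observed outcome.

    Assumption: latent outcome N(0, sd_latent^2) plus independent additive
    error N(0, sd_error^2).
    """
    latent = NormalDist(0.0, sd_latent)
    error = NormalDist(0.0, sd_error)
    observed = latent + error  # law of the sum of two independent normals
    return latent.inv_cdf(tau), observed.inv_cdf(tau)


latent_q, observed_q = quantile_distortion(0.9)
print(f"latent 0.9-quantile: {latent_q:.3f}, observed: {observed_q:.3f}")
```

In this setting the observed quantiles are stretched away from the median by the factor sqrt(1 + sd_error^2 / sd_latent^2), so any method that reads quantiles off the mismeasured outcome targets the wrong distribution in the tails.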
Where Pith is reading between the lines
- Extensions could include relaxing the additive measurement error assumption to multiplicative or other forms.
- This could be applied to panel data or other structures common in empirical economics.
- Testing the independence assumption between measurement error and control variable could be a useful robustness check.
Load-bearing premise
The data-generating process follows a triangular system in which the control function fully captures the endogeneity and the measurement error is additive and independent of the control variable conditional on the observed covariates.
What would settle it
Run the estimator on data simulated from the triangular system with known true quantile coefficients, and check whether the estimates converge to the truth as the sample size grows.
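A data-generating process for such a check could look like the sketch below. The design is hypothetical throughout: the Gaussian copula linking the outcome rank U to the control variable V, the log-linear first stage, the coefficient functions in true_beta, and the normal measurement error are all assumptions chosen so that the truth is known in closed form. They are not the paper's Monte Carlo design.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def true_beta(u):
    """Hypothetical quantile coefficient functions (intercept, slope)."""
    return STD_NORMAL.inv_cdf(u), 1.0 + 0.5 * u


def simulate_triangular(n, rho=0.6, sd_error=0.5, seed=0):
    """Draw n observations from an illustrative triangular system.

    First stage:  X = exp(0.5 * Z + 0.5 * a), so V = Phi(a) is the
                  conditional rank of X given the instrument Z.
    Outcome:      Y* = beta0(U) + beta1(U) * X, with U correlated with V.
    Observed:     Y = Y* + eps, eps independent of everything else.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        a = rng.gauss(0.0, 1.0)
        b = rho * a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        v = STD_NORMAL.cdf(a)
        u = STD_NORMAL.cdf(b)
        x = math.exp(0.5 * z + 0.5 * a)  # positive, so x'beta(u) increases in u
        # The intercept inv_cdf(u) equals b, which avoids a round trip.
        y_latent = b + (1.0 + 0.5 * u) * x
        y = y_latent + rng.gauss(0.0, sd_error)
        rows.append({"y": y, "x": x, "z": z, "v": v, "u": u, "y_latent": y_latent})
    return rows
```

An estimator would only see y, x and z; the columns v, u and y_latent are kept so that estimates can be compared against true_beta at each quantile level.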
Original abstract
This paper studies quantile regression with an endogenous regressor and measurement error in the dependent variable. Standard quantile regression estimators ignoring these two elements can induce substantial bias. We adopt a control-function approach in a triangular system and show that the conditional quantile coefficient functions, together with all other distributional parameters, are nonparametrically identifiable. Building on this constructive identification result, we propose a two-step sieve ML estimator. The first step estimates the control function. The second step performs a sieve likelihood maximization that incorporates the generated control variable through copula weights. When the number of quantile grid knots grows at an appropriate speed, the estimator is consistent and asymptotically normal, permitting inference via bootstrap. Monte Carlo simulations demonstrate that the estimator markedly reduces bias relative to existing methods, confirming its effectiveness in settings with endogeneity and additive measurement error in the outcome.
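The first step can be pictured with a simple sketch. It assumes a linear first stage of a scalar regressor on a scalar instrument, with the control variable taken as the normalized rank of the first-stage residual. The abstract does not specify the first-stage estimator, so this is one common control-function construction and not necessarily the authors'; the name control_variable is hypothetical.

```python
def control_variable(x, z):
    """Estimate the control variable as the rank of the first-stage residual.

    Assumption: X = intercept + slope * Z + residual, with the residual
    independent of Z, so its rank estimates V = F_{X|Z}(X | Z).
    """
    n = len(x)
    z_bar = sum(z) / n
    x_bar = sum(x) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_zx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    slope = s_zx / s_zz
    intercept = x_bar - slope * z_bar
    residuals = [xi - (intercept + slope * zi) for xi, zi in zip(x, z)]

    # Normalized ranks rank / (n + 1) stay strictly inside (0, 1).
    order = sorted(range(n), key=residuals.__getitem__)
    v_hat = [0.0] * n
    for rank, index in enumerate(order, start=1):
        v_hat[index] = rank / (n + 1)
    return v_hat
```

Because the estimated control variable feeds into the second step, its sampling error has to be accounted for in the asymptotic variance or in the bootstrap.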
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a control-function approach for quantile regression in the presence of an endogenous regressor and additive measurement error in the outcome variable. It establishes nonparametric identification of the conditional quantile coefficient functions and all distributional parameters under a triangular system, then proposes a two-step sieve maximum-likelihood estimator in which the first step recovers the control function and the second step incorporates the generated control via copula weights. Consistency and asymptotic normality are claimed when the number of quantile-grid knots grows at a suitable rate, with bootstrap inference and Monte Carlo evidence of bias reduction relative to standard quantile regression.
Significance. If the identification result and the asymptotic theory hold under the stated assumptions, the paper would provide a useful addition to the literature on quantile regression with endogeneity and measurement error. The Monte Carlo simulations supply concrete evidence of bias reduction, and the constructive identification via the triangular system is a clear strength. The practical estimator could be applied in empirical settings where both endogeneity and outcome measurement error are plausible.
major comments (3)
- [§2] §2 (Identification): The nonparametric identification of the conditional quantile functions rests on the assumption that the measurement error u is independent of the control function conditional on the observed covariates (u ⊥ control | X). This conditional independence is load-bearing for the triangular-system argument; if it fails due to unobserved heterogeneity that correlates the measurement error with the first-stage residual, the generated-regressor correction does not recover the target parameters and both consistency and bootstrap validity break down.
- [§4] §4 (Asymptotics): The statement that the estimator is asymptotically normal when the number of quantile-grid knots grows at an appropriate speed is central to the inference claim, yet the precise rate condition and the handling of the generated-regressor estimation error in the sieve likelihood are not fully detailed. Without explicit verification of these steps, it is difficult to confirm that the bootstrap remains valid under the knot-growth schedule.
- [§3] §3 (Estimator): The second-step sieve ML uses copula weights that embed the conditional independence assumption from the first step. Any finite-sample dependence introduced by the estimated control function could affect the likelihood maximization; the paper should clarify whether additional trimming or adjustment is required to preserve the asymptotic properties.
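The bootstrap concern in these comments is easier to assess with the procedure written out. The sketch below is a generic nonparametric percentile bootstrap for a two-step estimator, in which both steps are repeated on every resample so that first-step noise is carried into the interval. It is an assumption about how such a bootstrap would be organized, not the paper's algorithm; first_step and second_step are hypothetical callables supplied by the user.

```python
import random


def two_step_bootstrap(rows, first_step, second_step,
                       n_boot=200, level=0.95, seed=0):
    """Percentile interval for a scalar two-step estimate.

    first_step(sample) returns the generated control variable;
    second_step(sample, control) returns the scalar estimate of interest.
    """
    rng = random.Random(seed)
    n = len(rows)
    draws = []
    for _ in range(n_boot):
        # Resample observations with replacement.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        control = first_step(sample)  # re-estimated on every resample
        draws.append(second_step(sample, control))
    draws.sort()
    tail = (1.0 - level) / 2.0
    lower = draws[int(tail * (n_boot - 1))]
    upper = draws[int((1.0 - tail) * (n_boot - 1))]
    return lower, upper
```

Holding the first-step output fixed across resamples would understate the variability; whether the resulting interval is valid under a growing number of knots is exactly what the asymptotic theory has to establish.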
minor comments (2)
- [Abstract] The abstract refers to an 'appropriate speed' for knot growth; the exact rate condition should be stated explicitly in the main text or theorem statement for clarity.
- [Monte Carlo section] Monte Carlo results would benefit from reporting standard errors or confidence bands around the bias and RMSE figures to allow readers to assess the precision of the reported improvements.
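The precision measures requested for the Monte Carlo results are cheap to compute from the replication estimates. A minimal sketch, assuming independent replications of a scalar estimate with known truth; the function name monte_carlo_summary is hypothetical.

```python
import math
import statistics


def monte_carlo_summary(estimates, truth):
    """Bias, RMSE, and the Monte Carlo standard error of the bias."""
    reps = len(estimates)
    errors = [estimate - truth for estimate in estimates]
    bias = statistics.fmean(errors)
    rmse = math.sqrt(statistics.fmean([err ** 2 for err in errors]))
    # Standard error of the mean error across independent replications.
    bias_se = statistics.stdev(errors) / math.sqrt(reps)
    return {"bias": bias, "rmse": rmse, "bias_se": bias_se}
```

A reported bias that is small relative to bias_se cannot be distinguished from zero at the chosen number of replications, which is the comparison the referee's comment asks readers to be able to make.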
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below, indicating planned revisions where appropriate.
-
Referee: [§2] §2 (Identification): The nonparametric identification of the conditional quantile functions rests on the assumption that the measurement error u is independent of the control function conditional on the observed covariates (u ⊥ control | X). This conditional independence is load-bearing for the triangular-system argument; if it fails due to unobserved heterogeneity that correlates the measurement error with the first-stage residual, the generated-regressor correction does not recover the target parameters and both consistency and bootstrap validity break down.
Authors: We agree that the conditional independence assumption u ⊥ V | X (where V denotes the control function) is central to the nonparametric identification result in Section 2; it is stated explicitly as Assumption 3. The restriction ensures that, conditional on the observed covariates, the measurement error is independent of the control function and hence of the endogenous variation it captures. We acknowledge that violations arising from additional unobserved heterogeneity could invalidate the generated-regressor correction. The assumption is nonetheless standard in triangular control-function models and is maintained throughout the analysis. In the revision we will expand the discussion in Section 2 to provide further intuition, discuss plausible empirical settings (e.g., wage equations with reporting error), and note the consequences of potential violations. revision: partial
-
Referee: [§4] §4 (Asymptotics): The statement that the estimator is asymptotically normal when the number of quantile-grid knots grows at an appropriate speed is central to the inference claim, yet the precise rate condition and the handling of the generated-regressor estimation error in the sieve likelihood are not fully detailed. Without explicit verification of these steps, it is difficult to confirm that the bootstrap remains valid under the knot-growth schedule.
Authors: The referee correctly notes that the rate conditions and the propagation of first-step estimation error into the second-step sieve likelihood require more explicit treatment. Theorem 4 states consistency and asymptotic normality when the number of knots K_n satisfies K_n = o(n^{1/3}) together with standard sieve regularity conditions, but the appendix sketch is concise on bounding the generated-regressor term. We will revise Section 4 and the appendix to supply a fuller asymptotic expansion that isolates the contribution of the estimated control function, verify that this term is asymptotically negligible under the stated rate, and confirm bootstrap validity via the continuous-mapping theorem applied to the sieve estimator. revision: yes
-
Referee: [§3] §3 (Estimator): The second-step sieve ML uses copula weights that embed the conditional independence assumption from the first step. Any finite-sample dependence introduced by the estimated control function could affect the likelihood maximization; the paper should clarify whether additional trimming or adjustment is required to preserve the asymptotic properties.
Authors: The copula weights are constructed from the conditional distribution implied by the maintained independence assumption. Although using an estimated control function introduces finite-sample dependence, the asymptotic theory shows that this dependence vanishes at the required rate when the knot number grows appropriately. Our Monte Carlo experiments in Section 5 exhibit stable performance without extra trimming. In the revision we will add a clarifying remark in Section 3 stating that no additional trimming or adjustment is required to preserve the asymptotic properties, with the bootstrap procedure already incorporating the two-step estimation uncertainty. revision: partial
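The rate K_n = o(n^{1/3}) quoted in the response on asymptotics can be met by any polynomial schedule with exponent below 1/3. The sketch below uses exponent 0.25 and equally spaced knots; both choices and the function names are illustrative assumptions, and the paper's theorem may impose further conditions (for example a lower bound on K_n to control approximation bias).

```python
import math


def knot_count(n, exponent=0.25, minimum=2):
    """Number of quantile knots for sample size n, growing like n**exponent.

    Any exponent strictly below 1/3 gives K_n = o(n^{1/3}).
    """
    return max(minimum, math.floor(n ** exponent))


def quantile_knots(n, exponent=0.25):
    """Equally spaced knots strictly inside (0, 1)."""
    k = knot_count(n, exponent)
    return [j / (k + 1) for j in range(1, k + 1)]


for n in (1_000, 1_000_000):
    k = knot_count(n)
    print(n, k, round(k / n ** (1 / 3), 3))  # ratio shrinks as n grows
```

The printed ratio K_n / n^{1/3} falling with n is the finite-sample counterpart of the little-o condition.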
Circularity Check
No circularity: identification and estimation are derived from model assumptions and standard sieve theory
The paper derives nonparametric identification of the conditional quantile coefficient functions from the triangular system and control-function assumptions, including additive measurement error independent of the control variable conditional on covariates. The two-step sieve ML estimator is constructed by first estimating the control function and then maximizing a sieve likelihood that incorporates it via copula weights. Consistency and asymptotic normality are established under a knot-growth rate condition using standard arguments from sieve estimation theory. No step reduces the target parameters to fitted values by construction, no self-citation is load-bearing for uniqueness or identification, and no ansatz or renaming is smuggled in. The derivation is self-contained against external benchmarks in econometric identification and nonparametric estimation.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Triangular system structure with control function capturing endogeneity
- domain assumption: Additive measurement error independent of the control variable conditional on covariates
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
  unclear: relation between the paper passage and the cited Recognition theorem.
  Paper passage: We adopt a control-function approach in a triangular system and show that the conditional quantile coefficient functions, together with all other distributional parameters, are nonparametrically identifiable. ... F_{Y|X,V}(y | x, v) = \int F_{\varepsilon}(y - x'\beta_0(u)) \, F_{U|V}(du | v)
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  unclear: relation between the paper passage and the cited Recognition theorem.
  Paper passage: the estimator is consistent and asymptotically normal when the number of quantile grid knots grows at an appropriate speed
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.