Learning a directed acyclic graph with additive heteroscedastic errors
Pith reviewed 2026-06-29 16:13 UTC · model grok-4.3
The pith
Heteroscedastic errors identify DAG directions via quantile-invariant scales
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under a structural equation model with additive heteroscedastic errors, the conditional scale coefficients remain invariant across quantiles. This invariance permits the RESQUE procedure to identify sink nodes iteratively via residual construction and composite quantile regression, recovering both the topological order and the full graph structure, with theoretical consistency even when the number of variables diverges with the sample size.
What carries the argument
The invariance of conditional scale coefficients across quantiles in the location-scale noise model, used by the RESQUE iterative procedure to recursively identify sink nodes.
Load-bearing premise
Conditional scale coefficients remain unchanged regardless of the quantile level examined.
What would settle it
Generate data from a known DAG under additive heteroscedastic errors but with scale coefficients that deliberately vary across quantiles; the procedure should then fail to recover the correct topological order.
Figures
read the original abstract
This paper studies causal discovery for a directed acyclic graph under a structural equation model with additive heteroscedastic errors. We first establish new identifiability results for location-scale noise models, showing that heteroscedasticity can be leveraged to recover causal directions. Based on these insights, we propose a novel iterative procedure, Residual Simultaneous Quantile Estimation (RESQUE), where each iteration consists of a residual-construction stage and a composite quantile regression stage, enabling recursive identification of sink nodes via the invariance of conditional scale coefficients across quantiles. We then establish its theoretical guarantees for recovering topological order and graph structure, even when the number of variables diverges with the sample size. Simulation studies and application to benchmark datasets show that RESQUE performs favorably compared with existing methods, especially when causal information is partly encoded in the variance component. These results highlight exploiting structured variance signals for causal discovery and provide a principled framework for multivariate causal discovery beyond mean-based modeling.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies causal discovery for DAGs under a structural equation model with additive heteroscedastic errors. It establishes new identifiability results showing that heteroscedasticity can be leveraged to recover causal directions via quantile-invariant conditional scale coefficients. It proposes the RESQUE iterative procedure (residual construction followed by composite quantile regression) for recursive sink-node identification, proves theoretical guarantees for topological order and graph recovery even when p diverges with n, and reports favorable simulation and benchmark performance relative to existing methods when variance components carry causal information.
Significance. If the identifiability and high-dimensional consistency results hold, the work provides a principled extension of causal discovery beyond mean-based modeling by exploiting structured variance signals. This is potentially valuable in domains where heteroscedasticity encodes directional information, and the allowance for diverging p broadens applicability.
major comments (2)
- [Abstract] Abstract (identifiability paragraph): the central claim that 'heteroscedasticity can be leveraged to recover causal directions' rests on the location-scale model with quantile-invariant conditional scales; without the explicit theorem statement, assumptions, and proof, it is impossible to verify whether the invariance holds generically or only under additional restrictions that may not be stated.
- [Abstract] Abstract (theoretical guarantees paragraph): the claim of 'theoretical guarantees for recovering topological order and graph structure, even when the number of variables diverges with the sample size' is load-bearing for the paper's contribution; the provided text gives no derivation, rate conditions, or proof sketch, leaving the soundness of the high-dimensional result unverified.
minor comments (1)
- The abstract mentions simulation studies and benchmark applications but provides no details on data exclusion rules, simulation design, or performance metrics; these should be expanded for reproducibility.
Simulated Author's Rebuttal
We thank the referee for their comments. We address the two major comments on the abstract below by directing to the corresponding formal results in the manuscript. The abstract is intended as a concise summary; the full statements, assumptions, and proofs appear in the body of the paper.
read point-by-point responses
-
Referee: [Abstract] Abstract (identifiability paragraph): the central claim that 'heteroscedasticity can be leveraged to recover causal directions' rests on the location-scale model with quantile-invariant conditional scales; without the explicit theorem statement, assumptions, and proof, it is impossible to verify whether the invariance holds generically or only under additional restrictions that may not be stated.
Authors: The identifiability result is stated precisely as Theorem 3.1 in Section 3. Under the location-scale structural equation model and assumptions (A1)–(A3), the theorem establishes that the conditional scale coefficients are invariant across quantiles if and only if the corresponding edge is absent. The proof in Appendix A shows that the invariance property holds under these model assumptions without further restrictions. The abstract condenses this result; the explicit statement, assumptions, and proof are provided in the main text. revision: no
-
Referee: [Abstract] Abstract (theoretical guarantees paragraph): the claim of 'theoretical guarantees for recovering topological order and graph structure, even when the number of variables diverges with the sample size' is load-bearing for the paper's contribution; the provided text gives no derivation, rate conditions, or proof sketch, leaving the soundness of the high-dimensional result unverified.
Authors: The high-dimensional consistency results appear as Theorem 4.2 and Corollary 4.3 in Section 4. These establish recovery of the topological order and graph structure when p diverges with n, subject to explicit rate conditions (p = o(n^{1/3}) under sub-Gaussian tails). A proof sketch is given in the main text of Section 4, with the complete derivation in Appendix B. The abstract summarizes these guarantees; the rate conditions and proofs are contained in the manuscript. revision: no
Circularity Check
No significant circularity
full rationale
The abstract and context describe identifiability results derived from the location-scale structural equation model assumptions (additive heteroscedastic errors with quantile-invariant conditional scales) and the RESQUE iterative procedure for sink-node identification. No equations, proofs, or self-citations are supplied that reduce any claimed prediction or first-principles result to fitted inputs by construction. The theoretical guarantees for topological order recovery (even with diverging p) are presented as following from the model properties rather than from renaming or self-referential fitting. This is the expected self-contained case.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Data generated from structural equation model with additive heteroscedastic errors
Reference graph
Works this paper leans on
-
[1]
Adamczak, A
R. Adamczak, A. E. Litvak, A. Pajor, and N. Tomczak-Jaegermann. Restricted isom- etry property of matrices with independent columns and neighborly polytopes by ran- dom sampling.Constructive Approximation, 34:61–88, 2011
2011
-
[2]
Bello, B
K. Bello, B. Aragam, and P. Ravikumar. DAGMA: Learning DAGs via M-matrices and a Log-Determinant Acyclicity Characterization. InAdvances in Neural Information Processing Systems, 2022
2022
-
[3]
Belloni and V
A. Belloni and V. Chernozhukov. l1-penalized quantile regression in high-dimensional sparse models.The Annals of Statistics, 39(1):82 – 130, 2011
2011
-
[4]
Inference for High-Dimensional Sparse Econometric Models
A. Belloni, V. Chernozhukov, and C. Hansen. Inference for high-dimensional sparse econometric models.arXiv preprint arXiv:1201.0220, 2011
work page internal anchor Pith review Pith/arXiv arXiv 2011
-
[5]
Belloni, V
A. Belloni, V. Chernozhukov, and K. Kato. Valid post-selection inference in high- dimensional approximately sparse quantile regression models.Journal of the American Statistical Association, 114(526):749–758, 2019
2019
-
[6]
P. M. Bentler. Causal modeling via structural equation systems. InHandbook of multivariate experimental psychology, pages 317–335. Springer, 1988
1988
-
[7]
Berrevoets, J
J. Berrevoets, J. Raymaekers, M. Van der Schaar, T. Verdonck, and R. Yao. Dif- ferentiable causal structure learning with identifiability by notime. InProceedings of machine learning research, volume 258, pages 3115–3123. PMLR, 2025. 28
2025
-
[8]
P. J. Bickel, Y. Ritov, and A. B. Tsybakov. Simultaneous analysis of lasso and dantzig selector.The Annals of Statistics, 37(4):1705–1732, 2009
2009
-
[9]
Bl¨ obaum, D
P. Bl¨ obaum, D. Janzing, T. Washio, S. Shimizu, and B. Sch¨ olkopf. Cause-effect infer- ence by comparing regression errors. InInternational Conference on Artificial Intelli- gence and Statistics, pages 900–909. PMLR, 2018
2018
-
[10]
B¨ uhlmann, J
P. B¨ uhlmann, J. Peters, and J. Ernest. CAM: Causal additive models, high- dimensional order search and penalized regression.The Annals of Statistics, 42(6):2526 – 2556, 2014
2014
-
[11]
Chang, Z
T.-H. Chang, Z. Guo, and D. Malinsky. Post-selection inference for causal effects after causal discovery.Biometrika, 113(1):asaf073, 2026
2026
-
[12]
D. M. Chickering. Optimal structure identification with greedy search.Journal of machine learning research, 3(Nov):507–554, 2002
2002
-
[13]
Fan and R
J. Fan and R. Li. Variable selection via nonconcave penalized likelihood and its oracle properties.Journal of the American statistical Association, 96(456):1348–1360, 2001
2001
-
[14]
J. Fan, L. Xue, and H. Zou. Strong oracle optimality of folded concave penalized estimation.Annals of statistics, 42(3):819, 2014
2014
-
[15]
Friedman, T
J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso.Biostatistics, 9(3):432–441, 2008
2008
-
[16]
M. Gao, Y. Ding, and B. Aragam. A polynomial-time algorithm for learning nonpara- metric causal graphs.Advances in Neural Information Processing Systems, 33:11599– 11611, 2020
2020
-
[17]
Glymour, K
C. Glymour, K. Zhang, and P. Spirtes. Review of causal discovery methods based on graphical models.Frontiers in genetics, 10:524, 2019
2019
-
[18]
Gradu, T
P. Gradu, T. Zrnic, Y. Wang, and M. I. Jordan. Valid inference after causal discovery. Journal of the American Statistical Association, 120(550):1127–1138, 2025
2025
-
[19]
Harris and M
N. Harris and M. Drton. PC algorithm for nonparanormal graphical models.Journal of Machine Learning Research, 14(11), 2013
2013
-
[20]
X. He, X. Pan, K. M. Tan, and W.-X. Zhou. Smoothed quantile regression with large-scale inference.Journal of Econometrics, 2021
2021
-
[21]
Heinze-Deml, M
C. Heinze-Deml, M. H. Maathuis, and N. Meinshausen. Causal structure learning. Annual Review of Statistics and Its Application, 5:371–391, 2018
2018
-
[22]
Hoyer, D
P. Hoyer, D. Janzing, J. Mooij, J. Peters, and B. Sch¨ olkopf. Nonlinear causal dis- covery with additive noise models. InTwenty-Second Annual Conference on Neural Information Processing Systems (NIPS 2008), pages 689–696. Curran, 2009
2008
-
[23]
Immer, C
A. Immer, C. Schultheiss, J. E. Vogt, B. Sch¨ olkopf, P. B¨ uhlmann, and A. Marx. On the identifiability and estimation of causal location-scale noise models. InInternational Conference on Machine Learning, pages 14316–14332. PMLR, 2023. 29
2023
-
[24]
Kalisch and P
M. Kalisch and P. B¨ uhlman. Estimating high-dimensional directed acyclic graphs with the pc-algorithm.Journal of Machine Learning Research, 8(3), 2007
2007
-
[25]
Koenker and G
R. Koenker and G. Bassett, Jr. Regression quantiles.Econometrica: journal of the Econometric Society, pages 33–50, 1978
1978
-
[26]
C. Li, X. Shen, and W. Pan. Likelihood ratio tests for a large directed acyclic graph. Journal of the American Statistical Association, 2020
2020
-
[27]
C. Li, X. Shen, and W. Pan. Nonlinear causal discovery with confounders.Journal of the American Statistical Association, pages 1–10, 2023
2023
-
[28]
Y. Li, A. Torralba, A. Anandkumar, D. Fox, and A. Garg. Causal discovery in physical systems from videos.Advances in Neural Information Processing Systems, 33:9180– 9192, 2020
2020
-
[29]
Li and J
Y. Li and J. Zhu. L 1-norm quantile regression.Journal of Computational and Graph- ical Statistics, 17(1):163–185, 2008
2008
-
[30]
Y. Lin, Y. Huang, W. Liu, H. Deng, I. Ng, K. Zhang, M. Gong, Y. Ma, and B. Huang. A skewness-based criterion for addressing heteroscedastic noise in causal discovery. InInternational Conference on Learning Representations, volume 2025, pages 89283– 89310, 2025
2025
-
[31]
M. H. Maathuis, D. Colombo, M. Kalisch, and P. B¨ uhlmann. Predicting causal effects in large-scale systems from observational data.Nature methods, 7(4):247–248, 2010
2010
-
[32]
Meinshausen and B
N. Meinshausen and B. Yu. Lasso-type recovery of sparse representations for high- dimensional data.The Annals of Statistics, 37(1):246–270, 2009
2009
-
[33]
Mendelson, A
S. Mendelson, A. Pajor, and N. Tomczak-Jaegermann. Uniform uncertainty principle for bernoulli and subgaussian ensembles.Constructive Approximation, 28:277–289, 2008
2008
-
[34]
J. M. Mooij and T. Heskes. Cyclic causal discovery from continuous equilibrium data. InProceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, pages 431–439, 2013
2013
-
[35]
J. M. Mooij, J. Peters, D. Janzing, J. Zscheischler, and B. Sch¨ olkopf. Distinguishing cause from effect using observational data: methods and benchmarks.Journal of Machine Learning Research, 17(32):1–102, 2016
2016
-
[36]
G. Park. Identifiability of additive noise models using conditional variances.Journal of Machine Learning Research, 21(75):1–34, 2020
2020
-
[37]
Pearl.Causality
J. Pearl.Causality. Cambridge university press, 2009
2009
-
[38]
Peters and P
J. Peters and P. B¨ uhlmann. Identifiability of gaussian structural equation models with equal error variances.Biometrika, 101(1):219–228, 2014. 30
2014
-
[39]
Peters, D
J. Peters, D. Janzing, and B. Sch¨ olkopf. Identifying cause and effect on discrete data using additive noise models. InProceedings of the thirteenth international conference on artificial intelligence and statistics, pages 597–604. JMLR Workshop and Confer- ence Proceedings, 2010
2010
-
[40]
Peters, J
J. Peters, J. M. Mooij, D. Janzing, and B. Sch¨ olkopf. Causal discovery with continuous additive noise models.Journal of Machine Learning Research, 15:2009–2053, 2014
2009
-
[41]
Y. Qiu, J. Tao, and X.-H. Zhou. Inference of heterogeneous treatment effects using observational data with high-dimensional covariates.Journal of the Royal Statistical Society Series B: Statistical Methodology, 83(5):1016–1043, 2021
2021
-
[42]
Raskutti, M
G. Raskutti, M. J. Wainwright, and B. Yu. Restricted eigenvalue properties for cor- related gaussian designs.The Journal of Machine Learning Research, 11:2241–2259, 2010
2010
-
[43]
Sachs, O
K. Sachs, O. Perez, D. Pe’er, D. A. Lauffenburger, and G. P. Nolan. Causal protein-signaling networks derived from multiparameter single-cell data.Science, 308(5721):523–529, 2005
2005
-
[44]
Sch¨ olkopf, F
B. Sch¨ olkopf, F. Locatello, S. Bauer, N. R. Ke, N. Kalchbrenner, A. Goyal, and Y. Ben- gio. Toward causal representation learning.Proceedings of the IEEE, 109(5):612–634, 2021
2021
-
[45]
Schultheiss and P
C. Schultheiss and P. B¨ uhlmann. Ancestor regression in linear structural equation models.Biometrika, 110(4):1117–1124, 2023
2023
-
[46]
Schultheiss and P
C. Schultheiss and P. B¨ uhlmann. On the pitfalls of gaussian likelihood scoring for causal discovery.Journal of Causal Inference, 11(1):20220068, 2023
2023
-
[47]
Shimizu, P
S. Shimizu, P. O. Hoyer, A. Hyv¨ arinen, A. Kerminen, and M. Jordan. A linear non- Gaussian acyclic model for causal discovery.Journal of Machine Learning Research, 7(10), 2006
2006
-
[48]
P. Spirtes. An anytime algorithm for causal inference. InInternational Workshop on Artificial Intelligence and Statistics, pages 278–285. PMLR, 2001
2001
-
[49]
Spirtes and C
P. Spirtes and C. Glymour. An algorithm for fast recovery of sparse causal graphs. Social science computer review, 9(1):62–72, 1991
1991
-
[50]
Spirtes, C
P. Spirtes, C. N. Glymour, and R. Scheines.Causation, prediction, and search. MIT press, 2000
2000
-
[51]
E. V. Strobl and T. A. Lasko. Identifying patient-specific root causes with the het- eroscedastic noise model.Journal of Computational Science, 72:102099, 2023
2023
-
[52]
Sun and O
X. Sun and O. Schulte. Cause-effect inference in location-scale noise models: Maxi- mum likelihood vs. independence testing.Advances in Neural Information Processing Systems, 36:5447–5483, 2023. 31
2023
-
[53]
K. M. Tan, L. Wang, and W.-X. Zhou. High-dimensional quantile regression: Convo- lution smoothing and concave regularization.Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(1):205–233, 2022
2022
-
[54]
Q.-D. Tran, B. Duong, P. Nguyen, and T. Nguyen. Robust estimation of causal het- eroscedastic noise models. InProceedings of the 2024 SIAM International Conference on Data Mining (SDM), pages 788–796. SIAM, 2024
2024
-
[55]
Tsamardinos, L
I. Tsamardinos, L. E. Brown, and C. F. Aliferis. The max-min hill-climbing bayesian network structure learning algorithm.Machine learning, 65(1):31–78, 2006
2006
-
[56]
M. J. Vowels, N. C. Camgoz, and R. Bowden. D’ya like DAGs? a survey on structure learning and causal discovery.ACM Computing Surveys, 55(4):1–36, 2022
2022
-
[57]
Y. S. Wang, M. Kolar, and M. Drton. Confidence sets for causal orderings.Journal of the American Statistical Association, pages 1–14, 2025
2025
-
[58]
H. Wold. Causality and econometrics.Econometrica: Journal of the Econometric Society, pages 162–177, 1954
1954
-
[59]
S. Xu, O. A. Mian, A. Marx, and J. Vreeken. Inferring cause and effect in the presence of heteroscedastic noise. InInternational Conference on Machine Learning, pages 24615–24630. PMLR, 2022
2022
-
[60]
Y. Yang, S. Bom, and X. Shen. A hierarchical ensemble causal structure learning approach for wafer manufacturing.Journal of Intelligent Manufacturing, 35(6):2961– 2978, 2024
2024
-
[61]
Ye and C.-H
F. Ye and C.-H. Zhang. Rate minimaxity of the lasso and dantzig selector for the lq loss in lr balls.The Journal of Machine Learning Research, 11:3519–3540, 2010
2010
-
[62]
N. Yin, T. Gao, Y. Yu, and Q. Ji. Effective causal discovery under identifiable het- eroscedastic noise model. InProceedings of the AAAI Conference on Artificial Intel- ligence, volume 38, pages 16486–16494, 2024
2024
-
[63]
Y. Yuan, X. Shen, W. Pan, and Z. Wang. Constrained likelihood for reconstructing a directed acyclic gaussian graph.Biometrika, 106(1):109–125, 2019
2019
-
[64]
C.-H. Zhang. Nearly unbiased variable selection under minimax concave penalty.An- nals of statistics, 38(2):894–942, 2010
2010
-
[65]
Zhang and A
K. Zhang and A. Hyv¨ arinen. On the identifiability of the post-nonlinear causal model. InProceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pages 647–655, 2009
2009
-
[66]
Zhang and A
K. Zhang and A. Hyv¨ arinen. Distinguishing causes from effects using nonlinear acyclic causal models. InCausality: Objectives and Assessment, pages 157–164. PMLR, 2010
2010
-
[67]
Zhang, Y
T. Zhang, Y. Zhang, and T. Zhou. Statistical insights into HSIC in high dimensions. Advances in Neural Information Processing Systems, 36:19145–19156, 2023. 32
2023
-
[68]
Zhao and B
P. Zhao and B. Yu. On model selection consistency of lasso.The Journal of Machine Learning Research, 7:2541–2563, 2006
2006
-
[69]
R. Zhao, X. He, and J. Wang. Learning linear non-gaussian directed acyclic graph with diverging number of nodes.The Journal of Machine Learning Research, 23(1):12314– 12347, 2022
2022
-
[70]
Zheng, B
X. Zheng, B. Aragam, P. K. Ravikumar, and E. P. Xing. Dags with no tears: Contin- uous optimization for structure learning.Advances in neural information processing systems, 31, 2018
2018
-
[71]
Zhou and H
L. Zhou and H. Zou. Cross-fitted residual regression for high-dimensional heteroscedas- ticity pursuit.Journal of the American Statistical Association, 118(542):1056–1065, 2023
2023
-
[72]
Zou and M
H. Zou and M. Yuan. Composite quantile regression and the oracle model selection theory.Annals of Statistics, 36(3):1108–1126, 2008. 33
2008
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.