Learning a directed acyclic graph with additive heteroscedastic errors

Chunlin Li; Li Chen; Xintao Xia; Yue Hu

arxiv: 2605.26515 · v1 · pith:5ZP52B7Gnew · submitted 2026-05-26 · 📊 stat.ME

Learning a directed acyclic graph with additive heteroscedastic errors

Xintao Xia , Li Chen , Yue Hu , Chunlin Li This is my paper

Pith reviewed 2026-06-29 16:13 UTC · model grok-4.3

classification 📊 stat.ME

keywords causal discoverydirected acyclic graphheteroscedastic errorsquantile regressionstructural equation modelidentifiabilitytopological order

0 comments

The pith

Heteroscedastic errors identify DAG directions via quantile-invariant scales

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes identifiability results for location-scale noise models on directed acyclic graphs, showing that heteroscedasticity supplies information to recover causal directions. It introduces the RESQUE procedure, an iterative method that constructs residuals and applies composite quantile regression to exploit the invariance of conditional scale coefficients across quantiles and thereby locate sink nodes recursively. A sympathetic reader would care because standard causal discovery often relies solely on mean relationships while ignoring structured variance signals that can resolve edge directions. The procedure carries consistency guarantees that continue to hold when the number of variables grows with the sample size. Simulations indicate stronger performance precisely when causal information resides in the variance component.

Core claim

Under a structural equation model with additive heteroscedastic errors, the conditional scale coefficients remain invariant across quantiles. This invariance permits the RESQUE procedure to identify sink nodes iteratively via residual construction and composite quantile regression, recovering both the topological order and the full graph structure, with theoretical consistency even when the number of variables diverges with the sample size.

What carries the argument

The invariance of conditional scale coefficients across quantiles in the location-scale noise model, used by the RESQUE iterative procedure to recursively identify sink nodes.

Load-bearing premise

Conditional scale coefficients remain unchanged regardless of the quantile level examined.

What would settle it

Generate data from a known DAG under additive heteroscedastic errors but with scale coefficients that deliberately vary across quantiles; the procedure should then fail to recover the correct topological order.

Figures

Figures reproduced from arXiv: 2605.26515 by Chunlin Li, Li Chen, Xintao Xia, Yue Hu.

**Figure 2.** Figure 2: Empirical performance of CAM, rank-PC, TL, NOTEARS, and the proposed [PITH_FULL_IMAGE:figures/full_fig_p023_2.png] view at source ↗

**Figure 3.** Figure 3: Empirical performance of CAM, rank-PC, TL, NOTEARS, and RESQUE in [PITH_FULL_IMAGE:figures/full_fig_p024_3.png] view at source ↗

**Figure 4.** Figure 4: Consensus graph and estimated graph by RESQUE using the Sachs dataset. [PITH_FULL_IMAGE:figures/full_fig_p026_4.png] view at source ↗

read the original abstract

This paper studies causal discovery for a directed acyclic graph under a structural equation model with additive heteroscedastic errors. We first establish new identifiability results for location-scale noise models, showing that heteroscedasticity can be leveraged to recover causal directions. Based on these insights, we propose a novel iterative procedure, Residual Simultaneous Quantile Estimation (RESQUE), where each iteration consists of a residual-construction stage and a composite quantile regression stage, enabling recursive identification of sink nodes via the invariance of conditional scale coefficients across quantiles. We then establish its theoretical guarantees for recovering topological order and graph structure, even when the number of variables diverges with the sample size. Simulation studies and application to benchmark datasets show that RESQUE performs favorably compared with existing methods, especially when causal information is partly encoded in the variance component. These results highlight exploiting structured variance signals for causal discovery and provide a principled framework for multivariate causal discovery beyond mean-based modeling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows heteroscedasticity aids identifiability in DAG learning via quantile invariance in RESQUE.

read the letter

The key takeaway is that this paper shows heteroscedasticity can aid identifiability in causal discovery for additive error models, via a new procedure called RESQUE that uses quantile regression invariance to find sink nodes iteratively.

What is new is the identifiability theory for location-scale models and the RESQUE algorithm that combines residual building with composite quantile estimation. It performs well in the sense that the simulations reportedly beat baselines especially when variance encodes the directions, and they claim high-dimensional consistency.

The soft spots are around the invariance assumption for scales across quantiles, which might not hold generally and could be the main practical limitation. The theoretical part claims guarantees as p diverges with n, but without the details it's unclear how robust the rates are to the quantile estimation steps. The abstract is clear on the model, but real data applications would need to check if the heteroscedasticity is structured as assumed.

This paper is for people in causal inference and graphical models who want to incorporate variance information. It would be useful for those extending methods to more flexible error distributions.

I recommend sending it for peer review, as the contribution is specific enough to warrant checking the proofs and experiments in detail.

Referee Report

2 major / 1 minor

Summary. The paper studies causal discovery for DAGs under a structural equation model with additive heteroscedastic errors. It establishes new identifiability results showing that heteroscedasticity can be leveraged to recover causal directions via quantile-invariant conditional scale coefficients. It proposes the RESQUE iterative procedure (residual construction followed by composite quantile regression) for recursive sink-node identification, proves theoretical guarantees for topological order and graph recovery even when p diverges with n, and reports favorable simulation and benchmark performance relative to existing methods when variance components carry causal information.

Significance. If the identifiability and high-dimensional consistency results hold, the work provides a principled extension of causal discovery beyond mean-based modeling by exploiting structured variance signals. This is potentially valuable in domains where heteroscedasticity encodes directional information, and the allowance for diverging p broadens applicability.

major comments (2)

[Abstract] Abstract (identifiability paragraph): the central claim that 'heteroscedasticity can be leveraged to recover causal directions' rests on the location-scale model with quantile-invariant conditional scales; without the explicit theorem statement, assumptions, and proof, it is impossible to verify whether the invariance holds generically or only under additional restrictions that may not be stated.
[Abstract] Abstract (theoretical guarantees paragraph): the claim of 'theoretical guarantees for recovering topological order and graph structure, even when the number of variables diverges with the sample size' is load-bearing for the paper's contribution; the provided text gives no derivation, rate conditions, or proof sketch, leaving the soundness of the high-dimensional result unverified.

minor comments (1)

The abstract mentions simulation studies and benchmark applications but provides no details on data exclusion rules, simulation design, or performance metrics; these should be expanded for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their comments. We address the two major comments on the abstract below by directing to the corresponding formal results in the manuscript. The abstract is intended as a concise summary; the full statements, assumptions, and proofs appear in the body of the paper.

read point-by-point responses

Referee: [Abstract] Abstract (identifiability paragraph): the central claim that 'heteroscedasticity can be leveraged to recover causal directions' rests on the location-scale model with quantile-invariant conditional scales; without the explicit theorem statement, assumptions, and proof, it is impossible to verify whether the invariance holds generically or only under additional restrictions that may not be stated.

Authors: The identifiability result is stated precisely as Theorem 3.1 in Section 3. Under the location-scale structural equation model and assumptions (A1)–(A3), the theorem establishes that the conditional scale coefficients are invariant across quantiles if and only if the corresponding edge is absent. The proof in Appendix A shows that the invariance property holds under these model assumptions without further restrictions. The abstract condenses this result; the explicit statement, assumptions, and proof are provided in the main text. revision: no
Referee: [Abstract] Abstract (theoretical guarantees paragraph): the claim of 'theoretical guarantees for recovering topological order and graph structure, even when the number of variables diverges with the sample size' is load-bearing for the paper's contribution; the provided text gives no derivation, rate conditions, or proof sketch, leaving the soundness of the high-dimensional result unverified.

Authors: The high-dimensional consistency results appear as Theorem 4.2 and Corollary 4.3 in Section 4. These establish recovery of the topological order and graph structure when p diverges with n, subject to explicit rate conditions (p = o(n^{1/3}) under sub-Gaussian tails). A proof sketch is given in the main text of Section 4, with the complete derivation in Appendix B. The abstract summarizes these guarantees; the rate conditions and proofs are contained in the manuscript. revision: no

Circularity Check

0 steps flagged

No significant circularity

full rationale

The abstract and context describe identifiability results derived from the location-scale structural equation model assumptions (additive heteroscedastic errors with quantile-invariant conditional scales) and the RESQUE iterative procedure for sink-node identification. No equations, proofs, or self-citations are supplied that reduce any claimed prediction or first-principles result to fitted inputs by construction. The theoretical guarantees for topological order recovery (even with diverging p) are presented as following from the model properties rather than from renaming or self-referential fitting. This is the expected self-contained case.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Ledger constructed from abstract only; full paper details on parameters and assumptions unavailable.

axioms (1)

domain assumption Data generated from structural equation model with additive heteroscedastic errors
Explicitly stated as the model class for which new identifiability results are derived.

pith-pipeline@v0.9.1-grok · 5689 in / 1281 out tokens · 53954 ms · 2026-06-29T16:13:24.459922+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

72 extracted references · 1 canonical work pages · 1 internal anchor

[1]

Adamczak, A

R. Adamczak, A. E. Litvak, A. Pajor, and N. Tomczak-Jaegermann. Restricted isom- etry property of matrices with independent columns and neighborly polytopes by ran- dom sampling.Constructive Approximation, 34:61–88, 2011

2011
[2]

Bello, B

K. Bello, B. Aragam, and P. Ravikumar. DAGMA: Learning DAGs via M-matrices and a Log-Determinant Acyclicity Characterization. InAdvances in Neural Information Processing Systems, 2022

2022
[3]

Belloni and V

A. Belloni and V. Chernozhukov. l1-penalized quantile regression in high-dimensional sparse models.The Annals of Statistics, 39(1):82 – 130, 2011

2011
[4]

Inference for High-Dimensional Sparse Econometric Models

A. Belloni, V. Chernozhukov, and C. Hansen. Inference for high-dimensional sparse econometric models.arXiv preprint arXiv:1201.0220, 2011

work page internal anchor Pith review Pith/arXiv arXiv 2011
[5]

Belloni, V

A. Belloni, V. Chernozhukov, and K. Kato. Valid post-selection inference in high- dimensional approximately sparse quantile regression models.Journal of the American Statistical Association, 114(526):749–758, 2019

2019
[6]

P. M. Bentler. Causal modeling via structural equation systems. InHandbook of multivariate experimental psychology, pages 317–335. Springer, 1988

1988
[7]

Berrevoets, J

J. Berrevoets, J. Raymaekers, M. Van der Schaar, T. Verdonck, and R. Yao. Dif- ferentiable causal structure learning with identifiability by notime. InProceedings of machine learning research, volume 258, pages 3115–3123. PMLR, 2025. 28

2025
[8]

P. J. Bickel, Y. Ritov, and A. B. Tsybakov. Simultaneous analysis of lasso and dantzig selector.The Annals of Statistics, 37(4):1705–1732, 2009

2009
[9]

Bl¨ obaum, D

P. Bl¨ obaum, D. Janzing, T. Washio, S. Shimizu, and B. Sch¨ olkopf. Cause-effect infer- ence by comparing regression errors. InInternational Conference on Artificial Intelli- gence and Statistics, pages 900–909. PMLR, 2018

2018
[10]

B¨ uhlmann, J

P. B¨ uhlmann, J. Peters, and J. Ernest. CAM: Causal additive models, high- dimensional order search and penalized regression.The Annals of Statistics, 42(6):2526 – 2556, 2014

2014
[11]

Chang, Z

T.-H. Chang, Z. Guo, and D. Malinsky. Post-selection inference for causal effects after causal discovery.Biometrika, 113(1):asaf073, 2026

2026
[12]

D. M. Chickering. Optimal structure identification with greedy search.Journal of machine learning research, 3(Nov):507–554, 2002

2002
[13]

Fan and R

J. Fan and R. Li. Variable selection via nonconcave penalized likelihood and its oracle properties.Journal of the American statistical Association, 96(456):1348–1360, 2001

2001
[14]

J. Fan, L. Xue, and H. Zou. Strong oracle optimality of folded concave penalized estimation.Annals of statistics, 42(3):819, 2014

2014
[15]

Friedman, T

J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso.Biostatistics, 9(3):432–441, 2008

2008
[16]

M. Gao, Y. Ding, and B. Aragam. A polynomial-time algorithm for learning nonpara- metric causal graphs.Advances in Neural Information Processing Systems, 33:11599– 11611, 2020

2020
[17]

Glymour, K

C. Glymour, K. Zhang, and P. Spirtes. Review of causal discovery methods based on graphical models.Frontiers in genetics, 10:524, 2019

2019
[18]

Gradu, T

P. Gradu, T. Zrnic, Y. Wang, and M. I. Jordan. Valid inference after causal discovery. Journal of the American Statistical Association, 120(550):1127–1138, 2025

2025
[19]

Harris and M

N. Harris and M. Drton. PC algorithm for nonparanormal graphical models.Journal of Machine Learning Research, 14(11), 2013

2013
[20]

X. He, X. Pan, K. M. Tan, and W.-X. Zhou. Smoothed quantile regression with large-scale inference.Journal of Econometrics, 2021

2021
[21]

Heinze-Deml, M

C. Heinze-Deml, M. H. Maathuis, and N. Meinshausen. Causal structure learning. Annual Review of Statistics and Its Application, 5:371–391, 2018

2018
[22]

Hoyer, D

P. Hoyer, D. Janzing, J. Mooij, J. Peters, and B. Sch¨ olkopf. Nonlinear causal dis- covery with additive noise models. InTwenty-Second Annual Conference on Neural Information Processing Systems (NIPS 2008), pages 689–696. Curran, 2009

2008
[23]

Immer, C

A. Immer, C. Schultheiss, J. E. Vogt, B. Sch¨ olkopf, P. B¨ uhlmann, and A. Marx. On the identifiability and estimation of causal location-scale noise models. InInternational Conference on Machine Learning, pages 14316–14332. PMLR, 2023. 29

2023
[24]

Kalisch and P

M. Kalisch and P. B¨ uhlman. Estimating high-dimensional directed acyclic graphs with the pc-algorithm.Journal of Machine Learning Research, 8(3), 2007

2007
[25]

Koenker and G

R. Koenker and G. Bassett, Jr. Regression quantiles.Econometrica: journal of the Econometric Society, pages 33–50, 1978

1978
[26]

C. Li, X. Shen, and W. Pan. Likelihood ratio tests for a large directed acyclic graph. Journal of the American Statistical Association, 2020

2020
[27]

C. Li, X. Shen, and W. Pan. Nonlinear causal discovery with confounders.Journal of the American Statistical Association, pages 1–10, 2023

2023
[28]

Y. Li, A. Torralba, A. Anandkumar, D. Fox, and A. Garg. Causal discovery in physical systems from videos.Advances in Neural Information Processing Systems, 33:9180– 9192, 2020

2020
[29]

Li and J

Y. Li and J. Zhu. L 1-norm quantile regression.Journal of Computational and Graph- ical Statistics, 17(1):163–185, 2008

2008
[30]

Y. Lin, Y. Huang, W. Liu, H. Deng, I. Ng, K. Zhang, M. Gong, Y. Ma, and B. Huang. A skewness-based criterion for addressing heteroscedastic noise in causal discovery. InInternational Conference on Learning Representations, volume 2025, pages 89283– 89310, 2025

2025
[31]

M. H. Maathuis, D. Colombo, M. Kalisch, and P. B¨ uhlmann. Predicting causal effects in large-scale systems from observational data.Nature methods, 7(4):247–248, 2010

2010
[32]

Meinshausen and B

N. Meinshausen and B. Yu. Lasso-type recovery of sparse representations for high- dimensional data.The Annals of Statistics, 37(1):246–270, 2009

2009
[33]

Mendelson, A

S. Mendelson, A. Pajor, and N. Tomczak-Jaegermann. Uniform uncertainty principle for bernoulli and subgaussian ensembles.Constructive Approximation, 28:277–289, 2008

2008
[34]

J. M. Mooij and T. Heskes. Cyclic causal discovery from continuous equilibrium data. InProceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, pages 431–439, 2013

2013
[35]

J. M. Mooij, J. Peters, D. Janzing, J. Zscheischler, and B. Sch¨ olkopf. Distinguishing cause from effect using observational data: methods and benchmarks.Journal of Machine Learning Research, 17(32):1–102, 2016

2016
[36]

G. Park. Identifiability of additive noise models using conditional variances.Journal of Machine Learning Research, 21(75):1–34, 2020

2020
[37]

Pearl.Causality

J. Pearl.Causality. Cambridge university press, 2009

2009
[38]

Peters and P

J. Peters and P. B¨ uhlmann. Identifiability of gaussian structural equation models with equal error variances.Biometrika, 101(1):219–228, 2014. 30

2014
[39]

Peters, D

J. Peters, D. Janzing, and B. Sch¨ olkopf. Identifying cause and effect on discrete data using additive noise models. InProceedings of the thirteenth international conference on artificial intelligence and statistics, pages 597–604. JMLR Workshop and Confer- ence Proceedings, 2010

2010
[40]

Peters, J

J. Peters, J. M. Mooij, D. Janzing, and B. Sch¨ olkopf. Causal discovery with continuous additive noise models.Journal of Machine Learning Research, 15:2009–2053, 2014

2009
[41]

Y. Qiu, J. Tao, and X.-H. Zhou. Inference of heterogeneous treatment effects using observational data with high-dimensional covariates.Journal of the Royal Statistical Society Series B: Statistical Methodology, 83(5):1016–1043, 2021

2021
[42]

Raskutti, M

G. Raskutti, M. J. Wainwright, and B. Yu. Restricted eigenvalue properties for cor- related gaussian designs.The Journal of Machine Learning Research, 11:2241–2259, 2010

2010
[43]

Sachs, O

K. Sachs, O. Perez, D. Pe’er, D. A. Lauffenburger, and G. P. Nolan. Causal protein-signaling networks derived from multiparameter single-cell data.Science, 308(5721):523–529, 2005

2005
[44]

Sch¨ olkopf, F

B. Sch¨ olkopf, F. Locatello, S. Bauer, N. R. Ke, N. Kalchbrenner, A. Goyal, and Y. Ben- gio. Toward causal representation learning.Proceedings of the IEEE, 109(5):612–634, 2021

2021
[45]

Schultheiss and P

C. Schultheiss and P. B¨ uhlmann. Ancestor regression in linear structural equation models.Biometrika, 110(4):1117–1124, 2023

2023
[46]

Schultheiss and P

C. Schultheiss and P. B¨ uhlmann. On the pitfalls of gaussian likelihood scoring for causal discovery.Journal of Causal Inference, 11(1):20220068, 2023

2023
[47]

Shimizu, P

S. Shimizu, P. O. Hoyer, A. Hyv¨ arinen, A. Kerminen, and M. Jordan. A linear non- Gaussian acyclic model for causal discovery.Journal of Machine Learning Research, 7(10), 2006

2006
[48]

P. Spirtes. An anytime algorithm for causal inference. InInternational Workshop on Artificial Intelligence and Statistics, pages 278–285. PMLR, 2001

2001
[49]

Spirtes and C

P. Spirtes and C. Glymour. An algorithm for fast recovery of sparse causal graphs. Social science computer review, 9(1):62–72, 1991

1991
[50]

Spirtes, C

P. Spirtes, C. N. Glymour, and R. Scheines.Causation, prediction, and search. MIT press, 2000

2000
[51]

E. V. Strobl and T. A. Lasko. Identifying patient-specific root causes with the het- eroscedastic noise model.Journal of Computational Science, 72:102099, 2023

2023
[52]

Sun and O

X. Sun and O. Schulte. Cause-effect inference in location-scale noise models: Maxi- mum likelihood vs. independence testing.Advances in Neural Information Processing Systems, 36:5447–5483, 2023. 31

2023
[53]

K. M. Tan, L. Wang, and W.-X. Zhou. High-dimensional quantile regression: Convo- lution smoothing and concave regularization.Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(1):205–233, 2022

2022
[54]

Q.-D. Tran, B. Duong, P. Nguyen, and T. Nguyen. Robust estimation of causal het- eroscedastic noise models. InProceedings of the 2024 SIAM International Conference on Data Mining (SDM), pages 788–796. SIAM, 2024

2024
[55]

Tsamardinos, L

I. Tsamardinos, L. E. Brown, and C. F. Aliferis. The max-min hill-climbing bayesian network structure learning algorithm.Machine learning, 65(1):31–78, 2006

2006
[56]

M. J. Vowels, N. C. Camgoz, and R. Bowden. D’ya like DAGs? a survey on structure learning and causal discovery.ACM Computing Surveys, 55(4):1–36, 2022

2022
[57]

Y. S. Wang, M. Kolar, and M. Drton. Confidence sets for causal orderings.Journal of the American Statistical Association, pages 1–14, 2025

2025
[58]

H. Wold. Causality and econometrics.Econometrica: Journal of the Econometric Society, pages 162–177, 1954

1954
[59]

S. Xu, O. A. Mian, A. Marx, and J. Vreeken. Inferring cause and effect in the presence of heteroscedastic noise. InInternational Conference on Machine Learning, pages 24615–24630. PMLR, 2022

2022
[60]

Y. Yang, S. Bom, and X. Shen. A hierarchical ensemble causal structure learning approach for wafer manufacturing.Journal of Intelligent Manufacturing, 35(6):2961– 2978, 2024

2024
[61]

Ye and C.-H

F. Ye and C.-H. Zhang. Rate minimaxity of the lasso and dantzig selector for the lq loss in lr balls.The Journal of Machine Learning Research, 11:3519–3540, 2010

2010
[62]

N. Yin, T. Gao, Y. Yu, and Q. Ji. Effective causal discovery under identifiable het- eroscedastic noise model. InProceedings of the AAAI Conference on Artificial Intel- ligence, volume 38, pages 16486–16494, 2024

2024
[63]

Y. Yuan, X. Shen, W. Pan, and Z. Wang. Constrained likelihood for reconstructing a directed acyclic gaussian graph.Biometrika, 106(1):109–125, 2019

2019
[64]

C.-H. Zhang. Nearly unbiased variable selection under minimax concave penalty.An- nals of statistics, 38(2):894–942, 2010

2010
[65]

Zhang and A

K. Zhang and A. Hyv¨ arinen. On the identifiability of the post-nonlinear causal model. InProceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pages 647–655, 2009

2009
[66]

Zhang and A

K. Zhang and A. Hyv¨ arinen. Distinguishing causes from effects using nonlinear acyclic causal models. InCausality: Objectives and Assessment, pages 157–164. PMLR, 2010

2010
[67]

Zhang, Y

T. Zhang, Y. Zhang, and T. Zhou. Statistical insights into HSIC in high dimensions. Advances in Neural Information Processing Systems, 36:19145–19156, 2023. 32

2023
[68]

Zhao and B

P. Zhao and B. Yu. On model selection consistency of lasso.The Journal of Machine Learning Research, 7:2541–2563, 2006

2006
[69]

R. Zhao, X. He, and J. Wang. Learning linear non-gaussian directed acyclic graph with diverging number of nodes.The Journal of Machine Learning Research, 23(1):12314– 12347, 2022

2022
[70]

Zheng, B

X. Zheng, B. Aragam, P. K. Ravikumar, and E. P. Xing. Dags with no tears: Contin- uous optimization for structure learning.Advances in neural information processing systems, 31, 2018

2018
[71]

Zhou and H

L. Zhou and H. Zou. Cross-fitted residual regression for high-dimensional heteroscedas- ticity pursuit.Journal of the American Statistical Association, 118(542):1056–1065, 2023

2023
[72]

Zou and M

H. Zou and M. Yuan. Composite quantile regression and the oracle model selection theory.Annals of Statistics, 36(3):1108–1126, 2008. 33

2008

[1] [1]

Adamczak, A

R. Adamczak, A. E. Litvak, A. Pajor, and N. Tomczak-Jaegermann. Restricted isom- etry property of matrices with independent columns and neighborly polytopes by ran- dom sampling.Constructive Approximation, 34:61–88, 2011

2011

[2] [2]

Bello, B

K. Bello, B. Aragam, and P. Ravikumar. DAGMA: Learning DAGs via M-matrices and a Log-Determinant Acyclicity Characterization. InAdvances in Neural Information Processing Systems, 2022

2022

[3] [3]

Belloni and V

A. Belloni and V. Chernozhukov. l1-penalized quantile regression in high-dimensional sparse models.The Annals of Statistics, 39(1):82 – 130, 2011

2011

[4] [4]

Inference for High-Dimensional Sparse Econometric Models

A. Belloni, V. Chernozhukov, and C. Hansen. Inference for high-dimensional sparse econometric models.arXiv preprint arXiv:1201.0220, 2011

work page internal anchor Pith review Pith/arXiv arXiv 2011

[5] [5]

Belloni, V

A. Belloni, V. Chernozhukov, and K. Kato. Valid post-selection inference in high- dimensional approximately sparse quantile regression models.Journal of the American Statistical Association, 114(526):749–758, 2019

2019

[6] [6]

P. M. Bentler. Causal modeling via structural equation systems. InHandbook of multivariate experimental psychology, pages 317–335. Springer, 1988

1988

[7] [7]

Berrevoets, J

J. Berrevoets, J. Raymaekers, M. Van der Schaar, T. Verdonck, and R. Yao. Dif- ferentiable causal structure learning with identifiability by notime. InProceedings of machine learning research, volume 258, pages 3115–3123. PMLR, 2025. 28

2025

[8] [8]

P. J. Bickel, Y. Ritov, and A. B. Tsybakov. Simultaneous analysis of lasso and dantzig selector.The Annals of Statistics, 37(4):1705–1732, 2009

2009

[9] [9]

Bl¨ obaum, D

P. Bl¨ obaum, D. Janzing, T. Washio, S. Shimizu, and B. Sch¨ olkopf. Cause-effect infer- ence by comparing regression errors. InInternational Conference on Artificial Intelli- gence and Statistics, pages 900–909. PMLR, 2018

2018

[10] [10]

B¨ uhlmann, J

P. B¨ uhlmann, J. Peters, and J. Ernest. CAM: Causal additive models, high- dimensional order search and penalized regression.The Annals of Statistics, 42(6):2526 – 2556, 2014

2014

[11] [11]

Chang, Z

T.-H. Chang, Z. Guo, and D. Malinsky. Post-selection inference for causal effects after causal discovery.Biometrika, 113(1):asaf073, 2026

2026

[12] [12]

D. M. Chickering. Optimal structure identification with greedy search.Journal of machine learning research, 3(Nov):507–554, 2002

2002

[13] [13]

Fan and R

J. Fan and R. Li. Variable selection via nonconcave penalized likelihood and its oracle properties.Journal of the American statistical Association, 96(456):1348–1360, 2001

2001

[14] [14]

J. Fan, L. Xue, and H. Zou. Strong oracle optimality of folded concave penalized estimation.Annals of statistics, 42(3):819, 2014

2014

[15] [15]

Friedman, T

J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso.Biostatistics, 9(3):432–441, 2008

2008

[16] [16]

M. Gao, Y. Ding, and B. Aragam. A polynomial-time algorithm for learning nonpara- metric causal graphs.Advances in Neural Information Processing Systems, 33:11599– 11611, 2020

2020

[17] [17]

Glymour, K

C. Glymour, K. Zhang, and P. Spirtes. Review of causal discovery methods based on graphical models.Frontiers in genetics, 10:524, 2019

2019

[18] [18]

Gradu, T

P. Gradu, T. Zrnic, Y. Wang, and M. I. Jordan. Valid inference after causal discovery. Journal of the American Statistical Association, 120(550):1127–1138, 2025

2025

[19] [19]

Harris and M

N. Harris and M. Drton. PC algorithm for nonparanormal graphical models.Journal of Machine Learning Research, 14(11), 2013

2013

[20] [20]

X. He, X. Pan, K. M. Tan, and W.-X. Zhou. Smoothed quantile regression with large-scale inference.Journal of Econometrics, 2021

2021

[21] [21]

Heinze-Deml, M

C. Heinze-Deml, M. H. Maathuis, and N. Meinshausen. Causal structure learning. Annual Review of Statistics and Its Application, 5:371–391, 2018

2018

[22] [22]

Hoyer, D

P. Hoyer, D. Janzing, J. Mooij, J. Peters, and B. Sch¨ olkopf. Nonlinear causal dis- covery with additive noise models. InTwenty-Second Annual Conference on Neural Information Processing Systems (NIPS 2008), pages 689–696. Curran, 2009

2008

[23] [23]

Immer, C

A. Immer, C. Schultheiss, J. E. Vogt, B. Sch¨ olkopf, P. B¨ uhlmann, and A. Marx. On the identifiability and estimation of causal location-scale noise models. InInternational Conference on Machine Learning, pages 14316–14332. PMLR, 2023. 29

2023

[24] [24]

Kalisch and P

M. Kalisch and P. B¨ uhlman. Estimating high-dimensional directed acyclic graphs with the pc-algorithm.Journal of Machine Learning Research, 8(3), 2007

2007

[25] [25]

Koenker and G

R. Koenker and G. Bassett, Jr. Regression quantiles.Econometrica: journal of the Econometric Society, pages 33–50, 1978

1978

[26] [26]

C. Li, X. Shen, and W. Pan. Likelihood ratio tests for a large directed acyclic graph. Journal of the American Statistical Association, 2020

2020

[27] [27]

C. Li, X. Shen, and W. Pan. Nonlinear causal discovery with confounders.Journal of the American Statistical Association, pages 1–10, 2023

2023

[28] [28]

Y. Li, A. Torralba, A. Anandkumar, D. Fox, and A. Garg. Causal discovery in physical systems from videos.Advances in Neural Information Processing Systems, 33:9180– 9192, 2020

2020

[29] [29]

Li and J

Y. Li and J. Zhu. L 1-norm quantile regression.Journal of Computational and Graph- ical Statistics, 17(1):163–185, 2008

2008

[30] [30]

Y. Lin, Y. Huang, W. Liu, H. Deng, I. Ng, K. Zhang, M. Gong, Y. Ma, and B. Huang. A skewness-based criterion for addressing heteroscedastic noise in causal discovery. InInternational Conference on Learning Representations, volume 2025, pages 89283– 89310, 2025

2025

[31] [31]

M. H. Maathuis, D. Colombo, M. Kalisch, and P. B¨ uhlmann. Predicting causal effects in large-scale systems from observational data.Nature methods, 7(4):247–248, 2010

2010

[32] [32]

Meinshausen and B

N. Meinshausen and B. Yu. Lasso-type recovery of sparse representations for high- dimensional data.The Annals of Statistics, 37(1):246–270, 2009

2009

[33] [33]

Mendelson, A

S. Mendelson, A. Pajor, and N. Tomczak-Jaegermann. Uniform uncertainty principle for bernoulli and subgaussian ensembles.Constructive Approximation, 28:277–289, 2008

2008

[34] [34]

J. M. Mooij and T. Heskes. Cyclic causal discovery from continuous equilibrium data. InProceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, pages 431–439, 2013

2013

[35] [35]

J. M. Mooij, J. Peters, D. Janzing, J. Zscheischler, and B. Sch¨ olkopf. Distinguishing cause from effect using observational data: methods and benchmarks.Journal of Machine Learning Research, 17(32):1–102, 2016

2016

[36] [36]

G. Park. Identifiability of additive noise models using conditional variances.Journal of Machine Learning Research, 21(75):1–34, 2020

2020

[37] [37]

Pearl.Causality

J. Pearl.Causality. Cambridge university press, 2009

2009

[38] [38]

Peters and P

J. Peters and P. B¨ uhlmann. Identifiability of gaussian structural equation models with equal error variances.Biometrika, 101(1):219–228, 2014. 30

2014

[39] [39]

Peters, D

J. Peters, D. Janzing, and B. Sch¨ olkopf. Identifying cause and effect on discrete data using additive noise models. InProceedings of the thirteenth international conference on artificial intelligence and statistics, pages 597–604. JMLR Workshop and Confer- ence Proceedings, 2010

2010

[40] [40]

Peters, J

J. Peters, J. M. Mooij, D. Janzing, and B. Sch¨ olkopf. Causal discovery with continuous additive noise models.Journal of Machine Learning Research, 15:2009–2053, 2014

2009

[41] [41]

Y. Qiu, J. Tao, and X.-H. Zhou. Inference of heterogeneous treatment effects using observational data with high-dimensional covariates.Journal of the Royal Statistical Society Series B: Statistical Methodology, 83(5):1016–1043, 2021

2021

[42] [42]

Raskutti, M

G. Raskutti, M. J. Wainwright, and B. Yu. Restricted eigenvalue properties for cor- related gaussian designs.The Journal of Machine Learning Research, 11:2241–2259, 2010

2010

[43] [43]

Sachs, O

K. Sachs, O. Perez, D. Pe’er, D. A. Lauffenburger, and G. P. Nolan. Causal protein-signaling networks derived from multiparameter single-cell data.Science, 308(5721):523–529, 2005

2005

[44] [44]

Sch¨ olkopf, F

B. Sch¨ olkopf, F. Locatello, S. Bauer, N. R. Ke, N. Kalchbrenner, A. Goyal, and Y. Ben- gio. Toward causal representation learning.Proceedings of the IEEE, 109(5):612–634, 2021

2021

[45] [45]

Schultheiss and P

C. Schultheiss and P. B¨ uhlmann. Ancestor regression in linear structural equation models.Biometrika, 110(4):1117–1124, 2023

2023

[46] [46]

Schultheiss and P

C. Schultheiss and P. B¨ uhlmann. On the pitfalls of gaussian likelihood scoring for causal discovery.Journal of Causal Inference, 11(1):20220068, 2023

2023

[47] [47]

Shimizu, P

S. Shimizu, P. O. Hoyer, A. Hyv¨ arinen, A. Kerminen, and M. Jordan. A linear non- Gaussian acyclic model for causal discovery.Journal of Machine Learning Research, 7(10), 2006

2006

[48] [48]

P. Spirtes. An anytime algorithm for causal inference. InInternational Workshop on Artificial Intelligence and Statistics, pages 278–285. PMLR, 2001

2001

[49] [49]

Spirtes and C

P. Spirtes and C. Glymour. An algorithm for fast recovery of sparse causal graphs. Social science computer review, 9(1):62–72, 1991

1991

[50] [50]

Spirtes, C

P. Spirtes, C. N. Glymour, and R. Scheines.Causation, prediction, and search. MIT press, 2000

2000

[51] [51]

E. V. Strobl and T. A. Lasko. Identifying patient-specific root causes with the het- eroscedastic noise model.Journal of Computational Science, 72:102099, 2023

2023

[52] [52]

Sun and O

X. Sun and O. Schulte. Cause-effect inference in location-scale noise models: Maxi- mum likelihood vs. independence testing.Advances in Neural Information Processing Systems, 36:5447–5483, 2023. 31

2023

[53] [53]

K. M. Tan, L. Wang, and W.-X. Zhou. High-dimensional quantile regression: Convo- lution smoothing and concave regularization.Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(1):205–233, 2022

2022

[54] [54]

Q.-D. Tran, B. Duong, P. Nguyen, and T. Nguyen. Robust estimation of causal het- eroscedastic noise models. InProceedings of the 2024 SIAM International Conference on Data Mining (SDM), pages 788–796. SIAM, 2024

2024

[55] [55]

Tsamardinos, L

I. Tsamardinos, L. E. Brown, and C. F. Aliferis. The max-min hill-climbing bayesian network structure learning algorithm.Machine learning, 65(1):31–78, 2006

2006

[56] [56]

M. J. Vowels, N. C. Camgoz, and R. Bowden. D’ya like DAGs? a survey on structure learning and causal discovery.ACM Computing Surveys, 55(4):1–36, 2022

2022

[57] [57]

Y. S. Wang, M. Kolar, and M. Drton. Confidence sets for causal orderings.Journal of the American Statistical Association, pages 1–14, 2025

2025

[58] [58]

H. Wold. Causality and econometrics.Econometrica: Journal of the Econometric Society, pages 162–177, 1954

1954

[59] [59]

S. Xu, O. A. Mian, A. Marx, and J. Vreeken. Inferring cause and effect in the presence of heteroscedastic noise. InInternational Conference on Machine Learning, pages 24615–24630. PMLR, 2022

2022

[60] [60]

Y. Yang, S. Bom, and X. Shen. A hierarchical ensemble causal structure learning approach for wafer manufacturing.Journal of Intelligent Manufacturing, 35(6):2961– 2978, 2024

2024

[61] [61]

Ye and C.-H

F. Ye and C.-H. Zhang. Rate minimaxity of the lasso and dantzig selector for the lq loss in lr balls.The Journal of Machine Learning Research, 11:3519–3540, 2010

2010

[62] [62]

N. Yin, T. Gao, Y. Yu, and Q. Ji. Effective causal discovery under identifiable het- eroscedastic noise model. InProceedings of the AAAI Conference on Artificial Intel- ligence, volume 38, pages 16486–16494, 2024

2024

[63] [63]

Y. Yuan, X. Shen, W. Pan, and Z. Wang. Constrained likelihood for reconstructing a directed acyclic gaussian graph.Biometrika, 106(1):109–125, 2019

2019

[64] [64]

C.-H. Zhang. Nearly unbiased variable selection under minimax concave penalty.An- nals of statistics, 38(2):894–942, 2010

2010

[65] [65]

Zhang and A

K. Zhang and A. Hyv¨ arinen. On the identifiability of the post-nonlinear causal model. InProceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pages 647–655, 2009

2009

[66] [66]

Zhang and A

K. Zhang and A. Hyv¨ arinen. Distinguishing causes from effects using nonlinear acyclic causal models. InCausality: Objectives and Assessment, pages 157–164. PMLR, 2010

2010

[67] [67]

Zhang, Y

T. Zhang, Y. Zhang, and T. Zhou. Statistical insights into HSIC in high dimensions. Advances in Neural Information Processing Systems, 36:19145–19156, 2023. 32

2023

[68] [68]

Zhao and B

P. Zhao and B. Yu. On model selection consistency of lasso.The Journal of Machine Learning Research, 7:2541–2563, 2006

2006

[69] [69]

R. Zhao, X. He, and J. Wang. Learning linear non-gaussian directed acyclic graph with diverging number of nodes.The Journal of Machine Learning Research, 23(1):12314– 12347, 2022

2022

[70] [70]

Zheng, B

X. Zheng, B. Aragam, P. K. Ravikumar, and E. P. Xing. Dags with no tears: Contin- uous optimization for structure learning.Advances in neural information processing systems, 31, 2018

2018

[71] [71]

Zhou and H

L. Zhou and H. Zou. Cross-fitted residual regression for high-dimensional heteroscedas- ticity pursuit.Journal of the American Statistical Association, 118(542):1056–1065, 2023

2023

[72] [72]

Zou and M

H. Zou and M. Yuan. Composite quantile regression and the oracle model selection theory.Annals of Statistics, 36(3):1108–1126, 2008. 33

2008