pith. machine review for the scientific record.

arXiv: 2602.07915 · v2 · submitted 2026-02-08 · cs.LG · cs.AI · stat.ME · stat.ML

CausalCompass: Evaluating the Robustness of Time-Series Causal Discovery in Misspecified Scenarios

Pith reviewed 2026-05-16 06:25 UTC · model grok-4.3

keywords: causal discovery · time series · robustness · benchmark · assumption violation · deep learning · misspecification · causal inference

The pith

The CausalCompass benchmark finds that deep learning methods perform best overall, though not in every scenario, for time-series causal discovery when modeling assumptions are violated.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates CausalCompass to test time-series causal discovery algorithms in settings where standard modeling assumptions fail. It runs representative methods through eight distinct violation scenarios and finds that performance varies widely with no single winner in every case. Deep learning approaches achieve the strongest results across the full set of tests. The work adds hyperparameter checks and ablation studies to explain why those methods hold up better. This setup gives practitioners a clearer way to pick algorithms for real data where assumptions rarely hold perfectly.

Core claim

CausalCompass is a flexible benchmark framework for assessing the robustness of time-series causal discovery methods under violations of modeling assumptions. Experiments across eight scenarios show that no method attains optimal performance in all settings, yet deep learning-based approaches exhibit superior overall performance. The framework also reveals that NTS-NOTEARS depends heavily on standardized preprocessing and that ablation studies clarify the sources of deep learning strength under misspecification.

What carries the argument

CausalCompass, a benchmark that applies eight specific assumption-violation scenarios to representative time-series causal discovery algorithms and measures their performance.
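
The shape of such a benchmark can be sketched as a loop over scenarios and methods. The following is a minimal illustration only, not the CausalCompass API: the VAR(1) generator, the two perturbations (`add_trend`, `add_measurement_noise`), the lagged-correlation scorer, and the choice of AUROC as the metric are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def simulate_var1(adj, T=500, coef=0.5, seed=0):
    """Simulate a linear VAR(1) process; adj[i, j] = 1 means x_j(t-1) drives x_i(t)."""
    rng = np.random.default_rng(seed)
    d = adj.shape[0]
    W = coef * adj
    x = np.zeros((T, d))
    for t in range(1, T):
        x[t] = W @ x[t - 1] + rng.normal(size=d)
    return x

def add_trend(x, slope=0.01):
    """Hypothetical non-stationarity: add a linear drift to every variable."""
    t = np.arange(x.shape[0])[:, None]
    return x + slope * t

def add_measurement_noise(x, scale=1.0, seed=1):
    """Hypothetical measurement error: add independent Gaussian noise."""
    rng = np.random.default_rng(seed)
    return x + rng.normal(scale=scale, size=x.shape)

def lagged_corr_scores(x):
    """Toy scorer (not a method from the paper): |corr(x_j(t-1), x_i(t))| for edge j -> i."""
    past, present = x[:-1], x[1:]
    d = x.shape[1]
    c = np.corrcoef(past.T, present.T)  # shape (2d, 2d)
    return np.abs(c[d:, :d])            # rows: present i, columns: past j

def auroc(scores, labels):
    """AUROC via the rank-sum formula; tied scores are not specially handled."""
    s = scores.ravel()
    y = labels.ravel().astype(bool)
    ranks = np.empty(s.size)
    ranks[np.argsort(s)] = np.arange(1, s.size + 1)
    n_pos, n_neg = y.sum(), (~y).sum()
    return (ranks[y].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Ground truth: self-loops plus a chain 0 -> 1 -> 2 -> 3 at lag 1.
adj = np.eye(4) + np.eye(4, k=-1)
clean = simulate_var1(adj)

scenarios = {
    "vanilla": lambda x: x,
    "trend": add_trend,
    "measurement_noise": add_measurement_noise,
}
methods = {"lagged_corr": lagged_corr_scores}

# One score per (method, scenario) pair.
results = {
    (m, s): auroc(method(perturb(clean)), adj)
    for m, method in methods.items()
    for s, perturb in scenarios.items()
}
for key, value in results.items():
    print(key, round(float(value), 3))
```

Keeping scenarios and methods as plain dictionaries of functions means a new violation or algorithm can be added without touching the evaluation loop, which is one simple way to get the flexibility the benchmark description emphasizes.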

If this is right

  • Practitioners should favor deep learning methods when applying time-series causal discovery to data likely to violate standard assumptions.
  • Hyperparameter sensitivity must be checked because performance rankings shift with different settings.
  • Standardization preprocessing should be applied by default for methods like NTS-NOTEARS to avoid poor results in the unprocessed case.
  • Future algorithm design should prioritize robustness mechanisms that deep learning appears to exploit under misspecification.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Combined violations occurring together in one dataset would be a natural next test to see whether the deep learning advantage persists.
  • Adding real-world datasets with unknown violation patterns could validate whether the simulated scenarios predict actual performance.
  • The flexibility of deep learning in capturing nonlinear dependencies may explain its edge and suggest targeted improvements for classical methods.

Load-bearing premise

The eight chosen assumption-violation scenarios adequately represent the range and severity of misspecifications that occur in real-world time-series data.

What would settle it

Observing that a non-deep-learning method achieves the highest average score across the eight scenarios on new or real-world datasets would challenge the claim of deep learning superiority.
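
A check of this kind reduces to averaging per-scenario scores for each method and comparing the averages. The sketch below assumes scores are stored as a nested dictionary; the method names, scenario names, and numbers are made up for illustration and are not results from the paper.

```python
from statistics import mean

# Hypothetical scores[method][scenario]; values are invented for illustration.
scores = {
    "dl_method": {"s1": 0.80, "s2": 0.70, "s3": 0.90},
    "classical_method": {"s1": 0.85, "s2": 0.60, "s3": 0.75},
}

def average_scores(scores):
    """Mean score of each method over all scenarios."""
    return {m: mean(per_scenario.values()) for m, per_scenario in scores.items()}

def best_method(scores):
    """Method with the highest average score."""
    avg = average_scores(scores)
    return max(avg, key=avg.get)

def per_scenario_winners(scores):
    """Best-scoring method in each individual scenario."""
    scenarios = next(iter(scores.values())).keys()
    return {s: max(scores, key=lambda m: scores[m][s]) for s in scenarios}

print(average_scores(scores))
print(best_method(scores))
print(per_scenario_winners(scores))
```

In this toy table one method has the higher average while the other still wins a single scenario, which is the pattern the paper reports: a best overall performer without a winner in every setting.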

Figures

Figures reproduced from arXiv: 2602.07915 by Duxin Chen, He Wang, Huiyang Yi, Wenwu Yu, Xiaojian Shen, Yonggang Wu.

Figure 1. Experimental results under the linear and nonlinear settings across the vanilla scenario and ...
Figure 2. Experimental results under the linear and nonlinear settings across the vanilla scenario and ...
Figure 3. Experimental results under the nonlinear settings across the vanilla scenario and eight ...
Figure 4. Experimental results under the linear and nonlinear settings across the vanilla scenario and ...
Figure 5. Experimental results under the linear and nonlinear settings across the vanilla scenario and ...
Figure 6. Experimental results under the nonlinear settings across the vanilla scenario and eight ...
Figure 7. Experimental results under the linear and nonlinear settings across the vanilla scenario and ...
Figure 8. Experimental results under the linear and nonlinear settings across the vanilla scenario ...
Figure 9. Experimental results under the nonlinear settings across the vanilla scenario and eight ...
Figure 10. Experimental results under the linear and nonlinear settings across the vanilla scenario and ...
Figure 11. Experimental results under the linear and nonlinear settings across the vanilla scenario ...
Figure 12. Experimental results under the nonlinear settings across the vanilla scenario and eight ...
Figure 13. Experimental results under the linear and nonlinear settings across the vanilla scenario ...
Figure 14. Experimental results under the linear and nonlinear settings across the vanilla scenario ...
Figure 15. Experimental results under the nonlinear settings across the vanilla scenario and eight ...
Figure 16. Experimental results under the linear and nonlinear settings across the vanilla scenario ...
Figure 17. Experimental results under the linear and nonlinear settings across the vanilla scenario ...
Figure 18. Experimental results under the nonlinear settings across the vanilla scenario and eight ...

Original abstract

Causal discovery from time series is a fundamental task in machine learning. However, its widespread adoption is hindered by a reliance on untestable causal assumptions and by the lack of robustness-oriented evaluation in existing benchmarks. To address these challenges, we propose CausalCompass, a flexible and extensible benchmark framework designed to assess the robustness of time-series causal discovery (TSCD) methods under violations of modeling assumptions. To demonstrate the practical utility of CausalCompass, we conduct extensive benchmarking of representative TSCD algorithms across eight assumption-violation scenarios. Our experimental results indicate that no single method consistently attains optimal performance across all settings. Nevertheless, the methods exhibiting superior overall performance across diverse scenarios are almost invariably deep learning-based approaches. We further provide hyperparameter sensitivity analyses to deepen the understanding of these findings. We additionally conduct ablation experiments to explain the strong performance of deep learning-based methods under assumption violations. We also find, somewhat surprisingly, that NTS-NOTEARS relies heavily on standardized preprocessing in practice, performing poorly in the vanilla setting but exhibiting strong performance after standardization. Finally, our work aims to provide a comprehensive and systematic evaluation of TSCD methods under assumption violations, thereby facilitating their broader adoption in real-world applications. The user-friendly implementation, documentation and datasets are available at https://anonymous.4open.science/r/CausalCompass-anonymous-5B4F/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces CausalCompass, a flexible benchmark framework for evaluating the robustness of time-series causal discovery (TSCD) methods under violations of standard modeling assumptions. Through benchmarking of representative algorithms across eight synthetic assumption-violation scenarios, the authors report that no single method is optimal in all cases, but deep learning-based approaches consistently show superior overall performance. The work includes hyperparameter sensitivity analyses, ablation studies to explain DL advantages, and a specific observation that NTS-NOTEARS performs poorly without standardization but strongly with it.

Significance. If the empirical rankings hold, CausalCompass provides a much-needed tool for systematic robustness assessment in TSCD, where untestable assumptions often limit practical adoption. The emphasis on open implementation, documentation, and datasets supports reproducibility and community use. The finding that DL methods are more resilient under misspecification could inform method selection in domains like finance and neuroscience, though this depends on the scenarios' fidelity to real data.

major comments (2)
  1. [Experimental Setup / Scenario Design] The headline claim that deep learning-based TSCD methods exhibit superior overall performance across diverse misspecified settings is load-bearing on the eight chosen violation scenarios being representative. The manuscript provides no quantitative validation (e.g., matching of statistical signatures such as nonlinearity strength or latent confounding levels) that these synthetic regimes align with misspecifications observed in real-world time-series from target domains.
  2. [Results and Discussion] The finding that NTS-NOTEARS relies heavily on standardized preprocessing (poor in the vanilla setting, strong after standardization) is presented as surprising. This raises the question of whether preprocessing choices were applied uniformly across all methods, which could affect the fairness of the DL vs. non-DL ranking in the results section.
minor comments (1)
  1. [Abstract] The abstract states the repository link as anonymous; replace with a permanent DOI or GitHub link in the final version.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the paper.

Point-by-point responses
  1. Referee: [Experimental Setup / Scenario Design] The headline claim that deep learning-based TSCD methods exhibit superior overall performance across diverse misspecified settings is load-bearing on the eight chosen violation scenarios being representative. The manuscript provides no quantitative validation (e.g., matching of statistical signatures such as nonlinearity strength or latent confounding levels) that these synthetic regimes align with misspecifications observed in real-world time-series from target domains.

    Authors: We thank the referee for this important observation. The eight scenarios were selected to represent common assumption violations from the TSCD literature (nonlinearity, latent confounding, non-stationarity, etc.) in a controlled manner. We did not perform quantitative statistical signature matching to specific real-world datasets, as the benchmark prioritizes synthetic control for isolating misspecification effects. In the revision we will expand the scenario design section with additional motivation and references to target domains (finance, neuroscience), explicitly acknowledge the synthetic nature of the regimes, and discuss limitations regarding real-world alignment. revision: partial

  2. Referee: [Results and Discussion] The finding that NTS-NOTEARS relies heavily on standardized preprocessing (poor in the vanilla setting, strong after standardization) is presented as surprising. This raises the question of whether preprocessing choices were applied uniformly across all methods, which could affect the fairness of the DL vs. non-DL ranking in the results section.

    Authors: We confirm that preprocessing was applied uniformly across all methods. The vanilla setting uses raw data with no standardization, while the standardized setting applies z-score normalization consistently to every algorithm before model fitting. The NTS-NOTEARS result was obtained under these identical conditions. We will revise the results section to explicitly document the preprocessing pipeline, state that it was identical for all methods, and clarify the experimental settings to remove any ambiguity about fairness. revision: yes
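
The standardized setting described in this response amounts to z-score normalization before model fitting. A minimal sketch, assuming each variable is standardized independently over the time axis of a (T, d) array; the function name `zscore`, the `eps` guard, and the toy data are hypothetical, and the paper's actual pipeline may differ:

```python
import numpy as np

def zscore(series: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Standardize each column of a (T, d) array to zero mean and unit variance."""
    mean = series.mean(axis=0, keepdims=True)
    std = series.std(axis=0, keepdims=True)
    # eps guards against division by zero for constant columns.
    return (series - mean) / np.maximum(std, eps)

rng = np.random.default_rng(0)
# Toy data: 200 time steps, 3 variables with very different scales and offsets.
raw = rng.normal(size=(200, 3)) * np.array([1.0, 50.0, 0.01]) + np.array([0.0, 10.0, -3.0])

vanilla_input = raw                 # "vanilla" setting: data passed to the method as-is
standardized_input = zscore(raw)    # "standardized" setting: same transform for every method

print(standardized_input.mean(axis=0).round(6), standardized_input.std(axis=0).round(6))
```

Applying the same transform to the input of every algorithm, rather than inside any one method, is what makes the vanilla versus standardized comparison uniform across methods.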

Circularity Check

0 steps flagged

No circularity: purely empirical benchmarking on synthetic data

Full rationale

The paper contains no mathematical derivation chain, first-principles predictions, or fitted parameters that are later renamed as outputs. It defines eight synthetic assumption-violation scenarios, generates data from them, runs existing TSCD algorithms (including DL-based ones), and reports performance metrics. All results are direct experimental measurements on the generated data; no step reduces by construction to a self-defined quantity or to a self-citation whose content is unverified. The central claim (DL methods show superior aggregate performance) is therefore an empirical observation, not a tautology. The representativeness of the eight scenarios is a separate validity question outside the scope of circularity analysis.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work relies on standard domain assumptions from causal discovery literature (e.g., that controlled violations can be generated to test robustness) but introduces no new free parameters, axioms, or invented entities beyond the benchmark itself.

axioms (1)
  • domain assumption: Standard causal assumptions such as no hidden confounders and correct model specification are frequently violated in practice.
    This premise motivates the need for the robustness benchmark.

Reference graph

Works this paper leans on

84 extracted references · 84 canonical work pages · 2 internal anchors

  1. [1]

    Regime identification for improving causal analysis in non-stationary timeseries.arXiv preprint arXiv:2405.02315, 2024

    Wasim Ahmad, Maha Shadaydeh, and Joachim Denzler. Regime identification for improving causal analysis in non-stationary timeseries.arXiv preprint arXiv:2405.02315, 2024

  2. [2]

    Temporal causal modeling with graphical granger methods

    Andrew Arnold, Yan Liu, and Naoki Abe. Temporal causal modeling with graphical granger methods. InProceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 66–75, 2007

  3. [3]

    Survey and evaluation of causal discovery methods for time series.Journal of Artificial Intelligence Research, 73:767–819, 2022

    Charles K Assaad, Emilie Devijver, and Eric Gaussier. Survey and evaluation of causal discovery methods for time series.Journal of Artificial Intelligence Research, 73:767–819, 2022

  4. [4]

    The use of the area under the roc curve in the evaluation of machine learning algorithms.Pattern recognition, 30(7):1145–1159, 1997

    Andrew P Bradley. The use of the area under the roc curve in the evaluation of machine learning algorithms.Pattern recognition, 30(7):1145–1159, 1997

  5. [5]

    Tangent space causal inference: Leveraging vector fields for causal discovery in dynamical systems.Advances in Neural Information Processing Systems, 37:120078–120102, 2024

    Kurt Butler, Daniel Waxman, and Petar Djuric. Tangent space causal inference: Leveraging vector fields for causal discovery in dynamical systems.Advances in Neural Information Processing Systems, 37:120078–120102, 2024

  6. [6]

    Triad constraints for learning causal structure of latent variables

    Ruichu Cai, Feng Xie, Clark Glymour, Zhifeng Hao, and Kun Zhang. Triad constraints for learning causal structure of latent variables. InAdvances in Neural Information Processing Systems, volume 32, 2019

  7. [7]

    Causal discoveries for high dimensional mixed data.Statistics in Medicine, 41(24):4924–4940, 2022

    Zhanrui Cai, Dong Xi, Xuan Zhu, and Runze Li. Causal discoveries for high dimensional mixed data.Statistics in Medicine, 41(24):4924–4940, 2022

  8. [8]

    Chapman and hall/CRC, 2019

    Chris Chatfield and Haipeng Xing.The analysis of time series: an introduction with R. Chapman and hall/CRC, 2019

  9. [9]

    Addressing information asymmetry: Deep temporal causal- ity discovery for mixed time series.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

    Jiawei Chen and Chunhui Zhao. Addressing information asymmetry: Deep temporal causal- ity discovery for mixed time series.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

  10. [10]

    Neural ordinary differential equations.Advances in neural information processing systems, 31, 2018

    Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations.Advances in neural information processing systems, 31, 2018

  11. [11]

    Cuts: Neural causal discovery from irregular time-series data.arXiv preprint arXiv:2302.07458, 2023

    Yuxiao Cheng, Runzhao Yang, Tingxiong Xiao, Zongren Li, Jinli Suo, Kunlun He, and Qionghai Dai. Cuts: Neural causal discovery from irregular time-series data.arXiv preprint arXiv:2302.07458, 2023

  12. [12]

    Cuts+: High-dimensional causal discovery from irregular time-series

    Yuxiao Cheng, Lianglong Li, Tingxiong Xiao, Zongren Li, Jinli Suo, Kunlun He, and Qionghai Dai. Cuts+: High-dimensional causal discovery from irregular time-series. InProceedings of the AAAI Conference on Artificial Intelligence, pages 11525–11533, 2024

  13. [13]

    Search for additive nonlinear time series causal models.Journal of Machine Learning Research, 9(5), 2008

    Tianjiao Chu, Clark Glymour, and Greg Ridgeway. Search for additive nonlinear time series causal models.Journal of Machine Learning Research, 9(5), 2008

  14. [14]

    A seasonal-trend decomposition procedure based on loess (with discussion).J

    STL Cleveland. A seasonal-trend decomposition procedure based on loess (with discussion).J. Off. Stat, 6(3), 1990

  15. [15]

    Copula pc algorithm for causal discovery from mixed data

    Ruifei Cui, Perry Groot, and Tom Heskes. Copula pc algorithm for causal discovery from mixed data. InJoint European conference on machine learning and knowledge discovery in databases, pages 377–392. Springer, 2016

  16. [16]

    Haoyue Dai, Peter Spirtes, and Kun Zhang. Independence testing-based approach to causal discovery under measurement error and linear non-Gaussian models.Advances in Neural Information Processing Systems, 35:27524–27536, 2022. 10

  17. [17]

    The relationship between precision-recall and roc curves

    Jesse Davis and Mark Goadrich. The relationship between precision-recall and roc curves. In Proceedings of the 23rd international conference on Machine learning, pages 233–240, 2006

  18. [18]

    On causal discovery from time series data using fci.Proba- bilistic graphical models, 16, 2010

    Doris Entner and Patrik O Hoyer. On causal discovery from time series data using fci.Proba- bilistic graphical models, 16, 2010

  19. [19]

    Timegraph: Synthetic benchmark datasets for robust time-series causal discovery

    Muhammad Hasan Ferdous, Emam Hossain, and Md Osman Gani. Timegraph: Synthetic benchmark datasets for robust time-series causal discovery. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V . 2, pages 5425–5435, 2025

  20. [20]

    Causal discovery of gene regulation with incomplete data.Journal of the Royal Statistical Society Series A: Statistics in Society, 183(4): 1747–1775, 2020

    Ronja Foraita, Juliane Friemel, Kathrin Günther, Thomas Behrens, Jörn Bullerdiek, Rolf Nimzyk, Wolfgang Ahrens, and Vanessa Didelez. Causal discovery of gene regulation with incomplete data.Journal of the Royal Statistical Society Series A: Statistics in Society, 183(4): 1747–1775, 2020

  21. [21]

    Causal discovery for non-stationary non-linear time series data using just-in-time modeling

    Daigo Fujiwara, Kazuki Koyama, Keisuke Kiritoshi, Tomomi Okawachi, Tomonori Izumitani, and Shohei Shimizu. Causal discovery for non-stationary non-linear time series data using just-in-time modeling. InConference on Causal Learning and Reasoning, pages 880–894. PMLR, 2023

  22. [22]

    MissDAG: Causal discovery in the presence of missing data with continuous additive noise models

    Erdun Gao, Ignavier Ng, Mingming Gong, Li Shen, Wei Huang, Tongliang Liu, Kun Zhang, and Howard Bondell. MissDAG: Causal discovery in the presence of missing data with continuous additive noise models. InAdvances in Neural Information Processing Systems, volume 35, pages 5024–5038, 2022

  23. [23]

    Meta-d2ag: Causal graph learning with interventional dynamic data

    Tian Gao, Songtao Lu, Junkyu Lee, Elliot Nelson, Debarun Bhattacharjya, Yue Yu, and Miao Liu. Meta-d2ag: Causal graph learning with interventional dynamic data. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  24. [24]

    High-recall causal discovery for autocorrelated time series with latent confounders.Advances in neural information processing systems, 33:12615–12625, 2020

    Andreas Gerhardus and Jakob Runge. High-recall causal discovery for autocorrelated time series with latent confounders.Advances in neural information processing systems, 33:12615–12625, 2020

  25. [25]

    Review of causal discovery methods based on graphical models.Frontiers in Genetics, 10:524, 2019

    Clark Glymour, Kun Zhang, and Peter Spirtes. Review of causal discovery methods based on graphical models.Frontiers in Genetics, 10:524, 2019

  26. [26]

    Causal discovery from temporal data: An overview and new perspectives.ACM Computing Surveys, 57 (4):1–38, 2024

    Chang Gong, Chuzhe Zhang, Di Yao, Jingping Bi, Wenbin Li, and Yongjun Xu. Causal discovery from temporal data: An overview and new perspectives.ACM Computing Surveys, 57 (4):1–38, 2024

  27. [27]

    Investigating causal relations by econometric models and cross-spectral methods.Econometrica: journal of the Econometric Society, pages 424–438, 1969

    Clive WJ Granger. Investigating causal relations by econometric models and cross-spectral methods.Econometrica: journal of the Econometric Society, pages 424–438, 1969

  28. [28]

    Causaldynamics: A large-scale benchmark for structural discovery of dynamical causal models.arXiv preprint arXiv:2505.16620, 2025

    Benjamin Herdeanu, Juan Nathaniel, Carla Roesch, Jatan Buch, Gregor Ramien, Johannes Haux, and Pierre Gentine. Causaldynamics: A large-scale benchmark for structural discovery of dynamical causal models.arXiv preprint arXiv:2505.16620, 2025

  29. [29]

    Identification of time-dependent causal model: A gaussian process treatment

    Biwei Huang, Kun Zhang, and Bernhard Schölkopf. Identification of time-dependent causal model: A gaussian process treatment. InIJCAI, pages 3561–3568, 2015

  30. [30]

    Causal discovery and forecasting in nonstationary environments with state-space models

    Biwei Huang, Kun Zhang, Mingming Gong, and Clark Glymour. Causal discovery and forecasting in nonstationary environments with state-space models. InInternational conference on machine learning, pages 2901–2910. Pmlr, 2019

  31. [31]

    Causal discovery from heterogeneous/nonstationary data.Journal of Machine Learning Research, 21(89):1–53, 2020

    Biwei Huang, Kun Zhang, Jiji Zhang, Joseph Ramsey, Ruben Sanchez-Romero, Clark Glymour, and Bernhard Schölkopf. Causal discovery from heterogeneous/nonstationary data.Journal of Machine Learning Research, 21(89):1–53, 2020

  32. [32]

    Causal discovery from subsampled time series data by constraint optimization

    Antti Hyttinen, Sergey Plis, Matti Järvisalo, Frederick Eberhardt, and David Danks. Causal discovery from subsampled time series data by constraint optimization. InConference on Probabilistic Graphical Models, pages 216–227. PMLR, 2016

  33. [33]

    Estimation of a structural vector autoregression model using non-gaussianity.Journal of Machine Learning Research, 11 (5), 2010

    Aapo Hyvärinen, Kun Zhang, Shohei Shimizu, and Patrik O Hoyer. Estimation of a structural vector autoregression model using non-gaussianity.Journal of Machine Learning Research, 11 (5), 2010. 11

  34. [34]

    Efficient Causal Graph Discovery Using Large Language Models

    Thomas Jiralerspong, Xiaoyin Chen, Yash More, Vedant Shah, and Yoshua Bengio. Efficient causal graph discovery using large language models.arXiv preprint arXiv:2402.01207, 2024

  35. [35]

    Extensive chaos in the lorenz-96 model.Chaos: An interdisciplinary journal of nonlinear science, 20(4), 2010

    Alireza Karimi and Mark R Paul. Extensive chaos in the lorenz-96 model.Chaos: An interdisciplinary journal of nonlinear science, 20(4), 2010

  36. [36]

    A logic for causal inference in time series with discrete and continuous variables

    Samantha Kleinberg. A logic for causal inference in time series with discrete and continuous variables. InIJCAI Proceedings-International Joint Conference on Artificial Intelligence, volume 22, page 943, 2011

  37. [37]

    Improving bayesian network structure learning in the presence of measurement error.Journal of Machine Learning Research, 23(324): 1–28, 2022

    Yang Liu, Anthony C Constantinou, and Zhigao Guo. Improving bayesian network structure learning in the presence of measurement error.Journal of Machine Learning Research, 23(324): 1–28, 2022

  38. [38]

    Position: The causal revolution needs scientific pragmatism.arXiv preprint arXiv:2406.02275, 2024

    Joshua Loftus. Position: The causal revolution needs scientific pragmatism.arXiv preprint arXiv:2406.02275, 2024

  39. [39]

    Robustness of algorithms for causal structure learning to hyperparameter choice

    Damian Machlanski, Spyridon Samothrakis, and Paul S Clarke. Robustness of algorithms for causal structure learning to hyperparameter choice. InCausal Learning and Reasoning, pages 703–739. PMLR, 2024

  40. [40]

    Causal structure learning from multivariate time series in settings with unmeasured confounding

    Daniel Malinsky and Peter Spirtes. Causal structure learning from multivariate time series in settings with unmeasured confounding. InProceedings of 2018 ACM SIGKDD workshop on causal discovery, pages 23–47. PMLR, 2018

  41. [41]

    Spacetime: Causal discovery from non-stationary time series

    Sarah Mameche, Lénaïg Cornanguer, Urmi Ninad, and Jilles Vreeken. Spacetime: Causal discovery from non-stationary time series. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 19405–19413, 2025

  42. [42]

    Interpretable models for granger causality using self-explaining neural networks.arXiv preprint arXiv:2101.07600, 2021

    Riˇcards Marcinkevi ˇcs and Julia E V ogt. Interpretable models for granger causality using self-explaining neural networks.arXiv preprint arXiv:2101.07600, 2021

  43. [43]

    Assumption violations in causal discovery and the robustness of score matching

    Francesco Montagna, Atalanti Mastakouri, Elias Eulig, Nicoletta Noceti, Lorenzo Rosasco, Dominik Janzing, Bryon Aragam, and Francesco Locatello. Assumption violations in causal discovery and the robustness of score matching. InAdvances in Neural Information Processing Systems, volume 36, 2023

  44. [44]

    Causal discovery with attention-based convo- lutional neural networks.Machine Learning and Knowledge Extraction, 1(1):19, 2019

    Meike Nauta, Doina Bucur, and Christin Seifert. Causal discovery with attention-based convo- lutional neural networks.Machine Learning and Knowledge Extraction, 1(1):19, 2019

  45. [45]

    On the role of sparsity and DAG constraints for learning linear DAGs

    Ignavier Ng, AmirEmad Ghassami, and Kun Zhang. On the role of sparsity and DAG constraints for learning linear DAGs. InAdvances in Neural Information Processing Systems, volume 33, pages 17943–17954, 2020

  46. [46]

    Structure learning with continuous optimization: A sober look and beyond

    Ignavier Ng, Biwei Huang, and Kun Zhang. Structure learning with continuous optimization: A sober look and beyond. InCausal Learning and Reasoning, pages 71–105. PMLR, 2024

  47. [47]

    DYNOTEARS: Structure learning from time-series data

    Roxana Pamfil, Nisara Sriwattanaworachai, Shaan Desai, Philip Pilgerstorfer, Konstantinos Georgatzis, Paul Beaumont, and Bryon Aragam. DYNOTEARS: Structure learning from time-series data. InInternational Conference on Artificial Intelligence and Statistics, pages 1595–1605. PMLR, 2020

  48. [48]

    Cambridge university press, 2009

    Judea Pearl.Causality. Cambridge university press, 2009

  49. [49]

    Basic books, 2018

    Judea Pearl and Dana Mackenzie.The Book of Why: The New Science of Cause and Effect. Basic books, 2018

  50. [50]

    The MIT Press, 2017

    Jonas Peters, Dominik Janzing, and Bernhard Schölkopf.Elements of Causal Inference: Foundations and Learning Algorithms. The MIT Press, 2017

  51. [51]

    Position: Causal machine learning requires rigorous synthetic experiments for broader adoption.arXiv preprint arXiv:2508.08883, 2025

    Audrey Poinsot, Panayiotis Panayiotou, Alessandro Leite, Nicolas Chesneau, Özgür ¸ Sim¸ sek, and Marc Schoenauer. Position: Causal machine learning requires rigorous synthetic experiments for broader adoption.arXiv preprint arXiv:2508.08883, 2025. 12

  52. [52]

    Comparison of strategies for scalable causal discovery of latent variable models from mixed data.International journal of data science and analytics, 6(1):33–45, 2018

    Vineet K Raghu, Joseph D Ramsey, Alison Morris, Dimitrios V Manatakis, Peter Sprites, Panos K Chrysanthis, Clark Glymour, and Panayiotis V Benos. Comparison of strategies for scalable causal discovery of latent variable models from mixed data.International journal of data science and analytics, 6(1):33–45, 2018

  53. [53]

    Beware of the simulated DAG! causal discovery benchmarks may be easy to game

    Alexander Reisach, Christof Seiler, and Sebastian Weichwald. Beware of the simulated DAG! causal discovery benchmarks may be easy to game. InAdvances in Neural Information Processing Systems, volume 34, pages 27772–27784, 2021

  54. [54]

    Jakob Runge. Causal network reconstruction from time series: From theoretical assumptions to practical estimation. Chaos: An Interdisciplinary Journal of Nonlinear Science, 28(7), 2018

  55. [55]

    Jakob Runge. Discovering contemporaneous and lagged causal relations in autocorrelated nonlinear time series datasets. In Conference on Uncertainty in Artificial Intelligence, pages 1388–1397. PMLR, 2020

  56. [56]

    Jakob Runge, Peer Nowack, Marlene Kretschmer, Seth Flaxman, and Dino Sejdinovic. Detecting and quantifying causal associations in large nonlinear time series datasets. Science Advances, 5(11):eaau4996, 2019

  57. [57]

    Jakob Runge, Andreas Gerhardus, Gherardo Varando, Veronika Eyring, and Gustau Camps-Valls. Causal inference for time series. Nature Reviews Earth & Environment, 4(7):487–505, 2023

  58. [58]

    Agathe Sadeghi, Achintya Gopal, and Mohammad Fesanghary. Causal discovery from non-stationary time series. International Journal of Data Science and Analytics, 19(1):33–59, 2025

  59. [59]

    Richard Scheines and Joseph Ramsey. Measurement error and causal discovery. In CEUR Workshop Proceedings, volume 1792, page 1, 2017

  60. [60]

    Shohei Shimizu, Patrik O Hoyer, Aapo Hyvärinen, Antti Kerminen, and Michael Jordan. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7(10), 2006

  61. [61]

    Ali Shojaie and Emily B Fox. Granger causality: A review and recent advances. Annual Review of Statistics and Its Application, 9(1):289–319, 2022

  62. [62]

    Peter Spirtes and Clark Glymour. An algorithm for fast recovery of sparse causal graphs. Social Science Computer Review, 9(1):62–72, 1991

  63. [63]

    Peter Spirtes, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search. MIT Press, 2001

  64. [64]

    Gideon Stein, Maha Shadaydeh, Jan Blunk, Niklas Penzel, and Joachim Denzler. CausalRivers - scaling up benchmarking of causal discovery for real-world time-series. arXiv preprint arXiv:2503.17452, 2025

  65. [65]

    Gideon Stein, Niklas Penzel, Tristan Piater, and Joachim Denzler. TCD-arena: Assessing robustness of time series causal discovery methods against assumption violations. In The Fourteenth International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=MtdrOCLAGY

  66. [66]

    George Sugihara, Robert May, Hao Ye, Chih-hao Hsieh, Ethan Deyle, Michael Fogarty, and Stephan Munch. Detecting causality in complex ecosystems. Science, 338(6106):496–500, 2012

  67. [67]

    Xiangyu Sun, Oliver Schulte, Guiliang Liu, and Pascal Poupart. NTS-NOTEARS: Learning nonparametric DBNs with prior knowledge. arXiv preprint arXiv:2109.04286, 2021

  68. [68]

    Floris Takens. Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence, Warwick 1980: proceedings of a symposium held at the University of Warwick 1979/80, pages 366–381. Springer, 2006

  69. [69]

    Alex Tank, Ian Covert, Nicholas Foti, Ali Shojaie, and Emily B Fox. Neural Granger causality. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(8):4267–4279, 2021

  70. [70]

    Michail Tsagris, Giorgos Borboudakis, Vincenzo Lagani, and Ioannis Tsamardinos. Constraint-based causal discovery with mixed data. International Journal of Data Science and Analytics, 6(1):19–30, 2018

  71. [71]

    Ruibo Tu, Cheng Zhang, Paul Ackermann, Karthika Mohan, Hedvig Kjellström, and Kun Zhang. Causal discovery in the presence of missing data. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1762–1770. PMLR, 2019

  72. [72]

    Yuhao Wang, Vlado Menkovski, Hao Wang, Xin Du, and Mykola Pechenizkiy. Causal discovery from incomplete data: a deep learning approach. arXiv preprint arXiv:2001.05343, 2020

  73. [73]

    Wei Wenjuan, Feng Lu, and Liu Chunchen. Mixed causal structure discovery with application to prescriptive pricing. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pages 5126–5134, 2018

  74. [74]

    Feng Xie, Ruichu Cai, Biwei Huang, Clark Glymour, Zhifeng Hao, and Kun Zhang. Generalized independent noise condition for estimating latent variable causal graphs. In Advances in Neural Information Processing Systems, volume 33, pages 14891–14902, 2020

  75. [75]

    Yuqin Yang, AmirEmad Ghassami, Mohamed Nafea, Negar Kiyavash, Kun Zhang, and Ilya Shpitser. Causal discovery in linear latent variable models subject to measurement error. Advances in Neural Information Processing Systems, 35:874–886, 2022

  76. [76]

    Yuqin Yang, Mohamed Nafea, Negar Kiyavash, Kun Zhang, and AmirEmad Ghassami. Causal discovery in linear models with unobserved variables and measurement error. arXiv preprint arXiv:2407.19426, 2024

  77. [77]

    Huiyang Yi, Yanyan He, Duxin Chen, Mingyu Kang, He Wang, and Wenwu Yu. The robustness of differentiable causal discovery in misspecified scenarios. In The Thirteenth International Conference on Learning Representations, 2025

  78. [78]

    Alessio Zanga, Alice Bernasconi, Peter JF Lucas, Hanny Pijnenborg, Casper Reijnen, Marco Scutari, and Fabio Stella. Causal discovery with missing data in a multicentric clinical study. In International Conference on Artificial Intelligence in Medicine, pages 40–44. Springer, 2023

  79. [79]

    Alessio Zanga, Alice Bernasconi, Peter JF Lucas, Hanny Pijnenborg, Casper Reijnen, Marco Scutari, and Anthony C Constantinou. Federated causal discovery with missing data in a multicentric study on endometrial cancer. Journal of Biomedical Informatics, page 104877, 2025

  80. [80]

    Yan Zeng, Shohei Shimizu, Hidetoshi Matsui, and Fuchun Sun. Causal discovery for linear mixed data. In Conference on Causal Learning and Reasoning, pages 994–1009. PMLR, 2022
