Pith · machine review for the scientific record

arXiv: 2402.01207 · v5 · submitted 2024-02-02 · 💻 cs.LG · cs.AI · stat.ME

Recognition: unknown

Efficient Causal Graph Discovery Using Large Language Models

Thomas Jiralerspong, Vedant Shah, Xiaoyin Chen, Yash More, Yoshua Bengio

Authors on Pith: no claims yet
classification: 💻 cs.LG · cs.AI · stat.ME
keywords: causal · discovery · framework · graph · approach · graphs · method · proposed
Original abstract

We propose a novel framework that leverages LLMs for full causal graph discovery. While previous LLM-based methods have used a pairwise query approach, this requires a quadratic number of queries which quickly becomes impractical for larger causal graphs. In contrast, the proposed framework uses a breadth-first search (BFS) approach which allows it to use only a linear number of queries. We also show that the proposed method can easily incorporate observational data when available, to improve performance. In addition to being more time and data-efficient, the proposed framework achieves state-of-the-art results on real-world causal graphs of varying sizes. The results demonstrate the effectiveness and efficiency of the proposed method in discovering causal relationships, showcasing its potential for broad applicability in causal graph discovery tasks across different domains.
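The abstract's key efficiency claim — one LLM query per expanded node via breadth-first search, instead of one query per ordered variable pair — can be sketched as follows. This is a minimal illustration, not the paper's implementation: `ask_llm_for_children` is a hypothetical stand-in for a real LLM call (here answered from a toy ground-truth graph), and the root variables are assumed to be known.

```python
# Sketch of BFS-style LLM causal discovery, per the abstract's description:
# a pairwise approach needs O(n^2) queries ("is X a cause of Y?" for every
# ordered pair), while BFS asks one question per dequeued variable
# ("which remaining variables are direct effects of X?"), i.e. O(n) queries.
from collections import deque

def ask_llm_for_children(node, candidates):
    """Hypothetical LLM oracle: return the subset of `candidates` the
    model judges to be direct causal children of `node`.
    A toy ground-truth graph stands in for real LLM answers here."""
    toy_graph = {
        "rain": ["wet_grass"],
        "sprinkler": ["wet_grass"],
        "wet_grass": ["slippery"],
        "slippery": [],
    }
    return [c for c in candidates if c in toy_graph.get(node, [])]

def bfs_causal_discovery(variables, roots):
    """Discover directed edges with one oracle query per expanded node."""
    edges, n_queries = [], 0
    visited = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        candidates = [v for v in variables if v != node]
        children = ask_llm_for_children(node, candidates)
        n_queries += 1  # a single LLM call covers all candidates at once
        for child in children:
            edges.append((node, child))
            if child not in visited:
                visited.add(child)
                queue.append(child)
    return edges, n_queries

variables = ["rain", "sprinkler", "wet_grass", "slippery"]
edges, n_queries = bfs_causal_discovery(variables, roots=["rain", "sprinkler"])
print(edges)      # -> [('rain', 'wet_grass'), ('sprinkler', 'wet_grass'), ('wet_grass', 'slippery')]
print(n_queries)  # -> 4 (one query per expanded node, linear in graph size)
```

With four variables, the pairwise scheme would need 12 ordered-pair queries; the BFS traversal above issues only 4. The paper's framework additionally incorporates observational data when available, which this sketch omits.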

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. TCD-Arena: Assessing Robustness of Time Series Causal Discovery Methods Against Assumption Violations

    cs.LG 2026-05 unverdicted novelty 7.0

    TCD-Arena is a new customizable testing framework that runs millions of experiments to map how 33 different assumption violations affect time series causal discovery methods and shows ensembles can boost overall robustness.

  2. Beyond the Parameters: A Technical Survey of Contextual Enrichment in Large Language Models: From In-Context Prompting to Causal Retrieval-Augmented Generation

    cs.CL 2026-04 unverdicted novelty 4.0

The survey unifies LLM augmentation techniques along a single axis — structured context supplied at inference time — and provides a literature-screening protocol plus a deployment decision framework.

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages · cited by 2 Pith papers · 3 internal anchors
