CauScale: Neural Causal Discovery at Scale

Bo Peng; Chaochao Lu; Jiaguo Tian; Sirui Chen; Yu Qiao

arxiv: 2602.08629 · v2 · pith:CB2M5SRInew · submitted 2026-02-09 · 💻 cs.LG · cs.AI · stat.ML

Bo Peng, Sirui Chen, Jiaguo Tian, Yu Qiao, Chaochao Lu

classification 💻 cs.LG · cs.AI · stat.ML

keywords causcale · data · causal · discovery · graph · graphs · scales · attention

Causal discovery is essential for advancing data-driven fields such as scientific AI and data analysis, yet existing approaches face significant time- and space-efficiency bottlenecks when scaling to large graphs. To address this challenge, we present CauScale, a neural architecture designed for efficient causal discovery that scales inference to graphs with up to 1000 nodes. CauScale improves time efficiency via a reduction unit that compresses data embeddings, and improves space efficiency by adopting tied attention weights to avoid maintaining axis-specific attention maps. To maintain high causal discovery accuracy, CauScale adopts a two-stream design: a data stream extracts relational evidence from high-dimensional observations, while a graph stream integrates statistical graph priors and preserves key structural signals. CauScale successfully scales to 500-node graphs during training, where prior work fails due to space limitations. Across testing data with varying graph scales and causal mechanisms, CauScale achieves 99.6% mAP on in-distribution data and 84.4% mAP on out-of-distribution data, while delivering inference speedups of 4 to 13,000 times over prior methods. Our project page is at https://github.com/OpenCausaLab/CauScale.
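
For readers unfamiliar with the mAP figures quoted in the abstract, the sketch below shows one common way to compute mean average precision for edge prediction. It is an illustration only and not the authors' evaluation code: the function names (`average_precision`, `edge_scores`, `mean_average_precision`), the choice to score every off-diagonal entry of the adjacency matrix, and the tiny 3-node example are all assumptions made here, and the paper's exact protocol may differ.

```python
def average_precision(scores, labels):
    """Average precision for ranked binary predictions.

    scores: predicted edge probabilities, one per candidate edge.
    labels: 1 if the edge exists in the true graph, else 0.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("average precision is undefined without true edges")

    # Rank candidate edges from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this true edge
    return precision_sum / total_positives


def edge_scores(prob_matrix, true_adjacency):
    """Flatten the off-diagonal entries of a predicted and a true adjacency matrix."""
    n = len(true_adjacency)
    scores, labels = [], []
    for i in range(n):
        for j in range(n):
            if i != j:  # assumption: self-loops are not candidate edges
                scores.append(prob_matrix[i][j])
                labels.append(true_adjacency[i][j])
    return scores, labels


def mean_average_precision(graphs):
    """Mean of per-graph average precision over (prob_matrix, true_adjacency) pairs."""
    aps = [average_precision(*edge_scores(p, a)) for p, a in graphs]
    return sum(aps) / len(aps)


# Hypothetical 3-node example: true graph 0 -> 1 -> 2.
true_adj = [[0, 1, 0],
            [0, 0, 1],
            [0, 0, 0]]
pred = [[0.0, 0.9, 0.4],
        [0.1, 0.0, 0.3],
        [0.2, 0.05, 0.0]]
example_ap = average_precision(*edge_scores(pred, true_adj))
```

In this example the two true edges are ranked first and third among the six candidates, so the average precision is (1/1 + 2/3) / 2, roughly 0.83. A model that ranks every true edge above every non-edge scores 1.0.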

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

TabCausal: Pretraining Across Causal Environments for Tabular Causal Discovery
cs.LG 2026-05 unverdicted novelty 5.0

TabCausal is a causal discovery foundation model pretrained across diverse synthetic causal environments that reports better macro-averaged performance than baselines on both synthetic and LLM-audited semantic benchmarks.