Arrow: A Foundation Model for Causal Discovery
abstract
We introduce Arrow, a foundation model for zero-shot causal discovery on observational tabular data. Arrow factorizes a directed acyclic graph into an undirected skeleton and a topological order, guaranteeing acyclicity by construction. Given a new dataset, it uses a transformer-based architecture to contextualize variables within and across observations, then predicts skeleton edge probabilities and node order scores that together define a graph. Arrow is trained in a supervised fashion on synthetic datasets with ground-truth graphs, using an end-to-end differentiable directed edge composite likelihood induced by the skeleton-order factorization. The training distribution spans diverse graph families, functional forms, noise models, and dataset shapes. Across in- and out-of-distribution synthetic, semi-synthetic, and real datasets, Arrow matches or outperforms existing causal discovery methods at substantially lower inference cost than competitive alternatives. Our results demonstrate that large-scale pretraining on diverse synthetic data can yield zero-shot causal discovery models that are fast, accurate, and reusable on new datasets.
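The skeleton-order factorization described in the abstract can be illustrated with a small sketch. This is not Arrow's implementation: the function names, the 0.5 threshold, and the convention that a lower order score means an earlier node are all assumptions made for illustration. The point it shows is that once every kept skeleton edge is oriented from the earlier node to the later node in a strict order, the resulting graph cannot contain a cycle.

```python
from itertools import combinations


def compose_dag(skeleton_prob, order_score, threshold=0.5):
    """Combine symmetric edge probabilities and node order scores into a DAG.

    skeleton_prob[i][j]: hypothetical probability that i and j are adjacent.
    order_score[i]: hypothetical scalar; a lower score means earlier in the order.
    The threshold and the score convention are assumptions, not Arrow's.
    """
    n = len(order_score)
    # Rank nodes by score; ties are broken by index so the order is strict.
    ordered = sorted(range(n), key=lambda i: (order_score[i], i))
    rank = {node: pos for pos, node in enumerate(ordered)}
    edges = set()
    for i, j in combinations(range(n), 2):
        if skeleton_prob[i][j] >= threshold:
            # Orient each kept skeleton edge from the earlier to the later node.
            edges.add((i, j) if rank[i] < rank[j] else (j, i))
    return edges, rank


def is_acyclic(n, edges):
    """Check acyclicity with Kahn's algorithm (independent of the order above)."""
    indegree = [0] * n
    children = {i: [] for i in range(n)}
    for u, v in edges:
        indegree[v] += 1
        children[u].append(v)
    queue = [i for i in range(n) if indegree[i] == 0]
    visited = 0
    while queue:
        u = queue.pop()
        visited += 1
        for v in children[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return visited == n


# Toy inputs standing in for model outputs (made-up numbers).
skeleton_prob = [
    [0.0, 0.9, 0.2],
    [0.9, 0.0, 0.8],
    [0.2, 0.8, 0.0],
]
order_score = [0.1, 1.5, 0.7]

edges, rank = compose_dag(skeleton_prob, order_score)
print(sorted(edges))
print(is_acyclic(len(order_score), edges))
```

Because orientation is read off a single total order, no post-hoc acyclicity constraint or cycle-removal step is needed in this sketch, even if every pair of nodes is kept in the skeleton.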
fields: cs.LG (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
citing papers explorer
- TabCausal: Pretraining Across Causal Environments for Tabular Causal Discovery
TabCausal is a causal discovery foundation model pretrained across diverse synthetic causal environments that reports better macro-averaged performance than baselines on both synthetic and LLM-audited semantic benchmarks.