RealCause: Realistic Causal Inference Benchmarking

Brady Neal; Chin-Wei Huang; Sunand Raghupathi

arxiv: 2011.15007 · v2 · pith:ZBJ7FZC2new · submitted 2020-11-30 · 💻 cs.LG · cs.AI · stat.ML

classification 💻 cs.LG cs.AI stat.ML

keywords causal, estimators, data, ground-truth, benchmark, best, choose, different

There are many different causal effect estimators in causal inference. However, it is unclear how to choose between these estimators because there is no ground-truth for causal effects. A commonly used option is to simulate synthetic data, where the ground-truth is known. However, the best causal estimators on synthetic data are unlikely to be the best causal estimators on real data. An ideal benchmark for causal estimators would both (a) yield ground-truth values of the causal effects and (b) be representative of real data. Using flexible generative models, we provide a benchmark that both yields ground-truth and is realistic. Using this benchmark, we evaluate over 1500 different causal estimators and provide evidence that it is rational to choose hyperparameters for causal estimators using predictive metrics.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Causal Foundation Models with Continuous Treatments
cs.LG 2026-05 unverdicted novelty 8.0

A transformer foundation model is trained on synthetic data from a novel prior over continuous-treatment data-generating processes to predict treatment-response curves via in-context learning without task-specific fin...

TabPFN-3: Technical Report
cs.LG 2026-05 unverdicted novelty 6.0

TabPFN-3 delivers state-of-the-art tabular prediction performance on benchmarks up to 1M rows, is up to 20x faster than prior versions, and introduces test-time scaling that beats non-TabPFN models by hundreds of Elo points.

TabPFN-3: Technical Report
cs.LG 2026-05 unverdicted novelty 6.0

TabPFN-3 scales tabular foundation models to 1M rows with synthetic pretraining, test-time compute, and benchmark-leading performance on tabular, relational, and tabular-text tasks while being up to 20x faster than Ta...

TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models
cs.LG 2025-11 unverdicted novelty 6.0

TabPFN-2.5 scales tabular foundation models to 20x larger datasets, outperforms tuned tree models on TabArena, achieves near-perfect win rates against default XGBoost, and adds a distillation engine for fast productio...