ExplainerPFN: Towards tabular foundation models for model-free zero-shot feature importance estimations

Joao Fonseca; Julia Stoyanovich

arxiv: 2601.23068 · v2 · pith:IA2R6FTNnew · submitted 2026-01-30 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-21 13:37 UTC · model grok-4.3

keywords: feature attribution · Shapley values · zero-shot learning · tabular data · foundation models · model interpretability · structural causal models · TabPFN

The pith

A model pretrained on synthetic causal data can estimate Shapley-style feature attributions for tabular inputs without any access to the prediction model or reference examples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether feature attributions can be obtained in a zero-shot setting using only the observed data distribution. Because many models can produce the same predictions while implying different attributions, the approach learns the posterior mean attribution under a prior induced by meta-training on synthetic structural causal models. ExplainerPFN implements this idea as a tabular foundation model built on TabPFN that directly outputs attributions for new datasets. The method therefore supplies a principled attribution in exactly the regimes where conventional explainers cannot be applied because model access is unavailable. Experiments indicate that the resulting attributions are competitive with surrogate explainers that require two to ten reference SHAP evaluations.
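
One way to read the posterior-mean target in symbols is sketched below. The notation is an assumption of this note, not the paper's: $\phi(f, x)$ denotes the Shapley attribution vector of a predictor $f$ at input $x$, $D$ the observed dataset, $d$ the number of features, and $p(f \mid D)$ the posterior over predictors induced by the synthetic SCM prior.

```latex
\hat{\phi}(x; D)
  \;=\; \mathbb{E}_{f \sim p(f \mid D)}\bigl[\phi(f, x)\bigr]
  \;=\; \arg\min_{a \in \mathbb{R}^{d}}
        \mathbb{E}_{f \sim p(f \mid D)}\bigl[\lVert \phi(f, x) - a \rVert_2^2\bigr]
```

The second equality is the standard fact that a conditional mean minimizes expected squared error. Under this reading, averaging over the predictors consistent with the data yields a single well-defined attribution even though no individual predictor is identified.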

Core claim

ExplainerPFN is a zero-shot tabular foundation model pretrained on synthetic structural causal datasets supervised by exact or near-exact Shapley values; once trained it predicts feature attributions for any unseen tabular dataset using only the input data and without model queries, gradients, or example explanations.

What carries the argument

ExplainerPFN, a TabPFN-based network that maps raw tabular inputs directly to posterior-mean Shapley-style attributions after meta-training on synthetic SCM data.
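
To make "exact Shapley values" concrete, here is a minimal sketch that enumerates every coalition for a small model. The paper's supervision pipeline is not reproduced here: the function `exact_shapley`, the model `toy_model`, and the choice of value function (features outside the coalition are replaced by background rows and the outputs averaged) are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, background):
    """Exact Shapley values of f at x by enumerating all coalitions.

    Assumed value function: features outside the coalition are replaced by
    each background row in turn and the model outputs are averaged.
    """
    d = len(x)

    def value(coalition):
        total = 0.0
        for b in background:
            z = [x[i] if i in coalition else b[i] for i in range(d)]
            total += f(z)
        return total / len(background)

    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model with one additive term and one interaction term.
    return z[0] + 2.0 * z[1] * z[2]


background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
x = [1.0, 1.0, 1.0]
phi = exact_shapley(toy_model, x, background)
print(phi)
```

The loop visits all 2^d coalitions per feature, so the cost grows exponentially with the number of features. That is one plausible reason a pipeline would fall back to near-exact estimates on wider tables.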

If this is right

Few-shot surrogate explainers reach high SHAP fidelity with as few as two reference observations.
Zero-shot Shapley-style attributions become available in settings that prohibit model access or gradient queries.
An open-source training pipeline and synthetic data generator are provided for further development.
The performance gap to few-shot baselines remains small across both real and synthetic tabular benchmarks.
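
The few-shot baseline in the first point can be pictured with a deliberately simple surrogate. This is a sketch under stated assumptions: the helper `nearest_reference_surrogate` and the reference SHAP vectors are made up for illustration, and the surrogates evaluated in the paper may be built differently.

```python
def nearest_reference_surrogate(references):
    """Build a surrogate explainer from (row, shap_vector) reference pairs.

    Hypothetical design: return the SHAP vector of the closest reference row
    under squared Euclidean distance.
    """
    def explain(x):
        def dist(pair):
            row, _ = pair
            return sum((a - b) ** 2 for a, b in zip(row, x))

        _, shap_vector = min(references, key=dist)
        return list(shap_vector)

    return explain


# Two reference observations with illustrative (not measured) SHAP vectors.
references = [
    ([0.0, 0.0], [-0.4, -0.1]),
    ([1.0, 1.0], [0.4, 0.1]),
]
explain = nearest_reference_surrogate(references)
estimate = explain([0.9, 0.8])
print(estimate)
```

Even this crude rule needs at least one reference explanation, which is the requirement the zero-shot setting removes.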

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could be applied to black-box prediction APIs where even limited queries are costly or restricted.
If the synthetic prior transfers, similar zero-shot pipelines might be trained for other attribution functionals beyond Shapley values.
Privacy-sensitive deployments could benefit because no model weights or training data need to be shared with the explainer.
Direct comparison against ground-truth causal feature importance on datasets with known structures would test whether the learned attributions recover the underlying generative roles.

Load-bearing premise

The posterior mean attributions learned from synthetic structural causal models remain useful and stable when applied to real tabular datasets whose generative processes may differ from the training distribution.

What would settle it

On a collection of real tabular datasets whose causal structures are known to diverge from the synthetic training distribution, the attributions produced by ExplainerPFN differ substantially from model-access SHAP values computed with a large number of samples.
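
A test of this kind needs a number for "differ substantially". The sketch below uses mean per-row cosine similarity between two sets of attribution vectors. The metric choice and the names `cosine_similarity` and `mean_fidelity` are assumptions of this note; the fidelity measure used in the paper is not stated on this page.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity of two attribution vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_fidelity(estimated, reference):
    """Average per-row cosine similarity between two attribution tables."""
    scores = [cosine_similarity(e, r) for e, r in zip(estimated, reference)]
    return sum(scores) / len(scores)


# Illustrative values: the first row agrees in direction, the second is flipped.
estimated = [[1.0, 0.0], [0.0, 2.0]]
reference = [[2.0, 0.0], [0.0, -1.0]]
score = mean_fidelity(estimated, reference)
print(score)
```

Cosine similarity ignores scale, so it rewards getting the ranking and sign of features right even when magnitudes are off. A magnitude-sensitive check would need an additional error measure.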

Original abstract

Computing the importance of features in supervised classification tasks is critical for model interpretability. Shapley values are a widely used approach for explaining model predictions, but require direct access to the underlying model, an assumption frequently violated in real-world deployments. We investigate whether meaningful feature attributions can be obtained in a zero-shot setting, using only the input data distribution and no evaluations of the target model. Because multiple models can produce identical predictions yet yield different Shapley decompositions, the mapping from data to attributions is not uniquely identifiable. We therefore target attributions that are "true to the data" rather than "true to the model", learning a posterior mean attribution under a meta-training prior. To this end, we introduce ExplainerPFN, a tabular foundation model built on TabPFN, pretrained on synthetic structural causal datasets supervised with exact or near-exact Shapley values, that predicts feature attributions for unseen tabular datasets without model access, gradients, or example explanations. Our contributions are fourfold: (1) we show that few-shot surrogate explainers achieve high SHAP fidelity with as few as two reference observations; (2) we propose ExplainerPFN, the first zero-shot method for estimating Shapley-value-style feature attributions without access to the underlying model or reference explanations, providing a principled attribution where no existing explainer can be applied; (3) we release an open-source implementation including the full training pipeline and synthetic data generator; and (4) through extensive experiments on real and synthetic datasets, we show that ExplainerPFN achieves performance competitive with few-shot surrogate explainers that rely on 2-10 SHAP examples.
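
The non-identifiability argument in the abstract can be checked on a toy case. The sketch assumes interventional (marginal) Shapley values, for which a linear model f(x) = sum_i w_i * x_i has the closed form phi_i = w_i * (x_i - mean_i). The data and the two models are hypothetical.

```python
def linear_shap(weights, x, means):
    """Closed-form interventional Shapley values for a linear model."""
    return [w * (xi - m) for w, xi, m in zip(weights, x, means)]


def predict(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))


# Toy data in which the second feature is an exact copy of the first.
rows = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
means = [sum(col) / len(rows) for col in zip(*rows)]

model_a = [2.0, 0.0]  # relies on feature 0 only
model_b = [1.0, 1.0]  # splits the effect evenly across both features

# Both models give identical predictions on every observed row.
same_predictions = all(predict(model_a, r) == predict(model_b, r) for r in rows)

x = rows[3]
phi_a = linear_shap(model_a, x, means)
phi_b = linear_shap(model_b, x, means)
print(same_predictions, phi_a, phi_b)
```

Both attribution vectors sum to the same total, the gap between the prediction at x and the mean prediction, yet they distribute it differently. Observed data alone cannot tell the two models apart, so any data-only method has to commit to some rule for choosing among them.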

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

ExplainerPFN gives a zero-shot route to data-driven feature attributions by meta-training TabPFN on synthetic causal models, but the real-data transfer still looks like the main open question.

The paper's core move is to treat feature attributions as something you can predict from the data distribution alone, without ever querying the target model. They do this by pretraining a TabPFN-style model on synthetic structural causal models, where exact or near-exact Shapley values can be computed as supervision, and then applying the resulting network zero-shot to new tabular datasets. That is genuinely new; prior work needs either model access or at least a handful of reference explanations. They also release the full training pipeline and data generator, which removes one common barrier to checking the claims.

On the positive side, their experiments show that even simple few-shot surrogate explainers reach decent fidelity with only two to ten SHAP examples, which is a useful calibration point and not something everyone had quantified before. The reported competitiveness on both synthetic and real tables is the main empirical claim.

The soft spot is exactly where the stress-test note points: the training distribution is a family of synthetic SCMs, and real tabular data often contains non-causal correlations, heterogeneous noise, or interaction patterns outside that family. The abstract says they ran experiments on real datasets, but without explicit distribution-shift controls or ablations that vary the generator mismatch, it is difficult to tell whether the competitive numbers are robust or tied to how closely their test sets resemble the training prior. That is the load-bearing assumption for the zero-shot claim.

This work is aimed at interpretability researchers who deal with tabular data and frequently lack model access. A reader who wants to try a model-free baseline, or who already uses TabPFN-style foundation models, will get the most out of it. The idea is coherent on its own terms and the open-source release makes it checkable, so it deserves a serious referee. I would send it to review with instructions to focus on the generalization experiments and any OOD diagnostics they can add.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces ExplainerPFN, a tabular foundation model extending TabPFN that is meta-trained on synthetic structural causal model (SCM) datasets with exact or near-exact Shapley-value supervision. The central claim is that this yields a zero-shot, model-free estimator of feature attributions that is competitive with few-shot surrogate explainers (using 2-10 SHAP reference examples) on both real and synthetic tabular classification tasks, providing attributions where no existing explainer applies because it targets attributions true to the data distribution rather than to any specific model.

Significance. If the generalization result holds, the work would be significant for enabling principled feature importance estimation in black-box or inaccessible-model settings (e.g., API-only deployments or privacy-constrained environments). The meta-training approach to resolve non-identifiability via a posterior mean under an SCM prior is conceptually clean, and the open-source release of the full training pipeline and synthetic generator is a concrete strength that supports reproducibility.

major comments (2)

[§5.2] §5.2 (real-data experiments): the reported competitiveness with 2-10-shot surrogates is the load-bearing result, yet the evaluation provides no explicit OOD controls, distribution-shift metrics, or ablations that vary the synthetic SCM generator's causal structure, noise model, or marginals relative to the real test sets; without these, it remains possible that performance is benchmark-specific rather than evidence of robust zero-shot transfer.
[§3.1] §3.1 (meta-training prior): the posterior-mean attribution is defined with respect to the synthetic SCM family, but no sensitivity analysis or quantitative measure of mismatch (e.g., Wasserstein distance on feature dependencies or interaction patterns) is given between the training generator and real tabular distributions; this directly affects whether the learned mapping can be expected to remain stable on data whose generative processes lie outside the training family.

minor comments (2)

[§2] Notation for the posterior mean attribution could be introduced earlier and used consistently; the abstract's phrasing 'true to the data' is intuitive but would benefit from a one-sentence formal gloss in §2.
[Tables 2-3] Table captions and axis labels in the real-data results should explicitly state the number of SHAP references used by the surrogate baselines so readers can directly compare the 2-10-shot regime.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for highlighting the potential significance of ExplainerPFN for model-free attribution settings. Below we respond point-by-point to the major comments, indicating where we will strengthen the manuscript through targeted revisions.

Referee: [§5.2] §5.2 (real-data experiments): the reported competitiveness with 2-10-shot surrogates is the load-bearing result, yet the evaluation provides no explicit OOD controls, distribution-shift metrics, or ablations that vary the synthetic SCM generator's causal structure, noise model, or marginals relative to the real test sets; without these, it remains possible that performance is benchmark-specific rather than evidence of robust zero-shot transfer.

Authors: We agree that the absence of explicit OOD controls and generator ablations leaves open the possibility that results are partly benchmark-specific. In the revised manuscript we will add a new subsection under §5.2 that (i) reports distribution-shift metrics (Wasserstein distance on marginals and pairwise dependencies, plus MMD) between the synthetic training distribution and each real test set, (ii) introduces controlled distribution-shift experiments by subsampling or perturbing real datasets to induce covariate and concept shifts, and (iii) performs ablations that systematically vary the SCM generator’s causal graph density, noise variance, and marginal types while measuring downstream attribution fidelity on held-out real data. These additions will directly test robustness of the zero-shot transfer.
Revision: yes

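
The shift metrics named in this response can be sketched in a few lines. This is an illustration only: the function names, the equal-sample-size restriction, and the RBF bandwidth are assumptions of this note, not details taken from the manuscript or its planned revision.

```python
from math import exp


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1-D samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def marginal_shift(synthetic, real):
    """Per-feature Wasserstein-1 distance between two tables (lists of rows)."""
    return [wasserstein_1d(s, r) for s, r in zip(zip(*synthetic), zip(*real))]


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD with an RBF kernel."""
    def k(u, v):
        sq = sum((a - b) ** 2 for a, b in zip(u, v))
        return exp(-sq / (2.0 * bandwidth ** 2))

    def mean_k(p, q):
        return sum(k(u, v) for u in p for v in q) / (len(p) * len(q))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


# Illustrative tables: the first feature is shifted by 1, the second is not.
synthetic = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
real = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
shift = marginal_shift(synthetic, real)
print(shift, mmd_squared(synthetic, real))
```

Per-feature Wasserstein distances localize which marginals have moved, while MMD gives a single joint-distribution score that also reacts to changes in dependencies between features.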
Referee: [§3.1] §3.1 (meta-training prior): the posterior-mean attribution is defined with respect to the synthetic SCM family, but no sensitivity analysis or quantitative measure of mismatch (e.g., Wasserstein distance on feature dependencies or interaction patterns) is given between the training generator and real tabular distributions; this directly affects whether the learned mapping can be expected to remain stable on data whose generative processes lie outside the training family.

Authors: We concur that a quantitative characterization of the prior-to-real mismatch is necessary to support claims of stable generalization. We will insert a new paragraph and accompanying figure in §3.1 (and cross-reference it in the experiments) that computes and reports Wasserstein distances on both marginals and pairwise interaction patterns between the meta-training SCM ensemble and the real tabular collections used in §5. We will further include a sensitivity sweep that retrains ExplainerPFN on deliberately mismatched SCM families (e.g., denser graphs, heavier-tailed noise) and evaluates the resulting attribution stability on the original real test sets. This analysis will make the scope and limitations of the learned posterior mean explicit.
Revision: yes
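
A sensitivity sweep over SCM families needs a generator with explicit knobs. The sketch below is a hypothetical linear-Gaussian sampler with an edge probability and a noise scale. The paper's generator is not described on this page and may well be nonlinear, so `sample_linear_scm` and its parameters are assumptions for illustration only.

```python
import random


def sample_linear_scm(n_rows, n_features, edge_prob, noise_std, seed=0):
    """Draw rows from a random linear SCM over a fixed topological order.

    Hypothetical generator: feature j is a weighted sum of earlier features
    (each candidate edge kept with probability edge_prob) plus Gaussian noise.
    """
    rng = random.Random(seed)
    weights = {
        (i, j): rng.uniform(-1.0, 1.0)
        for j in range(n_features)
        for i in range(j)
        if rng.random() < edge_prob
    }
    rows = []
    for _ in range(n_rows):
        row = []
        for j in range(n_features):
            parents = sum(w * row[i] for (i, k), w in weights.items() if k == j)
            row.append(parents + rng.gauss(0.0, noise_std))
        rows.append(row)
    return rows, weights


# Two ends of a density sweep: no edges versus a fully connected DAG.
sparse_rows, sparse_edges = sample_linear_scm(50, 4, edge_prob=0.0, noise_std=1.0)
dense_rows, dense_edges = sample_linear_scm(50, 4, edge_prob=1.0, noise_std=1.0)
print(len(sparse_edges), len(dense_edges))
```

Sweeping `edge_prob` and `noise_std` while holding the evaluation sets fixed is one way to expose how strongly downstream attributions depend on the assumed generator family.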

Circularity Check

0 steps flagged

No significant circularity; meta-training prior is external to test data

The paper derives ExplainerPFN by pretraining a TabPFN-based model on synthetic structural causal models using exact or near-exact Shapley supervision, then applies the resulting posterior-mean predictor zero-shot to real tabular datasets. This constitutes a learned mapping from data distribution to attributions rather than any equation or parameter that reduces to its own inputs by construction. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the abstract or described contributions; the meta-training prior is generated externally and the competitiveness claim rests on empirical experiments rather than tautology. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach assumes that a posterior mean attribution under a meta-training prior on synthetic causal data is a well-defined and useful target. No new physical entities are postulated. The main free parameters are those internal to the TabPFN backbone and any hyperparameters of the synthetic data generator.

free parameters (1)

TabPFN hyperparameters and training schedule
Standard foundation-model training choices that are fitted or selected on the synthetic meta-training distribution.

axioms (1)

Domain assumption: Synthetic structural causal models generate data distributions whose Shapley attributions are representative of real tabular data.
Invoked when claiming generalization from synthetic pretraining to real datasets.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "We introduce ExplainerPFN, a tabular foundation model built on TabPFN that is pretrained on synthetic datasets generated from random structural causal models and supervised using exact or near-exact Shapley values."

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "we show that ExplainerPFN achieves performance competitive with few-shot surrogate explainers that rely on 2–10 SHAP examples"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

ConfoundingSHAP: Quantifying confounding strength in causal inference
cs.LG · 2026-05 · unverdicted · novelty 7.0
ConfoundingSHAP defines a custom Shapley game to attribute confounding strength to individual covariates and uses TabPFN to estimate it scalably without exhaustive refitting.

TabPFN-3: Technical Report
cs.LG · 2026-05 · unverdicted · novelty 6.0
TabPFN-3 delivers state-of-the-art tabular prediction performance on benchmarks up to 1M rows, is up to 20x faster than prior versions, and introduces test-time scaling that beats non-TabPFN models by hundreds of Elo points.