arxiv: 2601.23068 · v2 · pith:IA2R6FTNnew · submitted 2026-01-30 · 💻 cs.LG · cs.AI

ExplainerPFN: Towards tabular foundation models for model-free zero-shot feature importance estimations

Pith reviewed 2026-05-21 13:37 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords feature attribution · Shapley values · zero-shot learning · tabular data · foundation models · model interpretability · structural causal models · TabPFN

The pith

A model pretrained on synthetic causal data can estimate Shapley-style feature attributions for tabular inputs without any access to the prediction model or reference examples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether feature attributions can be obtained in a zero-shot setting using only the observed data distribution. Because many models can produce the same predictions while implying different attributions, the approach learns the posterior mean attribution under a prior induced by meta-training on synthetic structural causal models. ExplainerPFN implements this idea as a tabular foundation model built on TabPFN that directly outputs attributions for new datasets. The method therefore supplies a principled attribution in exactly the regimes where conventional explainers cannot be applied because model access is unavailable. Experiments indicate that the resulting attributions are competitive with surrogate explainers that require two to ten reference SHAP evaluations.
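
The non-identifiability point can be made concrete with a toy case. The sketch below is illustrative only and is not taken from the paper: the function names, the four-row dataset, and the choice of an interventional value function (features outside a coalition are filled in from background rows) are all assumptions. It builds two models that agree on every observed row, because the two features are exact copies, yet assign the credit to different features.

```python
from itertools import combinations
from math import factorial


def exact_shapley(model, x, background):
    """Exact Shapley values of `model` at `x` by enumerating all coalitions.

    Assumption: features outside a coalition take their values from each
    background row, and the model output is averaged over the background.
    """
    d = len(x)

    def value(coalition):
        total = 0.0
        for row in background:
            z = [x[j] if j in coalition else row[j] for j in range(d)]
            total += model(z)
        return total / len(background)

    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            # Standard Shapley weight |S|! (d - |S| - 1)! / d!
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy data: the second feature is an exact copy of the first.
data = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]


def model_a(z):
    return z[0]  # reads only feature 0


def model_b(z):
    return z[1]  # reads only feature 1


x = data[3]
same_predictions = all(model_a(row) == model_b(row) for row in data)
phi_a = exact_shapley(model_a, x, data)
phi_b = exact_shapley(model_b, x, data)
print(same_predictions, phi_a, phi_b)
```

Both models are indistinguishable from the data alone, so any data-only method has to pick one attribution out of many compatible ones; the paper's choice is the posterior mean under its meta-training prior.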

Core claim

ExplainerPFN is a zero-shot tabular foundation model pretrained on synthetic structural causal datasets supervised by exact or near-exact Shapley values. Once trained, it predicts feature attributions for any unseen tabular dataset using only the input data, without model queries, gradients, or example explanations.

What carries the argument

ExplainerPFN, a TabPFN-based network that maps raw tabular inputs directly to posterior-mean Shapley-style attributions after meta-training on synthetic SCM data.
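
One way to write down the posterior-mean target is sketched below. The notation is an assumption introduced here for illustration and is not the paper's: D is the observed dataset, f a predictor drawn together with D from the meta-training prior, \phi_j(f, x) the Shapley value of feature j for f at input x, and g_\theta the network. The second expression assumes a squared-error training loss, which this summary does not confirm; under that assumption the minimizer is the conditional mean, which is why the learned output can be read as a posterior mean.

```latex
\hat{\phi}_j(x; D) \;=\; \mathbb{E}\!\left[\, \phi_j(f, x) \,\middle|\, D \,\right],
\qquad
\theta^\star \;=\; \arg\min_{\theta}\;
\mathbb{E}_{(D, f, x)}\!\left[\, \bigl\lVert g_\theta(x, D) - \phi(f, x) \bigr\rVert_2^2 \,\right]
```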

If this is right

  • Few-shot surrogate explainers reach high SHAP fidelity with as few as two reference observations.
  • Zero-shot Shapley-style attributions become available in settings that prohibit model access or gradient queries.
  • An open-source training pipeline and synthetic data generator are provided for further development.
  • The performance gap to few-shot baselines remains small across both real and synthetic tabular benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could be applied to black-box prediction APIs where even limited queries are costly or restricted.
  • If the synthetic prior transfers, similar zero-shot pipelines might be trained for other attribution functionals beyond Shapley values.
  • Privacy-sensitive deployments could benefit because no model weights or training data need to be shared with the explainer.
  • Direct comparison against ground-truth causal feature importance on datasets with known structures would test whether the learned attributions recover the underlying generative roles.

Load-bearing premise

The posterior mean attributions learned from synthetic structural causal models remain useful and stable when applied to real tabular datasets whose generative processes may differ from the training distribution.

What would settle it

The claim would be falsified if, on a collection of real tabular datasets whose causal structures are known to diverge from the synthetic training distribution, the attributions produced by ExplainerPFN differed substantially from model-access SHAP values computed with a large number of samples.

Original abstract

Computing the importance of features in supervised classification tasks is critical for model interpretability. Shapley values are a widely used approach for explaining model predictions, but require direct access to the underlying model, an assumption frequently violated in real-world deployments. We investigate whether meaningful feature attributions can be obtained in a zero-shot setting, using only the input data distribution and no evaluations of the target model. Because multiple models can produce identical predictions yet yield different Shapley decompositions, the mapping from data to attributions is not uniquely identifiable. We therefore target attributions that are "true to the data" rather than "true to the model", learning a posterior mean attribution under a meta-training prior. To this end, we introduce ExplainerPFN, a tabular foundation model built on TabPFN, pretrained on synthetic structural causal datasets supervised with exact or near-exact Shapley values, that predicts feature attributions for unseen tabular datasets without model access, gradients, or example explanations. Our contributions are fourfold: (1) we show that few-shot surrogate explainers achieve high SHAP fidelity with as few as two reference observations; (2) we propose ExplainerPFN, the first zero-shot method for estimating Shapley-value-style feature attributions without access to the underlying model or reference explanations, providing a principled attribution where no existing explainer can be applied; (3) we release an open-source implementation including the full training pipeline and synthetic data generator; and (4) through extensive experiments on real and synthetic datasets, we show that ExplainerPFN achieves performance competitive with few-shot surrogate explainers that rely on 2-10 SHAP examples.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces ExplainerPFN, a tabular foundation model extending TabPFN that is meta-trained on synthetic structural causal model (SCM) datasets with exact or near-exact Shapley-value supervision. The central claim is that this yields a zero-shot, model-free estimator of feature attributions that is competitive with few-shot surrogate explainers (using 2-10 SHAP reference examples) on both real and synthetic tabular classification tasks. Because the method targets attributions true to the data distribution rather than to any specific model, it provides attributions in settings where no existing explainer applies.

Significance. If the generalization result holds, the work would be significant for enabling principled feature importance estimation in black-box or inaccessible-model settings (e.g., API-only deployments or privacy-constrained environments). The meta-training approach to resolve non-identifiability via a posterior mean under an SCM prior is conceptually clean, and the open-source release of the full training pipeline and synthetic generator is a concrete strength that supports reproducibility.

major comments (2)
  1. [§5.2] §5.2 (real-data experiments): the reported competitiveness with 2-10-shot surrogates is the load-bearing result, yet the evaluation provides no explicit OOD controls, distribution-shift metrics, or ablations that vary the synthetic SCM generator's causal structure, noise model, or marginals relative to the real test sets; without these, it remains possible that performance is benchmark-specific rather than evidence of robust zero-shot transfer.
  2. [§3.1] §3.1 (meta-training prior): the posterior-mean attribution is defined with respect to the synthetic SCM family, but no sensitivity analysis or quantitative measure of mismatch (e.g., Wasserstein distance on feature dependencies or interaction patterns) is given between the training generator and real tabular distributions; this directly affects whether the learned mapping can be expected to remain stable on data whose generative processes lie outside the training family.
minor comments (2)
  1. [§2] Notation for the posterior mean attribution could be introduced earlier and used consistently; the abstract's phrasing 'true to the data' is intuitive but would benefit from a one-sentence formal gloss in §2.
  2. [Tables 2-3] Table captions and axis labels in the real-data results should explicitly state the number of SHAP references used by the surrogate baselines so readers can directly compare the 2-10-shot regime.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for highlighting the potential significance of ExplainerPFN for model-free attribution settings. Below we respond point-by-point to the major comments, indicating where we will strengthen the manuscript through targeted revisions.

Point-by-point responses
  1. Referee: [§5.2] §5.2 (real-data experiments): the reported competitiveness with 2-10-shot surrogates is the load-bearing result, yet the evaluation provides no explicit OOD controls, distribution-shift metrics, or ablations that vary the synthetic SCM generator's causal structure, noise model, or marginals relative to the real test sets; without these, it remains possible that performance is benchmark-specific rather than evidence of robust zero-shot transfer.

    Authors: We agree that the absence of explicit OOD controls and generator ablations leaves open the possibility that results are partly benchmark-specific. In the revised manuscript we will add a new subsection under §5.2 that (i) reports distribution-shift metrics (Wasserstein distance on marginals and pairwise dependencies, plus MMD) between the synthetic training distribution and each real test set, (ii) introduces controlled distribution-shift experiments by subsampling or perturbing real datasets to induce covariate and concept shifts, and (iii) performs ablations that systematically vary the SCM generator’s causal graph density, noise variance, and marginal types while measuring downstream attribution fidelity on held-out real data. These additions will directly test robustness of the zero-shot transfer. revision: yes

  2. Referee: [§3.1] §3.1 (meta-training prior): the posterior-mean attribution is defined with respect to the synthetic SCM family, but no sensitivity analysis or quantitative measure of mismatch (e.g., Wasserstein distance on feature dependencies or interaction patterns) is given between the training generator and real tabular distributions; this directly affects whether the learned mapping can be expected to remain stable on data whose generative processes lie outside the training family.

    Authors: We concur that a quantitative characterization of the prior-to-real mismatch is necessary to support claims of stable generalization. We will insert a new paragraph and accompanying figure in §3.1 (and cross-reference it in the experiments) that computes and reports Wasserstein distances on both marginals and pairwise interaction patterns between the meta-training SCM ensemble and the real tabular collections used in §5. We will further include a sensitivity sweep that retrains ExplainerPFN on deliberately mismatched SCM families (e.g., denser graphs, heavier-tailed noise) and evaluates the resulting attribution stability on the original real test sets. This analysis will make the scope and limitations of the learned posterior mean explicit. revision: yes
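
The marginal distribution-shift metric discussed in both responses could be computed along the following lines. This is a minimal sketch under stated assumptions, not the authors' analysis: the helper names, the Gaussian toy tables, and the restriction to equal sample sizes are all hypothetical. It uses the fact that the empirical 1-Wasserstein distance between two equal-size one-dimensional samples is the mean absolute difference of their sorted values.

```python
import random


def wasserstein_1d(u, v):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples."""
    if len(u) != len(v):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)


def marginal_shift(synthetic, real):
    """Per-feature 1-Wasserstein distance between two tables (lists of rows)."""
    n_features = len(synthetic[0])
    return [
        wasserstein_1d(
            [row[j] for row in synthetic],
            [row[j] for row in real],
        )
        for j in range(n_features)
    ]


rng = random.Random(0)
n_rows = 500
# Hypothetical stand-in for samples drawn from the synthetic SCM generator.
synthetic = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n_rows)]
# Hypothetical "real" table: feature 0 matches, feature 1 has a shifted mean.
real = [[rng.gauss(0.0, 1.0), rng.gauss(2.0, 1.0)] for _ in range(n_rows)]

shift = marginal_shift(synthetic, real)
print(shift)
```

A per-feature score like this only captures marginal mismatch; the pairwise-dependency and MMD measures mentioned in the rebuttal would need separate treatment.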

Circularity Check

0 steps flagged

No significant circularity; meta-training prior is external to test data

full rationale

The paper derives ExplainerPFN by pretraining a TabPFN-based model on synthetic structural causal models using exact or near-exact Shapley supervision, then applies the resulting posterior-mean predictor zero-shot to real tabular datasets. This constitutes a learned mapping from data distribution to attributions rather than any equation or parameter that reduces to its own inputs by construction. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the abstract or described contributions; the meta-training prior is generated externally and the competitiveness claim rests on empirical experiments rather than tautology. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach assumes that a posterior mean attribution under a meta-training prior on synthetic causal data is a well-defined and useful target. No new physical entities are postulated. The main free parameters are those internal to the TabPFN backbone and any hyperparameters of the synthetic data generator.

free parameters (1)
  • TabPFN hyperparameters and training schedule
    Standard foundation-model training choices that are fitted or selected on the synthetic meta-training distribution.
axioms (1)
  • domain assumption: Synthetic structural causal models generate data distributions whose Shapley attributions are representative of real tabular data.
    Invoked when claiming generalization from synthetic pretraining to real datasets.

pith-pipeline@v0.9.0 · 5830 in / 1312 out tokens · 31444 ms · 2026-05-21T13:37:01.970341+00:00 · methodology



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ConfoundingSHAP: Quantifying confounding strength in causal inference

    cs.LG · 2026-05 · unverdicted · novelty 7.0

    ConfoundingSHAP defines a custom Shapley game to attribute confounding strength to individual covariates and uses TabPFN to estimate it scalably without exhaustive refitting.

  2. TabPFN-3: Technical Report

    cs.LG · 2026-05 · unverdicted · novelty 6.0

    TabPFN-3 delivers state-of-the-art tabular prediction performance on benchmarks up to 1M rows, is up to 20x faster than prior versions, and introduces test-time scaling that beats non-TabPFN models by hundreds of Elo points.