TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models
Mixed citation behavior. Most common role is background (56%).
abstract
The first tabular foundation model, TabPFN, and its successor TabPFNv2 have impacted tabular AI substantially, with dozens of methods building on it and hundreds of applications across different use cases. This report introduces TabPFN-2.5, the next generation of our tabular foundation model, built for datasets with up to 50,000 data points and 2,000 features, a 20x increase in data cells compared to TabPFNv2. TabPFN-2.5 is now the leading method for the industry standard benchmark TabArena (which contains datasets with up to 100,000 training data points), substantially outperforming tuned tree-based models and matching the accuracy of AutoGluon 1.4, a complex four-hour tuned ensemble that even includes the previous TabPFNv2. Remarkably, default TabPFN-2.5 has a 100% win rate against default XGBoost on small to medium-sized classification datasets (<=10,000 data points, 500 features) and an 87% win rate on larger datasets up to 100K samples and 2K features (85% for regression). For production use cases, we introduce a new distillation engine that converts TabPFN-2.5 into a compact MLP or tree ensemble, preserving most of its accuracy while delivering orders-of-magnitude lower latency and plug-and-play deployment. This new release will immediately strengthen the performance of the many applications and methods already built on the TabPFN ecosystem.
years: 2026 (47 citing papers)
citing papers explorer
-
Causal Foundation Models with Continuous Treatments
A transformer foundation model is trained on synthetic data from a novel prior over continuous-treatment data-generating processes to predict treatment-response curves via in-context learning without task-specific fine-tuning.
-
STRABLE: Benchmarking Tabular Machine Learning with Strings
A new corpus of 108 mixed string-numeric tables shows that advanced tabular learners with basic string embeddings perform well on most real-world data, while large LLM encoders help on free-text heavy tables.
-
Beyond IID: How General Are Tabular Foundation Models, Really?
Tabular foundation models excel on tiny- to medium-sized IID data but are outperformed by traditional tree-based and deep learning models on non-IID, large, and high-dimensional datasets, based on evaluations across 11 models and 142 datasets in the new BeyondArena benchmark.
-
Statistically Indistinguishable, Operationally Distinct: A Formal Barrier for Tabular Foundation Models
Tabular foundation models fail to classify statistically matched legal vs rule-violating database states, achieving only chance accuracy without rule-derived audits.
-
Images as Tables: In-Context Learning with TabPFN for Low-Data Detection of AI-Generated Images
DINOv3-PCA-TabPFN outperforms prior detectors like LATTE in low-data and cross-generator transfer settings for AI image detection.
-
Algorithmic Recourse of In-Context Learning for Tabular Data
The paper delivers the first theoretical analysis and practical zeroth-order framework for algorithmic recourse under in-context learning for tabular prediction.
-
CalArena: A Large-Scale Post-Hoc Calibration Benchmark
CalArena is a large-scale benchmark that evaluates dozens of post-hoc calibration methods using Post-Hoc Improvement (PHI) in proper scoring rules and finds that smooth functions outperform binning while dedicated multiclass methods are required in high-dimensional settings.
-
Transformers Can Learn Posterior Predictive Distributions In-Context
Transformers can implement gradient descent that targets the posterior predictive mean and variance, followed by binning, to approximate posterior predictive distributions (PPDs) in-context for Gaussian process regression.
-
Rethinking Weak Supervision in Anomaly Detection: A Comprehensive Benchmark
WSADBench unifies weakly supervised anomaly detection (WSAD) evaluation across three supervision types, runs 700K experiments on 36 algorithms and 4 modalities, and finds strong correlations between scenarios as well as performance boundaries that favor general models except under extreme label scarcity.
-
TabPFN-MT: A Natively Multitask In-Context Learner for Tabular Data
TabPFN-MT is a multitask in-context learner for tabular data that sets a new state-of-the-art on deep multitask learning for datasets under 1000 samples while reducing inference cost from O(T) to O(1) passes.
-
SurvivalPFN: Amortizing Survival Prediction via In-Context Bayesian Inference
SurvivalPFN amortizes Bayesian survival analysis for right-censored data by pretraining a prior-data fitted network on synthetic, identifiable data-generating processes (DGPs) and then performing in-context inference, achieving competitive results on 61 real datasets.
-
MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image
MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.
-
Amortizing Causal Sensitivity Analysis via Prior Data-Fitted Networks
A prior-data fitted network amortizes causal sensitivity analysis by generating training labels via Lagrangian scalarization, achieving orders-of-magnitude faster bounds computation than per-instance methods.
-
Is One Layer Enough? Understanding Inference Dynamics in Tabular Foundation Models
Tabular foundation models show substantial depthwise redundancy, so a looped single-layer version achieves comparable results with 20% of the original parameters.
-
Data Language Models: A New Foundation Model Class for Tabular Data
Schema-1 is the first Data Language Model that natively understands raw tabular data and outperforms gradient-boosted ensembles, AutoML, and prior tabular foundation models on row-level prediction and imputation tasks.
-
TFM-Retouche: A Lightweight Input-Space Adapter for Tabular Foundation Models
TFM-Retouche is an architecture-agnostic input-space residual adapter that improves tabular foundation model accuracy on 51 datasets by learning input corrections through the frozen backbone, with an identity guard to fall back to the original model.
-
Selecting Feature Interactions for Generalized Additive Models by Distilling Foundation Models
TabDistill distills feature interactions from tabular foundation models via post-hoc attribution and inserts them into GAMs, yielding consistent predictive gains.
-
LUCoS: Latent Unsupervised Context Selection for Tabular Foundation Models
LUCoS replaces raw tabular geometry with unsupervised PFN latent embeddings for medoid-based context selection and ranks first on mean AUC, ACC, and F1 across 67 datasets and six budgets.
-
A Post-Processing Conformal Prediction Approach for Conditional Coverage via Pivotal Scores
PIT-CP post-processes nonconformity scores via one-dimensional conditional density estimation to produce approximately pivotal scores, achieving approximate conditional coverage in conformal prediction for i.i.d. data.
-
LLMTabBench: Evaluating LLMs on Binary Tabular Classification From Zero to Few Shots
LLMTabBench evaluates LLMs on zero- and few-shot binary tabular classification and reports that zero-shot can outperform few-shot due to example conflicts with model priors while performance drops beyond a complexity threshold.
-
Proxy-Based Approximation of Shapley and Banzhaf Interactions
ProxySHAP approximates higher-order Shapley and Banzhaf interactions via tree proxies plus residual correction and a polynomial-time interventional TreeSHAP generalization for tree ensembles.
-
Shaping the Prior: How Synthetic Task Distributions Determine Tabular Foundation Model Quality
O'Prior, a compositional synthetic prior with hierarchical SCMs, realism engines, stress modules, and curriculum protocols, improves tabular foundation model accuracy and robustness on real benchmarks when architecture and compute are held fixed.
-
Pocket Foundation Models: Distilling TFMs into CPU-Ready Gradient-Boosted Trees
Distilling TabICLv2 into XGBoost via stratified out-of-fold (OOF) labeling yields 0.882 macro-mean AUC (96.5% of teacher) at 1.9 ms CPU across 153 datasets, with significant gains over tuned CatBoost on low-dimensional data.
-
KGPFN: Unlocking the Potential of Knowledge Graph Foundation Model via In-Context Learning
KGPFN pretrains on multiple KGs to learn relation patterns, then performs query-specific reasoning by encoding local context with NBFNet and global context via retrieved instances aggregated in a PFN with feature- and sample-level attention.
-
FETS Benchmark: Foundation Models Outperform Dataset-specific Machine Learning in Energy Time Series Forecasting
Foundation models outperform dataset-specific machine learning in energy time series forecasting across 54 datasets in 9 categories.
-
Tabular foundation models for in-context prediction of molecular properties
Tabular foundation models achieve high accuracy in molecular property prediction through in-context learning, with up to 100% win rates on MoleculeACE tasks when paired with CheMeleon embeddings.
-
Benchmarking Optimizers for MLPs in Tabular Deep Learning
Muon optimizer outperforms AdamW across 17 tabular datasets when training MLPs under a shared protocol.
-
From Uniform to Learned Knots: A Study of Spline-Based Numerical Encodings for Tabular Deep Learning
Spline encodings for numerical features show task-dependent performance in tabular deep learning, with piecewise-linear encoding robust for classification and variable results for regression depending on spline family, knot strategy, and backbone.
-
FEAT: A Linear-Complexity Foundation Model for Extremely Large Structured Data
FEAT is a linear-complexity structured data foundation model using dual-axis encoding, AFBM state-space models, and Conv-GLA to achieve O(N) scaling and permutation invariance while outperforming prior SFMs on real-world benchmarks.
-
Exploring Differences Between Tabular Enterprise Data and Public Benchmarks
Enterprise tabular data differs from public benchmarks in ways that prevent good generalization of models like TabPFN, TabICL, and ConTextTab between the two domains.
-
GOTabPFN: From Feature Ordering to Compact Tokenization for Tabular Foundation Models on High-Dimensional Data
GOTabPFN combines GO-LR ordering (equivalent to weighted minimum linear arrangement) and NSC compression to enable practical TabPFN-style prediction on high-dimension, low-sample-size (HDLSS) tabular data under tight token budgets, improving stability and accuracy.
-
Modular Multimodal Classification Without Fine-Tuning: A Simple Compositional Approach
CoMET achieves strong multimodal classification performance by composing frozen modality encoders, PCA compression, and tabular foundation models without any training, reaching state-of-the-art on diverse benchmarks including large-scale hierarchical tasks.
-
Distilling Tabular Foundation Models for Structured Health Data
Leakage-aware distillation transfers at least 90% of tabular foundation model AUC to lightweight students across 19 health datasets, with 26x CPU speedup and preserved calibration/fairness.
-
Ensembling Tabular Foundation Models - A Diversity Ceiling And A Calibration Trap
Six modern tabular foundation models are near-redundant, limiting ensemble gains to +0.18% accuracy at high cost while some methods degrade calibration.
-
Foundation Models for Credit Risk Prediction: A Game Changer?
Tabular foundation models outperform standard methods in credit risk probability-of-default (PD) and loss-given-default (LGD) tasks, with larger gains on smaller datasets when used out-of-the-box.
-
Imitation learning for clinical decision support in pediatric ECMO
TabPFN outperforms XGBoost and MLPs when learning action models from real-world pediatric ECMO observational data for decision support.
-
VIP-COP: Context Optimization for Tabular Foundation Models
VIP-COP is a black-box method that optimizes context for tabular foundation models by ranking and selecting high-value samples and features via online KernelSHAP regression, outperforming baselines on large high-dimensional data.
-
Tabular Foundation Model for Generative Modelling
TabFORGE generates high-quality synthetic tabular data by leveraging pretrained causality-aware representations in a two-stage diffusion-decoder architecture that mitigates latent distribution shifts.
-
TabCF: Distributional Control Function Estimation with Tabular Foundation Models
TabCF is a tuning-light method using tabular foundation models for control function regression to estimate distributional causal effects such as interventional means and quantiles.
-
Heterogeneous Scientific Foundation Model Collaboration
Eywa enables language-based agentic AI systems to collaborate with specialized scientific foundation models for improved performance on structured data tasks.
-
Analog Optical Inference on Million-Record Mortgage Data
Analog optical inference on 5.84 million mortgage records achieves 94.6% balanced accuracy, with gaps traced to encoding and architecture rather than hardware non-idealities.
-
PRAGMA: Revolut Foundation Model
PRAGMA pre-trains a Transformer on heterogeneous banking events with a tailored self-supervised masked objective, yielding embeddings that support strong downstream performance on credit scoring, fraud detection, and lifetime value prediction using linear heads or light fine-tuning.
-
ConceptTracer: Interactive Analysis of Concept Saliency and Selectivity in Neural Representations
ConceptTracer supplies an interactive interface and saliency/selectivity metrics to locate concept-responsive neurons in neural representations, shown on TabPFN.
-
Tabular foundation models for robust calibration of near-infrared chemical sensing data
Preprocessing-optimized TabPFN achieves top average rank in regression on 66 NIR datasets and remains competitive on outliers and extrapolation compared to PLS, Ridge, CatBoost, and CNN-1D.
-
Optimizing IoT Intrusion Detection with Tabular Foundation Models for Smart City Forensics
TabPFNv2.5 delivers 40x faster inference than Random Forest at 97% binary accuracy on TON IoT data, enabling a hybrid pipeline for real-time IoT threat screening in smart cities.
-
Noise Immunity in In-Context Tabular Learning: An Empirical Robustness Analysis of TabPFN's Attention Mechanisms
TabPFN maintains high ROC-AUC and structured attention under controlled additions of irrelevant features, nonlinear correlations, and mislabeled targets in binary classification.
-
Correcting Class Imbalance in Prior-Data Fitted Networks for Tabular Classification
Thresholding and downsampling effectively mitigate class imbalance in PFNs for tabular classification due to their calibration and limited-data strengths.