TabICLv2: A better, faster, scalable, and open tabular foundation model. arXiv preprint arXiv:2602.11139
Baseline reference. 57% of citing Pith papers use this work as a benchmark or comparison.
citing papers by year
2026: 29
citing papers explorer
-
Causal Foundation Models with Continuous Treatments
A transformer foundation model is trained on synthetic data from a novel prior over continuous-treatment data-generating processes to predict treatment-response curves via in-context learning without task-specific fine-tuning.
-
STRABLE: Benchmarking Tabular Machine Learning with Strings
A new corpus of 108 mixed string-numeric tables shows that advanced tabular learners with basic string embeddings perform well on most real-world data, while large LLM encoders help on free-text heavy tables.
-
Beyond IID: How General Are Tabular Foundation Models, Really?
Tabular foundation models excel on tiny- to medium-sized IID data but are outperformed by traditional tree-based and deep learning models on non-IID, large, and high-dimensional datasets, based on evaluations across 11 models and 142 datasets in the new BeyondArena benchmark.
-
FlexTab: A Flexible Encoder-Decoder Architecture for In-Context Learning Across Diverse Tabular Tasks
FlexTab shows a shared encoder with task-specific decoders trained on unlabeled tables can achieve SOTA on classification, regression, anomaly detection and entity matching while staying competitive on relational entity classification.
-
CalArena: A Large-Scale Post-Hoc Calibration Benchmark
CalArena is a large-scale benchmark that evaluates dozens of post-hoc calibration methods using Post-Hoc Improvement (PHI) in proper scoring rules and finds that smooth functions outperform binning while dedicated multiclass methods are required in high-dimensional settings.
-
SurvivalPFN: Amortizing Survival Prediction via In-Context Bayesian Inference
SurvivalPFN amortizes Bayesian survival analysis for right-censored data by pretraining a prior-data fitted network on synthetic identifiable DGPs and then performing in-context inference, achieving competitive results on 61 real datasets.
-
MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image
MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.
-
PFN-TS: Thompson Sampling for Contextual Bandits via Prior-Data Fitted Networks
PFN-TS converts PFN posterior predictives into mean-reward samples for Thompson sampling using a subsampled predictive CLT, with consistency proofs, regret bounds, and strong empirical performance on synthetic and real bandit benchmarks.
-
Data Language Models: A New Foundation Model Class for Tabular Data
Schema-1 is the first Data Language Model that natively understands raw tabular data and outperforms gradient-boosted ensembles, AutoML, and prior tabular foundation models on row-level prediction and imputation tasks.
-
TFM-Retouche: A Lightweight Input-Space Adapter for Tabular Foundation Models
TFM-Retouche is an architecture-agnostic input-space residual adapter that improves tabular foundation model accuracy on 51 datasets by learning input corrections through the frozen backbone, with an identity guard to fall back to the original model.
-
RamanBench: A Large-Scale Benchmark for Machine Learning on Raman Spectroscopy
RamanBench unifies 74 datasets into the first large-scale reproducible benchmark for ML on Raman spectra, finding tabular foundation models outperform baselines but no method generalizes across datasets.
-
TabPATE: Differentially Private Tabular In-Context Learning Without Public Data
TabPATE applies a PATE-style private aggregation to synthetic tabular queries generated from feature ranges, enabling private in-context learning with near-random membership inference success while keeping competitive utility.
-
LUCoS: Latent Unsupervised Context Selection for Tabular Foundation Models
LUCoS replaces raw tabular geometry with unsupervised PFN latent embeddings for medoid-based context selection and ranks first on mean AUC, ACC, and F1 across 67 datasets and six budgets.
-
Shaping the Prior: How Synthetic Task Distributions Determine Tabular Foundation Model Quality
O'Prior, a compositional synthetic prior with hierarchical SCMs, realism engines, stress modules, and curriculum protocols, improves tabular foundation model accuracy and robustness on real benchmarks when architecture and compute are held fixed.
-
Pocket Foundation Models: Distilling TFMs into CPU-Ready Gradient-Boosted Trees
Distilling TabICLv2 into XGBoost via stratified OOF labeling yields 0.882 macro-mean AUC (96.5% of teacher) at 1.9 ms CPU across 153 datasets, with significant gains over tuned CatBoost on low-dimensional data.
-
KGPFN: Unlocking the Potential of Knowledge Graph Foundation Model via In-Context Learning
KGPFN pretrains on multiple KGs to learn relation patterns, then performs query-specific reasoning by encoding local context with NBFNet and global context via retrieved instances aggregated in a PFN with feature- and sample-level attention.
-
Online Sharp-Calibrated Bayesian Optimization
OSCBO adaptively balances Gaussian process sharpness and calibration in Bayesian optimization by casting hyperparameter selection as constrained online learning, while preserving sublinear regret bounds.
-
In-Context Black-Box Optimization with Unreliable Feedback
FICBO pretrains a feedback-aware transformer with a structured prior on feedback distortion to adaptively exploit or ignore unreliable auxiliary signals during in-context black-box optimization.
-
Tabular foundation models for in-context prediction of molecular properties
Tabular foundation models achieve high accuracy in molecular property prediction through in-context learning, with up to 100% win rates on MoleculeACE tasks when paired with CheMeleon embeddings.
-
Benchmarking Optimizers for MLPs in Tabular Deep Learning
Muon optimizer outperforms AdamW across 17 tabular datasets when training MLPs under a shared protocol.
-
KumoRFM-2: Scaling Foundation Models for Relational Learning
KumoRFM-2 pre-trains on synthetic and real relational data across row, column, foreign-key and cross-sample axes, injects task information early, and achieves up to 8% gains over supervised baselines on 41 benchmarks in few-shot and fine-tuned regimes while handling billion-scale datasets.
-
Exploring Differences Between Tabular Enterprise Data and Public Benchmarks
Enterprise tabular data differs from public benchmarks in ways that prevent good generalization of models like TabPFN, TabICL, and ConTextTab between the two domains.
-
Simultaneous hyperkinetic movement disorders phenotyping: a cross-cohort pediatric transfer study using routine videos, markerless pose estimation and a tabular foundation model
A pose-estimation plus tabular foundation model pipeline trained on 25 adults transfers to 12 pediatric hyperkinetic movement disorder cases with lightweight final-layer calibration, raising Hamming accuracy from 0.804 to 0.839 and Jaccard index from 0.548 to 0.633 on held-out patients.
-
The Good, the Bad, and the Ugly of Markov Boundary for Tabular Prediction
Oracle Markov boundaries improve prediction on high-dimensional sparse tabular data but causal discovery pipelines rarely recover boundaries that beat using all features.
-
Modular Multimodal Classification Without Fine-Tuning: A Simple Compositional Approach
CoMET achieves strong multimodal classification performance by composing frozen modality encoders, PCA compression, and tabular foundation models without any training, reaching state-of-the-art on diverse benchmarks including large-scale hierarchical tasks.
-
Distilling Tabular Foundation Models for Structured Health Data
Leakage-aware distillation transfers at least 90% of tabular foundation model AUC to lightweight students across 19 health datasets, with 26x CPU speedup and preserved calibration/fairness.
-
TabH2O: A Unified Foundation Model for Tabular Prediction
TabH2O presents a unified tabular foundation model with dual-head architecture and single-stage pretraining that achieves an average rank of 2.55 on the TALENT benchmark, outperforming several established methods.
-
Tabular Foundation Model for Generative Modelling
TabFORGE generates high-quality synthetic tabular data by leveraging pretrained causality-aware representations in a two-stage diffusion-decoder architecture that mitigates latent distribution shifts.
-
TabCF: Distributional Control Function Estimation with Tabular Foundation Models
TabCF is a tuning-light method using tabular foundation models for control function regression to estimate distributional causal effects such as interventional means and quantiles.