TabArena launches a dynamic, updatable benchmarking system for tabular ML that shows boosted trees remain competitive, deep learning matches them under larger budgets with ensembling, foundation models excel on small data, and cross-model ensembles advance SOTA while flagging validation overfitting.
hub Baseline reference
TabDPT: Scaling tabular foundation models on real data.arXiv preprint arXiv:2410.18164
Baseline reference. 67% of citing Pith papers use this work as a benchmark or comparison.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
Tabular foundation models excel on tiny- to medium-sized IID data but are outperformed by traditional tree-based and deep learning models on non-IID, large, and high-dimensional datasets, based on evaluations across 11 models and 142 datasets in the new BeyondArena benchmark.
TabQL is a reinforcement learning framework that substitutes a tabular foundation model with in-context capabilities for the parametric Q-network in DQN, with a warm-up phase and theoretical analysis claiming improved sample efficiency.
SurvivalPFN amortizes Bayesian survival analysis for right-censored data by pretraining a prior-data fitted network on synthetic identifiable DGPs and then performing in-context inference, achieving competitive results on 61 real datasets.
MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.
TFM-Retouche is an architecture-agnostic input-space residual adapter that improves tabular foundation model accuracy on 51 datasets by learning input corrections through the frozen backbone, with an identity guard to fall back to the original model.
RamanBench unifies 74 datasets into the first large-scale reproducible benchmark for ML on Raman spectra, finding tabular foundation models outperform baselines but no method generalizes across datasets.
O'Prior, a compositional synthetic prior with hierarchical SCMs, realism engines, stress modules, and curriculum protocols, improves tabular foundation model accuracy and robustness on real benchmarks when architecture and compute are held fixed.
Distilling TabICLv2 into XGBoost via stratified OOF labeling yields 0.882 macro-mean AUC (96.5% of teacher) at 1.9 ms CPU across 153 datasets, with significant gains over tuned CatBoost on low-dimensional data.
TabPFN-3 scales tabular foundation models to 1M rows with synthetic pretraining, test-time compute, and benchmark-leading performance on tabular, relational, and tabular-text tasks while being up to 20x faster than TabPFN-2.5.
DiffICL breaks the quality-privacy tradeoff in small-data tabular synthesis by using in-context learning on pretrained structural priors to generate data that is both higher quality and less memorizing of training samples.
FEAT is a linear-complexity structured data foundation model using dual-axis encoding, AFBM state-space models, and Conv-GLA to achieve O(N) scaling and permutation invariance while outperforming prior SFMs on real-world benchmarks.
TabPFN-2.5 scales tabular foundation models to 20x larger datasets, outperforms tuned tree models on TabArena, achieves near-perfect win rates against default XGBoost, and adds a distillation engine for fast production deployment.
MachineLearningLM uses continued pretraining on SCM-synthesized ML tasks with random-forest distillation to give LLMs robust many-shot in-context learning on tabular classification, reaching random-forest accuracy levels while preserving general chat performance.
Adapting tabular foundation models with an MTLR survival head produces competitive or superior C-index scores on MIMIC-IV (0.856) and eICU (0.797) compared to DeepSurv and zero-shot baselines.
CoMET achieves strong multimodal classification performance by composing frozen modality encoders, PCA compression, and tabular foundation models without any training, reaching state-of-the-art on diverse benchmarks including large-scale hierarchical tasks.
Leakage-aware distillation transfers at least 90% of tabular foundation model AUC to lightweight students across 19 health datasets, with 26x CPU speedup and preserved calibration/fairness.
Six modern tabular foundation models are near-redundant, limiting ensemble gains to +0.18% accuracy at high cost while some methods degrade calibration.
VIP-COP is a black-box method that optimizes context for tabular foundation models by ranking and selecting high-value samples and features via online KernelSHAP regression, outperforming baselines on large high-dimensional data.
MaskTab is a masked pretraining method for industrial tabular data that delivers measurable gains in classification AUC and KS metrics while enabling effective distillation to smaller models.
TabPFN maintains high ROC-AUC and structured attention under controlled additions of irrelevant features, nonlinear correlations, and mislabeled targets in binary classification.
citing papers explorer
-
SurvivalPFN: Amortizing Survival Prediction via In-Context Bayesian Inference
SurvivalPFN amortizes Bayesian survival analysis for right-censored data by pretraining a prior-data fitted network on synthetic identifiable DGPs and then performing in-context inference, achieving competitive results on 61 real datasets.