TabDPT: Scaling tabular foundation models on real data. arXiv preprint arXiv:2410.18164

Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh, Hamidreza Kamkari, Alex Labach, Jesse C Cresswell, Keyvan Golestan, Guangwei Yu, Maksims Volkovs, Anthony L Caterini · 2024 · arXiv 2410.18164

Baseline reference. 67% of classified citations from Pith papers use this work as a benchmark or comparison.

21 Pith papers citing it

citation-role summary

baseline 4 · background 1 · method 1
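
The 67% baseline figure appears to follow from the role counts above. A minimal sketch of that arithmetic, assuming the share is taken over the classified citations only (the variable names are illustrative and not part of the page):

```python
# Citation-role counts as listed in the summary above.
role_counts = {"baseline": 4, "background": 1, "method": 1}

# Assumption: the percentage is computed over classified citations only,
# not over all 21 citing papers.
classified = sum(role_counts.values())
baseline_share = role_counts["baseline"] / classified

print(f"{classified} classified citations, baseline share {baseline_share:.0%}")
# prints: 6 classified citations, baseline share 67%
```

Under this reading, only 6 of the 21 citing papers have a classified role, so the share describes that subset.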

citation-polarity summary

baseline 4 · background 1 · use method 1

representative citing papers

TabArena: A Living Benchmark for Machine Learning on Tabular Data

cs.LG · 2025-06-20 · conditional · novelty 8.0

TabArena launches a dynamic, updatable benchmarking system for tabular ML that shows boosted trees remain competitive, deep learning matches them under larger budgets with ensembling, foundation models excel on small data, and cross-model ensembles advance SOTA while flagging validation overfitting.

Beyond IID: How General Are Tabular Foundation Models, Really?

cs.LG · 2026-06-29 · unverdicted · novelty 7.0

Tabular foundation models excel on tiny- to medium-sized IID data but are outperformed by traditional tree-based and deep learning models on non-IID, large, and high-dimensional datasets, based on evaluations across 11 models and 142 datasets in the new BeyondArena benchmark.

TabQL: In-Context Q-Learning with Tabular Foundation Models

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

TabQL is a reinforcement learning framework that substitutes a tabular foundation model with in-context capabilities for the parametric Q-network in DQN, with a warm-up phase and theoretical analysis claiming improved sample efficiency.

SurvivalPFN: Amortizing Survival Prediction via In-Context Bayesian Inference

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

SurvivalPFN amortizes Bayesian survival analysis for right-censored data by pretraining a prior-data fitted network on synthetic identifiable DGPs and then performing in-context inference, achieving competitive results on 61 real datasets.

MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.

TFM-Retouche: A Lightweight Input-Space Adapter for Tabular Foundation Models

cs.LG · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

TFM-Retouche is an architecture-agnostic input-space residual adapter that improves tabular foundation model accuracy on 51 datasets by learning input corrections through the frozen backbone, with an identity guard to fall back to the original model.

RamanBench: A Large-Scale Benchmark for Machine Learning on Raman Spectroscopy

cs.LG · 2026-05-03 · unverdicted · novelty 7.0

RamanBench unifies 74 datasets into the first large-scale reproducible benchmark for ML on Raman spectra, finding tabular foundation models outperform baselines but no method generalizes across datasets.

Shaping the Prior: How Synthetic Task Distributions Determine Tabular Foundation Model Quality

cs.LG · 2026-05-18 · unverdicted · novelty 6.0

O'Prior, a compositional synthetic prior with hierarchical SCMs, realism engines, stress modules, and curriculum protocols, improves tabular foundation model accuracy and robustness on real benchmarks when architecture and compute are held fixed.

Pocket Foundation Models: Distilling TFMs into CPU-Ready Gradient-Boosted Trees

cs.LG · 2026-05-18 · accept · novelty 6.0

Distilling TabICLv2 into XGBoost via stratified OOF labeling yields 0.882 macro-mean AUC (96.5% of teacher) at 1.9 ms CPU across 153 datasets, with significant gains over tuned CatBoost on low-dimensional data.

TabPFN-3: Technical Report

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

TabPFN-3 scales tabular foundation models to 1M rows with synthetic pretraining, test-time compute, and benchmark-leading performance on tabular, relational, and tabular-text tasks while being up to 20x faster than TabPFN-2.5.

Breaking the Quality-Privacy Tradeoff in Tabular Data Generation via In-Context Learning

cs.LG · 2026-05-06 · unverdicted · novelty 6.0

DiffICL breaks the quality-privacy tradeoff in small-data tabular synthesis by using in-context learning on pretrained structural priors to generate data that is both higher quality and less memorizing of training samples.

FEAT: A Linear-Complexity Foundation Model for Extremely Large Structured Data

cs.LG · 2026-03-17 · unverdicted · novelty 6.0

FEAT is a linear-complexity structured data foundation model using dual-axis encoding, AFBM state-space models, and Conv-GLA to achieve O(N) scaling and permutation invariance while outperforming prior SFMs on real-world benchmarks.

TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models

cs.LG · 2025-11-11 · unverdicted · novelty 6.0

TabPFN-2.5 scales tabular foundation models to 20x larger datasets, outperforms tuned tree models on TabArena, achieves near-perfect win rates against default XGBoost, and adds a distillation engine for fast production deployment.

MachineLearningLM: Scaling Many-shot In-context Learning via Continued Pretraining

cs.CL · 2025-09-08 · unverdicted · novelty 6.0

MachineLearningLM uses continued pretraining on SCM-synthesized ML tasks with random-forest distillation to give LLMs robust many-shot in-context learning on tabular classification, reaching random-forest accuracy levels while preserving general chat performance.

Tabular Foundation Models for Clinical Survival Analysis via Survival-Aware Adaptation

cs.LG · 2026-06-10 · unverdicted · novelty 5.0

Adapting tabular foundation models with an MTLR survival head produces competitive or superior C-index scores on MIMIC-IV (0.856) and eICU (0.797) compared to DeepSurv and zero-shot baselines.

Modular Multimodal Classification Without Fine-Tuning: A Simple Compositional Approach

cs.LG · 2026-05-20 · unverdicted · novelty 5.0

CoMET achieves strong multimodal classification performance by composing frozen modality encoders, PCA compression, and tabular foundation models without any training, reaching state-of-the-art on diverse benchmarks including large-scale hierarchical tasks.

Distilling Tabular Foundation Models for Structured Health Data

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

Leakage-aware distillation transfers at least 90% of tabular foundation model AUC to lightweight students across 19 health datasets, with 26x CPU speedup and preserved calibration/fairness.

Ensembling Tabular Foundation Models - A Diversity Ceiling And A Calibration Trap

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

Six modern tabular foundation models are near-redundant, limiting ensemble gains to +0.18% accuracy at high cost while some methods degrade calibration.

VIP-COP: Context Optimization for Tabular Foundation Models

cs.LG · 2026-05-13 · unverdicted · novelty 5.0

VIP-COP is a black-box method that optimizes context for tabular foundation models by ranking and selecting high-value samples and features via online KernelSHAP regression, outperforming baselines on large high-dimensional data.

MaskTab: Scalable Masked Tabular Pretraining with Scaling Laws and Distillation for Industrial Classification

cs.LG · 2026-05-12 · unverdicted · novelty 5.0

MaskTab is a masked pretraining method for industrial tabular data that delivers measurable gains in classification AUC and KS metrics while enabling effective distillation to smaller models.

Noise Immunity in In-Context Tabular Learning: An Empirical Robustness Analysis of TabPFN's Attention Mechanisms

cs.LG · 2026-04-06 · unverdicted · novelty 4.0

TabPFN maintains high ROC-AUC and structured attention under controlled additions of irrelevant features, nonlinear correlations, and mislabeled targets in binary classification.
