TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second
Canonical reference. 86% of citing Pith papers cite this work as background.
abstract
We present TabPFN, a trained Transformer that can do supervised classification for small tabular datasets in less than a second, needs no hyperparameter tuning and is competitive with state-of-the-art classification methods. TabPFN performs in-context learning (ICL): it learns to make predictions using sequences of labeled examples (x, f(x)) given in the input, without requiring further parameter updates. TabPFN is fully entailed in the weights of our network, which accepts training and test samples as a set-valued input and yields predictions for the entire test set in a single forward pass. TabPFN is a Prior-Data Fitted Network (PFN) and is trained offline once, to approximate Bayesian inference on synthetic datasets drawn from our prior. This prior incorporates ideas from causal reasoning: It entails a large space of structural causal models with a preference for simple structures. On the 18 datasets in the OpenML-CC18 suite that contain up to 1 000 training data points, up to 100 purely numerical features without missing values, and up to 10 classes, we show that our method clearly outperforms boosted trees and performs on par with complex state-of-the-art AutoML systems with up to 230× speedup. This increases to a 5 700× speedup when using a GPU. We also validate these results on an additional 67 small numerical datasets from OpenML. We provide all our code, the trained TabPFN, an interactive browser demo and a Colab notebook at https://github.com/automl/TabPFN.
citing papers explorer
- Privacy Auditing with Zero (0) Training Run
Zero-Run auditing supplies valid lower bounds on differential privacy parameters from fixed member and non-member datasets by modeling and correcting distribution-shift confounding via causal-inference techniques.
- What learning algorithm is in-context learning? Investigations with linear models
Transformers performing in-context learning implicitly implement gradient descent, ridge regression, and least-squares predictors for linear models, with behavior shifting based on model depth, width, and data noise.
- Toward Calibrated, Fair, and accurate Deepfake Detection
Face-Feature Tuning is a label-free logit remapping method that reduces FPR/TPR gaps across groups in deepfake detection while preserving overall accuracy.
- TabQL: In-Context Q-Learning with Tabular Foundation Models
TabQL is a reinforcement learning framework that substitutes a tabular foundation model with in-context capabilities for the parametric Q-network in DQN, with a warm-up phase and theoretical analysis claiming improved sample efficiency.
- Rethinking Side-Channel Analysis: Automated Discovery and Analysis of Side-Channel Leakage with LLM-Assisted Agents
SCAgent automates side-channel leakage discovery via LLM agents for target identification and few-shot foundation models for scalable analysis on iOS.
- SurvivalPFN: Amortizing Survival Prediction via In-Context Bayesian Inference
SurvivalPFN amortizes Bayesian survival analysis for right-censored data by pretraining a prior-data fitted network on synthetic identifiable DGPs and then performing in-context inference, achieving competitive results on 61 real datasets.
- FORGE: Fragment-Oriented Ranking and Generation for Context-Aware Molecular Optimization
FORGE reformulates molecular optimization as context-aware fragment ranking and replacement using mined low-to-high edit pairs, outperforming larger language models and graph methods on standard benchmarks.
- Quantifying the Risk-Return Tradeoff in Forecasting
Forecast loss differentials are reframed as returns and assessed with risk-adjusted finance metrics, showing professional forecasters are harder to beat on risk-adjusted performance than on raw accuracy in US macro forecasting.
- Data Language Models: A New Foundation Model Class for Tabular Data
Schema-1 is the first Data Language Model that natively understands raw tabular data and outperforms gradient-boosted ensembles, AutoML, and prior tabular foundation models on row-level prediction and imputation tasks.
- TFM-Retouche: A Lightweight Input-Space Adapter for Tabular Foundation Models
TFM-Retouche is an architecture-agnostic input-space residual adapter that improves tabular foundation model accuracy on 51 datasets by learning input corrections through the frozen backbone, with an identity guard to fall back to the original model.
- PHBench: A Benchmark for Predicting Startup Series A Funding from Product Hunt Launch Signals
PHBench shows Product Hunt launch signals predict Series A funding with an ensemble model reaching AP 0.037 and F0.5 0.097 on blind test data, outperforming logistic regression and zero-shot LLMs.
- Explainable Load Forecasting with Covariate-Informed Time Series Foundation Models
Time series foundation models match the performance of specialized models for day-ahead load forecasting while providing explanations that match domain knowledge on weather and calendar effects.
- Selecting Feature Interactions for Generalized Additive Models by Distilling Foundation Models
TabDistill distills feature interactions from tabular foundation models via post-hoc attribution and inserts them into GAMs, yielding consistent predictive gains.
- Environmental, Social and Governance Sentiment Analysis on Slovene News: A Novel Dataset and Models
The authors release the first Slovene ESG sentiment dataset from news and report that large language models lead on environmental and social classification while fine-tuned SloBERTa performs best on governance.
- Talking Trees: Reasoning-Assisted Induction of Decision Trees for Tabular Data
Reasoning LLMs with minimal tools for tree construction and analysis induce decision trees that outperform CART, compete with ensembles on low-resource tabular data, and provide human-readable reasoning traces.
- FLUXtrapolation: A benchmark on extrapolating ecosystem fluxes
FLUXtrapolation is a benchmark for domain generalization in ecosystem flux upscaling using temporal, spatial, and temperature-based extrapolation scenarios, with pilot results showing model separation on tail and multi-scale metrics.
- LGB+: A Macroeconomic Forecasting Road Test
LGB+ improves macroeconomic forecasts by letting linear basis functions compete with or alternate against tree updates inside gradient boosting, yielding native linear/nonlinear decomposition of predictions.
- CarCrashNet: A Large-Scale Dataset and Hierarchical Neural Solver for Data-Driven Structural Crash Simulation
CarCrashNet supplies a large multi-modal crash simulation benchmark and CrashSolver neural model for data-driven full-vehicle crash prediction, validated against experiments and commercial solvers.
- ModelLens: Finding the Best for Your Task from Myriads of Models
ModelLens learns a performance-aware latent space from 1.62M leaderboard records to rank unseen models on unseen datasets without forward passes on the target.
- Decoupled PFNs: Identifiable Epistemic-Aleatoric Decomposition via Structured Synthetic Priors
Decoupled PFNs use controllable synthetic priors to train separate latent-signal and noise heads, making epistemic-aleatoric decomposition identifiable and improving acquisition in noisy settings.
- Tabular foundation models for in-context prediction of molecular properties
Tabular foundation models achieve high accuracy in molecular property prediction through in-context learning, with up to 100% win rates on MoleculeACE tasks when paired with CheMeleon embeddings.
- ReSS: Learning Reasoning Models for Tabular Data Prediction via Symbolic Scaffold
ReSS extracts decision paths from trees as scaffolds to guide LLM reasoning generation, fine-tunes the LLM on the resulting dataset with scaffold-invariant augmentation, and reports up to 10% gains on medical and financial tabular benchmarks with new faithfulness metrics.
- From Uniform to Learned Knots: A Study of Spline-Based Numerical Encodings for Tabular Deep Learning
Spline encodings for numerical features show task-dependent performance in tabular deep learning, with piecewise-linear encoding robust for classification and variable results for regression depending on spline family, knot strategy, and backbone.
- TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models
TabPFN-2.5 scales tabular foundation models to 20x larger datasets, outperforms tuned tree models on TabArena, achieves near-perfect win rates against default XGBoost, and adds a distillation engine for fast production deployment.
- LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers
LLM-FE is a framework that treats feature engineering as LLM-driven program search with data feedback, reporting consistent gains over baselines on classification and regression tabular tasks.
- TabICL: A Tabular Foundation Model for In-Context Learning on Large Data
TabICL scales in-context learning to large tabular data via column-then-row attention for row embeddings followed by a transformer, matching TabPFNv2 speed and performance while outperforming it and CatBoost on datasets over 10K samples.
- Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive
DPOP is a new loss function that prevents DPO from lowering preferred response likelihoods, outperforms standard DPO on diverse datasets and MT-Bench, and enables Smaug-72B to exceed 80% on the Open LLM Leaderboard.
- Tabular Foundation Models for Clinical Survival Analysis via Survival-Aware Adaptation
Adapting tabular foundation models with an MTLR survival head produces competitive or superior C-index scores on MIMIC-IV (0.856) and eICU (0.797) compared to DeepSurv and zero-shot baselines.
- QDSP: An Interpretable Structured Learning Framework for Predicting Death or Cerebral Palsy in Very Low Birth Weight Infants
QDSP combines QSS and DSP modules to reach 0.92 accuracy and 0.97 AUC on a 51-infant VLBWI cohort for death or cerebral palsy prediction, outperforming XGBoost, TabNet, and TabPFN while identifying clinically relevant factors.
- Modular Multimodal Classification Without Fine-Tuning: A Simple Compositional Approach
CoMET achieves strong multimodal classification performance by composing frozen modality encoders, PCA compression, and tabular foundation models without any training, reaching state-of-the-art on diverse benchmarks including large-scale hierarchical tasks.
- TabH2O: A Unified Foundation Model for Tabular Prediction
TabH2O presents a unified tabular foundation model with dual-head architecture and single-stage pretraining that achieves an average rank of 2.55 on the TALENT benchmark, outperforming several established methods.
- Unlock the Potential of Large Language Models for Predictive Tabular Tasks in Data Science with Table-Specific Pretraining
Table-specific pretraining of Llama-2 yields significant gains on zero-shot, few-shot, and in-context tabular prediction tasks over prior benchmarks.
- Customer Churn Prediction on Structured Data Using FT-Transformer and Stacking Ensembles
A stacking ensemble of FT-Transformer and XGBoost achieves superior F1 and AUC scores on a bank churn dataset compared to an MLP baseline under cross-validation.
- Tabular foundation models for robust calibration of near-infrared chemical sensing data
Preprocessing-optimized TabPFN achieves top average rank in regression on 66 NIR datasets and remains competitive on outliers and extrapolation compared to PLS, Ridge, CatBoost, and CNN-1D.
- Optimizing IoT Intrusion Detection with Tabular Foundation Models for Smart City Forensics
TabPFNv2.5 delivers 40x faster inference than Random Forest at 97% binary accuracy on TON IoT data, enabling a hybrid pipeline for real-time IoT threat screening in smart cities.
- Noise Immunity in In-Context Tabular Learning: An Empirical Robustness Analysis of TabPFN's Attention Mechanisms
TabPFN maintains high ROC-AUC and structured attention under controlled additions of irrelevant features, nonlinear correlations, and mislabeled targets in binary classification.
- Comparative Evaluation of Machine Learning Models for Predicting Donor Kidney Discard
On 4080 German deceased donors, an ensemble ML model reached MCC 0.76 for kidney discard prediction, with standardized preprocessing and feature selection proving more important than the specific algorithm chosen.
- Tabular Data with Class Imbalance: Predicting Electric Vehicle Crash Severity with Pretrained Transformers (TabPFN) and Mamba-Based Models
The study benchmarks TabPFN, MambaNet, and MambaAttention on imbalanced EV crash severity classification with SMOTEENN resampling on Texas data, identifying intersection relation and speed limit as top features and MambaAttention as strongest on severe cases.
- Correcting Class Imbalance in Prior-Data Fitted Networks for Tabular Classification
Thresholding and downsampling effectively mitigate class imbalance in PFNs for tabular classification due to their calibration and limited-data strengths.
- Evaluating TabPFN for Mild Cognitive Impairment to Alzheimer's Disease Conversion in Data Limited Settings
TabPFN reaches AUC 0.892 for 3-year MCI-to-AD conversion on TADPOLE data and holds performance at N=50 training samples where XGBoost, Random Forest, LightGBM, and logistic regression drop.
- When Tabular Foundation Models Meet Strategic Tabular Data: A Prior Alignment Approach
- TabPFN-3: Technical Report
- Reciprocal Co-Training (RCT): Coupling Gradient-Based and Non-Differentiable Models via Reinforcement Learning
- An Empirical Study of Machine Learning Robustness and Scalability for Imbalanced Tabular Clinical Data in Emergency and Critical Care