TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models
20 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (20) · Verdicts: UNVERDICTED (20) · Roles: background (1) · Polarities: background (1)

Representative citing papers
A prior-data fitted network amortizes causal sensitivity analysis by generating training labels via Lagrangian scalarization, achieving orders-of-magnitude faster bounds computation than per-instance methods (the scalarization is sketched after this list).
Schema-1 is the first Data Language Model that natively understands raw tabular data and outperforms gradient-boosted ensembles, AutoML, and prior tabular foundation models on row-level prediction and imputation tasks.
TabCF is a tuning-light method using tabular foundation models for control function regression to estimate distributional causal effects such as interventional means and quantiles (a control-function sketch also follows this list).
Eywa enables language-based agentic AI systems to collaborate with specialized scientific foundation models for improved performance on structured data tasks.
TabPFNv2.5 delivers 40x faster inference than Random Forest at 97% binary accuracy on TON IoT data, enabling a hybrid pipeline for real-time IoT threat screening in smart cities.
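The Lagrangian scalarization mentioned above admits a compact schematic, in illustrative notation (not the paper's): the per-instance sensitivity bound is a constrained optimization, and scalarizing it turns bound computation into label generation for the prior-data fitted network.

```latex
% Hedged schematic: f is the causal quantity being bounded, g the
% sensitivity-model constraint with budget \Gamma, q the distribution
% optimized over, and \lambda \ge 0 the Lagrange multiplier.
\text{bound}(\Gamma) \;=\; \max_{q}\, f(q) \quad \text{s.t.} \quad g(q) \le \Gamma
\qquad \leadsto \qquad
\max_{q}\; f(q) - \lambda\,\bigl(g(q) - \Gamma\bigr).
```

Sweeping λ over a grid and recording the optima yields (instance, Γ, bound) training triples, so the fitted network later returns bounds in a single forward pass instead of re-solving the program per instance.

The control-function recipe behind TabCF can likewise be sketched with generic regressors standing in for the tabular foundation model; all names and data here are illustrative (instrument Z, treatment T, outcome Y, unobserved confounder U):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Toy instrumental-variable data: Z shifts T, U confounds T and Y.
n = 2000
Z = rng.normal(size=n)
U = rng.normal(size=n)
T = 0.8 * Z + 0.6 * U + rng.normal(scale=0.3, size=n)
Y = 1.5 * T + 1.0 * U + rng.normal(scale=0.3, size=n)

# Stage 1: regress treatment on instrument; the residual V_hat is the
# control function that proxies the unobserved confounder.
stage1 = GradientBoostingRegressor().fit(Z.reshape(-1, 1), T)
V_hat = T - stage1.predict(Z.reshape(-1, 1))

# Stage 2: regress outcome on (T, V_hat). Conditioning on V_hat blocks
# the confounding path, so varying T in the fitted model approximates do(T=t).
stage2 = GradientBoostingRegressor().fit(np.column_stack([T, V_hat]), Y)

def interventional_mean(t):
    """E[Y | do(T=t)]: average the stage-2 fit over the empirical V_hat."""
    X_do = np.column_stack([np.full(n, t), V_hat])
    return stage2.predict(X_do).mean()

print(interventional_mean(1.0) - interventional_mean(0.0))  # ~1.5, the true effect
```

For distributional effects such as interventional quantiles, the same recipe applies with a distributional stage-2 regressor (for instance the foundation model's predictive distribution) in place of the point regressor used here.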
Citing papers explorer
-
STRABLE: Benchmarking Tabular Machine Learning with Strings
A new corpus of 108 mixed string-numeric tables shows that advanced tabular learners with basic string embeddings perform well on most real-world data, while large LLM encoders help on free-text-heavy tables.
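As a concrete example of the "basic string embedding" baseline this kind of corpus evaluates, here is a minimal sklearn sketch (column names and data are invented for illustration): character n-gram TF-IDF reduced by SVD, concatenated with the numeric columns, feeding gradient-boosted trees.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical mixed string-numeric table.
df = pd.DataFrame({
    "description": ["red widget", "blue gadget", "broken red widget",
                    "gadget, blue", "widget red", "blue gadget mini"],
    "price": [9.5, 12.0, 3.2, 11.8, 8.9, 10.5],
    "label": [0, 1, 0, 1, 0, 1],
})

# "Basic string embedding": char n-gram TF-IDF + SVD, no LLM involved.
text_embed = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    TruncatedSVD(n_components=2, random_state=0),
)
pre = ColumnTransformer([
    ("text", text_embed, "description"),
    ("num", "passthrough", ["price"]),
])
clf = make_pipeline(pre, HistGradientBoostingClassifier())
clf.fit(df[["description", "price"]], df["label"])
print(clf.predict(df[["description", "price"]]))
```

An LLM-encoder variant would simply replace `text_embed` with a dense sentence-embedding step; the benchmark's finding is that this substitution pays off mainly when the string columns carry substantial free text.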
-
MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image
MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.
-
Is One Layer Enough? Understanding Inference Dynamics in Tabular Foundation Models
Tabular foundation models show substantial depthwise redundancy, so a looped single-layer version achieves comparable results with 20% of the original parameters.
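The looping idea is easy to state in code. A sketch with a generic PyTorch encoder layer rather than any specific foundation model's block:

```python
import torch
import torch.nn as nn

class LoopedEncoder(nn.Module):
    """Apply one shared transformer layer n_loops times instead of
    stacking n_loops distinct layers, so the stacked layers' parameter
    count drops by roughly a factor of n_loops."""

    def __init__(self, d_model=128, n_heads=4, n_loops=12):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.n_loops = n_loops

    def forward(self, x):
        for _ in range(self.n_loops):
            x = self.layer(x)  # same weights on every iteration
        return x

x = torch.randn(2, 16, 128)       # (batch, tokens, d_model)
print(LoopedEncoder()(x).shape)   # torch.Size([2, 16, 128])
```

The paper's 20% figure presumably also counts untied components (embeddings, heads), which looping leaves untouched.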
-
TFM-Retouche: A Lightweight Input-Space Adapter for Tabular Foundation Models
TFM-Retouche is an architecture-agnostic input-space residual adapter that improves tabular foundation model accuracy on 51 datasets by learning input corrections through the frozen backbone, with an identity guard to fall back to the original model.
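A schematic of an input-space residual adapter in generic PyTorch; one way to realize the identity guard is a zero-initialized gate, so the adapted model starts exactly equal to the original (the actual TFM-Retouche design may differ):

```python
import torch
import torch.nn as nn

class InputResidualAdapter(nn.Module):
    """Learn a correction to the *inputs* of a frozen backbone:
    x' = x + gate * r(x). With gate initialized to 0, the adapted
    model reproduces the original at the start of training."""

    def __init__(self, backbone: nn.Module, n_features: int, hidden: int = 64):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad_(False)          # backbone stays frozen
        self.residual = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Linear(hidden, n_features)
        )
        self.gate = nn.Parameter(torch.zeros(1))  # identity guard at init

    def forward(self, x):
        x_corrected = x + self.gate * self.residual(x)
        # Gradients flow *through* the frozen backbone into r and gate.
        return self.backbone(x_corrected)

# Toy usage with a stand-in "foundation model".
backbone = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
model = InputResidualAdapter(backbone, n_features=10)
logits = model(torch.randn(4, 10))
```

Because only `residual` and `gate` are trainable, the adapter is architecture-agnostic: any differentiable backbone with tabular inputs can be wrapped this way.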
-
Selecting Feature Interactions for Generalized Additive Models by Distilling Foundation Models
TabDistill distills feature interactions from tabular foundation models via post-hoc attribution and inserts them into GAMs, yielding consistent predictive gains.
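One standard post-hoc attribution the distillation step could rank pairs by is the mixed finite difference of a black-box model's output; a sketch under that assumption (the paper's exact attribution may differ):

```python
import numpy as np
from itertools import combinations

def interaction_strength(predict, X, i, j, eps=1.0):
    """Mean |f(x + e_i + e_j) - f(x + e_i) - f(x + e_j) + f(x)|: a
    finite-difference estimate of the (i, j) interaction effect."""
    def shift(cols):
        Xs = X.copy()
        Xs[:, cols] += eps
        return predict(Xs)
    return np.mean(np.abs(shift([i, j]) - shift([i]) - shift([j]) + predict(X)))

# Toy black-box with a genuine x0*x1 interaction.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
predict = lambda X: X[:, 0] * X[:, 1] + X[:, 2]

scores = {(i, j): interaction_strength(predict, X, i, j)
          for i, j in combinations(range(4), 2)}
top = max(scores, key=scores.get)
print(top, scores[top])   # (0, 1) dominates
```

The highest-scoring pairs then become the GAM's pairwise terms, for example as tensor-spline terms or via an EBM's explicit interaction list, keeping the distilled model interpretable.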
-
FETS Benchmark: Foundation Models Outperform Dataset-specific Machine Learning in Energy Time Series Forecasting
Foundation models outperform dataset-specific machine learning in energy time series forecasting across 54 datasets in 9 categories.
-
Tabular foundation models for in-context prediction of molecular properties
Tabular foundation models achieve high accuracy in molecular property prediction through in-context learning, with up to 100% win rates on MoleculeACE tasks when paired with CheMeleon embeddings.
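The recipe is essentially "embed, then classify in context". A sketch using the public TabPFN interface and a hypothetical `chemeleon_embed` helper as a stand-in for the real CheMeleon encoder (whose API may differ):

```python
import hashlib
import numpy as np
from tabpfn import TabPFNClassifier  # pip install tabpfn

def chemeleon_embed(smiles_list, dim=64):
    """HYPOTHETICAL stand-in: a deterministic random vector per SMILES
    string. Replace with real CheMeleon molecular embeddings."""
    out = []
    for s in smiles_list:
        seed = int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "little")
        out.append(np.random.default_rng(seed).normal(size=dim))
    return np.array(out)

train_smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN", "CCCC", "c1ccncc1"]
y_train = [0, 1, 0, 0, 0, 1]

clf = TabPFNClassifier()
clf.fit(chemeleon_embed(train_smiles), y_train)   # fit = storing the context set
print(clf.predict_proba(chemeleon_embed(["CCOC"]))[0])
```

No gradient updates happen per task: `fit` stores the labeled context, and prediction is a single in-context forward pass, which is what makes the approach attractive for small molecular datasets.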
-
Benchmarking Optimizers for MLPs in Tabular Deep Learning
The Muon optimizer outperforms AdamW across 17 tabular datasets when training MLPs under a shared protocol.
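For reference, the core of the Muon update is momentum followed by an approximate orthogonalization of each 2-D gradient via Newton-Schulz iteration; a simplified sketch following the public reference implementation (the benchmarked code and hyperparameters may differ):

```python
import torch

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximately map a 2-D matrix G to an orthogonal matrix with the
    same 'direction' (quintic Newton-Schulz iteration)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

@torch.no_grad()
def muon_step(param, grad, momentum_buf, lr=0.02, beta=0.95):
    """One Muon update for a 2-D weight matrix (1-D params such as
    biases typically stay on AdamW)."""
    momentum_buf.mul_(beta).add_(grad)
    update = newton_schulz_orthogonalize(grad.add(momentum_buf, alpha=beta))
    # Scale so the update RMS is roughly shape-independent.
    update *= max(1.0, param.shape[0] / param.shape[1]) ** 0.5
    param.add_(update, alpha=-lr)

W = torch.randn(64, 32)
buf = torch.zeros_like(W)
muon_step(W, torch.randn_like(W), buf)
```

The orthogonalized update equalizes the scale of the gradient's singular directions, which is one common explanation for Muon's advantage over elementwise methods like AdamW on dense matrices.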
-
From Uniform to Learned Knots: A Study of Spline-Based Numerical Encodings for Tabular Deep Learning
Spline encodings for numerical features show task-dependent performance in tabular deep learning: piecewise-linear encodings are robust for classification, while regression results vary with the spline family, knot strategy, and backbone.
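For concreteness, piecewise-linear encoding maps each scalar into a vector of per-bin fractions over quantile knots; a sketch following the standard formulation (this is the textbook PLE, not necessarily the paper's exact variant):

```python
import numpy as np

def fit_ple_bins(x_train, n_bins=8):
    """Quantile knots b_0 < ... < b_T estimated on training data."""
    return np.unique(np.quantile(x_train, np.linspace(0, 1, n_bins + 1)))

def ple_encode(x, bins):
    """Component t is the clipped fraction of bin t covered by x:
    clip((x - b_{t-1}) / (b_t - b_{t-1}), 0, 1), so the encoding is
    piecewise-linear in x and saturates outside each bin."""
    lo, hi = bins[:-1], bins[1:]
    frac = (x[:, None] - lo) / (hi - lo)
    return np.clip(frac, 0.0, 1.0)

x_train = np.random.default_rng(0).normal(size=1000)
bins = fit_ple_bins(x_train)
print(ple_encode(np.array([-1.0, 0.0, 2.5]), bins))
```

Other spline families in the study replace the clipped-linear basis with smoother bases (for example B-splines) and the quantile knots with uniform or learned ones, which is exactly the axis along which the regression results vary.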
-
VIP-COP: Context Optimization for Tabular Foundation Models
VIP-COP is a black-box method that optimizes context for tabular foundation models by ranking and selecting high-value samples and features via online KernelSHAP regression, outperforming baselines on large high-dimensional data.
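The regression trick can be illustrated with a simplified sample-valuation loop: uniform random subsets and a k-NN scorer stand in for the exact KernelSHAP weighting and the foundation model, so this is a sketch of the idea rather than VIP-COP itself.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(60, 5)); y_pool = (X_pool[:, 0] > 0).astype(int)
y_pool[:10] = 1 - y_pool[:10]                 # 10 corrupted candidate samples
X_val = rng.normal(size=(100, 5)); y_val = (X_val[:, 0] > 0).astype(int)

def utility(mask):
    """Validation accuracy when only the masked samples form the context."""
    if mask.sum() < 5 or len(set(y_pool[mask])) < 2:
        return 0.0
    clf = KNeighborsClassifier(3).fit(X_pool[mask], y_pool[mask])
    return clf.score(X_val, y_val)

# Evaluate utility on random subsets, then regress utility on inclusion
# indicators: coefficients approximate each sample's marginal value.
masks = rng.random((300, len(X_pool))) < 0.5
scores = np.array([utility(m) for m in masks])
values = LinearRegression().fit(masks.astype(float), scores).coef_

context = np.argsort(values)[-20:]            # keep the top-20 samples
print("corrupted samples kept:", np.intersect1d(context, np.arange(10)).size)
```

The same inclusion-indicator regression applies to features, and running it online (updating the fit as new subset evaluations arrive) is what keeps the method tractable on large high-dimensional tables.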
-
Tabular Foundation Model for Generative Modelling
TabFORGE generates high-quality synthetic tabular data by leveraging pretrained causality-aware representations in a two-stage diffusion-decoder architecture that mitigates latent distribution shifts.
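A compressed skeleton of a generic two-stage latent recipe, as one reading of that summary (this is not TabFORGE's actual code: the causality-aware encoder is faked with a frozen MLP, and the shift-mitigation machinery is omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in for the frozen, pretrained causality-aware row encoder.
encoder = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
encoder.requires_grad_(False)

decoder = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 8))
denoiser = nn.Sequential(nn.Linear(4 + 1, 32), nn.ReLU(), nn.Linear(32, 4))

X = torch.randn(256, 8)                  # toy table rows
with torch.no_grad():
    Z = encoder(X)                       # latents from the frozen encoder

# Stage 1: train the decoder to map pretrained latents back to rows.
opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    F.mse_loss(decoder(Z), X).backward()
    opt.step()

# Stage 2: diffusion in latent space (simple noise-prediction objective).
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)
for _ in range(200):
    t = torch.rand(len(Z), 1)            # noise level in (0, 1)
    eps = torch.randn_like(Z)
    z_t = (1 - t).sqrt() * Z + t.sqrt() * eps
    opt.zero_grad()
    F.mse_loss(denoiser(torch.cat([z_t, t], dim=1)), eps).backward()
    opt.step()
# Sampling denoises from pure noise in latent space, then decodes to rows.
```

The latent-distribution-shift problem the paper targets arises between the latents the decoder was trained on and the latents the diffusion model actually samples; the two-stage design is meant to keep those distributions aligned.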
-
Analog Optical Inference on Million-Record Mortgage Data
Analog optical inference on 5.84 million mortgage records achieves 94.6% balanced accuracy, with gaps traced to encoding and architecture rather than hardware non-idealities.
-
PRAGMA: Revolut Foundation Model
PRAGMA pre-trains a Transformer on heterogeneous banking events with a tailored self-supervised masked objective, yielding embeddings that support strong downstream performance on credit scoring, fraud detection, and lifetime value prediction using linear heads or light fine-tuning.
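A generic masked-event pre-training objective of the kind described can be written compactly; this is a stand-in sketch, not PRAGMA's tailored objective, and the event vocabulary and masking rate are invented:

```python
import torch
import torch.nn as nn

class MaskedEventModel(nn.Module):
    """Transformer over event-id sequences trained to recover masked
    events; the encoder outputs double as downstream embeddings."""

    def __init__(self, n_event_types=100, d_model=64, mask_id=0):
        super().__init__()
        self.mask_id = mask_id
        self.embed = nn.Embedding(n_event_types, d_model)
        layer = nn.TransformerEncoderLayer(d_model, 4, 256, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_event_types)

    def forward(self, events):                     # (batch, seq) int ids
        mask = torch.rand_like(events, dtype=torch.float) < 0.15
        corrupted = events.masked_fill(mask, self.mask_id)
        h = self.encoder(self.embed(corrupted))    # (batch, seq, d_model)
        loss = nn.functional.cross_entropy(self.head(h)[mask], events[mask])
        return h, loss

model = MaskedEventModel()
events = torch.randint(1, 100, (8, 32))            # toy banking-event ids
embeddings, loss = model(events)
loss.backward()
```

Downstream use matches the summary: freeze the encoder, pool the per-event embeddings per customer, and fit a linear head (or lightly fine-tune) for credit scoring, fraud detection, or lifetime-value prediction.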
-
ConceptTracer: Interactive Analysis of Concept Saliency and Selectivity in Neural Representations
ConceptTracer supplies an interactive interface and saliency/selectivity metrics to locate concept-responsive neurons in neural representations, shown on TabPFN.
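One plausible reading of a selectivity metric (the tool's exact definitions may differ): compare each neuron's activations on inputs where a concept is present versus absent, standardized by the pooled spread.

```python
import numpy as np

def selectivity(acts, concept_mask, eps=1e-9):
    """d'-style selectivity per neuron: standardized gap between mean
    activation when the concept is present vs. absent. Large |values|
    flag concept-responsive neurons."""
    on, off = acts[concept_mask], acts[~concept_mask]
    pooled = np.sqrt(0.5 * (on.var(axis=0) + off.var(axis=0))) + eps
    return (on.mean(axis=0) - off.mean(axis=0)) / pooled

rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 32))          # (samples, neurons), e.g. from TabPFN
concept = rng.random(1000) < 0.3            # per-sample concept labels
acts[concept, 5] += 2.0                     # plant a concept-responsive neuron
print(int(np.argmax(selectivity(acts, concept))))   # -> 5
```

Saliency metrics would instead rank neurons by their influence on the output for concept inputs; the interface's job is to let users sweep concepts and inspect both rankings interactively.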
-
Noise Immunity in In-Context Tabular Learning: An Empirical Robustness Analysis of TabPFN's Attention Mechanisms
TabPFN maintains high ROC-AUC and structured attention under controlled additions of irrelevant features, nonlinear correlations, and mislabeled targets in binary classification.
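The perturbation protocol is straightforward to reproduce in outline; a sketch with sklearn utilities and a logistic-regression stand-in (swap in `tabpfn.TabPFNClassifier` if installed; the noise levels here are illustrative, not the paper's grid):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

for n_noise, flip_frac in [(0, 0.0), (20, 0.0), (0, 0.1), (20, 0.1)]:
    # Append irrelevant Gaussian features to both splits.
    Xtr_p = np.hstack([Xtr, rng.normal(size=(len(Xtr), n_noise))])
    Xte_p = np.hstack([Xte, rng.normal(size=(len(Xte), n_noise))])
    # Mislabel a fraction of *training* targets only.
    ytr_p = ytr.copy()
    flip = rng.random(len(ytr)) < flip_frac
    ytr_p[flip] = 1 - ytr_p[flip]
    clf = LogisticRegression(max_iter=1000).fit(Xtr_p, ytr_p)  # TabPFN stand-in
    auc = roc_auc_score(yte, clf.predict_proba(Xte_p)[:, 1])
    print(f"noise_feats={n_noise:2d}  flip={flip_frac:.1f}  ROC-AUC={auc:.3f}")
```

The paper's additional probe, nonlinear feature correlations, slots into the same loop, and its attention-structure analysis then checks whether the model's context attention still concentrates on the informative features under each perturbation.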