TabICLv2: A better, faster, scalable, and open tabular foundation model
13 papers indexed by Pith cite this work; polarity classification is still being indexed.
Citation summary: years, 2026 (13); verdicts, 13 unverdicted; roles, background (1); polarities, background (1).
Citing papers
-
STRABLE: Benchmarking Tabular Machine Learning with Strings
A new corpus of 108 mixed string-numeric tables shows that advanced tabular learners with basic string embeddings perform well on most real-world data, while large LLM encoders help on free-text heavy tables.
-
MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image
MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.
-
PFN-TS: Thompson Sampling for Contextual Bandits via Prior-Data Fitted Networks
PFN-TS converts PFN posterior predictives into mean-reward samples for Thompson sampling using a subsampled predictive CLT, with consistency proofs, regret bounds, and strong empirical performance on synthetic and real bandit benchmarks.
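As a rough sketch of the loop this summary describes (not the paper's implementation): Thompson sampling plays the argmax over per-arm mean-reward samples drawn from a posterior predictive. Below, a simple Gaussian posterior stands in for the PFN predictive, and all names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def predictive_mean_samples(history, n_arms):
    """Stand-in for a PFN posterior predictive: a crude Gaussian
    posterior over each arm's mean reward, given observed rewards."""
    samples = np.empty(n_arms)
    for a in range(n_arms):
        obs = history[a]
        n = len(obs)
        post_mean = np.mean(obs) if n else 0.0
        post_std = 1.0 / np.sqrt(n + 1)          # shrinks as data accrues
        samples[a] = rng.normal(post_mean, post_std)
    return samples

def thompson_sampling(true_means, horizon=2000):
    n_arms = len(true_means)
    history = [[] for _ in range(n_arms)]
    pulls = np.zeros(n_arms, dtype=int)
    for _ in range(horizon):
        # Sample one mean-reward vector, play the greedy arm on that sample.
        arm = int(np.argmax(predictive_mean_samples(history, n_arms)))
        reward = rng.normal(true_means[arm], 0.5)
        history[arm].append(reward)
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.1, 0.5, 0.9])
print(pulls)   # the best arm should dominate the pull counts
```

In PFN-TS the Gaussian stand-in is replaced by samples from the prior-data fitted network's predictive, which is what the subsampled predictive CLT justifies.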
-
Data Language Models: A New Foundation Model Class for Tabular Data
Schema-1 is the first Data Language Model that natively understands raw tabular data and outperforms gradient-boosted ensembles, AutoML, and prior tabular foundation models on row-level prediction and imputation tasks.
-
TFM-Retouche: A Lightweight Input-Space Adapter for Tabular Foundation Models
TFM-Retouche is an architecture-agnostic input-space residual adapter that improves tabular foundation model accuracy on 51 datasets by learning input corrections through the frozen backbone, with an identity guard to fall back to the original model.
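The adapter pattern this summary describes can be sketched in a few lines: a zero-initialized residual correction in input space, applied before a frozen backbone, plus a guard that restores the original model. This is a minimal stand-in, not the paper's architecture; the backbone here is a fixed random linear map and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))                # frozen backbone weights

def backbone(x):
    """Stand-in for a frozen tabular foundation model."""
    return x @ W

class Retouch:
    """Input-space residual adapter sketch: x -> backbone(x + delta(x)).
    The output layer of delta starts at zero, so the adapted model is
    initially identical to the original; an identity guard can restore
    that behaviour whenever the correction hurts."""
    def __init__(self, dim, hidden=8):
        self.A = rng.normal(scale=0.1, size=(dim, hidden))
        self.B = np.zeros((hidden, dim))   # zero output layer: identity start
        self.use_identity = False

    def delta(self, x):
        return np.maximum(x @ self.A, 0.0) @ self.B

    def __call__(self, x):
        if self.use_identity:
            return backbone(x)             # guard: fall back to original model
        return backbone(x + self.delta(x))

adapter = Retouch(dim=4)
x = rng.normal(size=(8, 4))
identical_at_init = bool(np.allclose(adapter(x), backbone(x)))
print(identical_at_init)
```

In the paper's setting, only the correction parameters would be trained, with gradients flowing through the frozen backbone.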
-
RamanBench: A Large-Scale Benchmark for Machine Learning on Raman Spectroscopy
RamanBench unifies 74 datasets into the first large-scale reproducible benchmark for ML on Raman spectra, finding that tabular foundation models outperform baselines but that no single method generalizes across datasets.
-
Online Sharp-Calibrated Bayesian Optimization
OSCBO adaptively balances Gaussian process sharpness and calibration in Bayesian optimization by casting hyperparameter selection as constrained online learning, while preserving sublinear regret bounds.
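The sharpness/calibration tension is easiest to see in a toy online-calibration loop: widen predictive intervals when empirical coverage falls below target, narrow them when it overshoots. This is only a loose stand-in for the paper's constrained online learning over GP hyperparameters; every constant below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
target = 0.9            # desired coverage of the predictive intervals
beta = 1.0              # interval width multiplier, adapted online
lr = 0.05
hits = 0

for t in range(5000):
    mu, sigma = 0.0, 1.0                 # stand-in GP predictive at the query
    y = rng.normal(mu, 1.3)              # model is over-sharp: true noise is larger
    covered = abs(y - mu) <= beta * sigma
    hits += covered
    # Online update: widen when under-covering, sharpen when over-covering.
    beta = max(0.1, beta + lr * (target - covered))

coverage = hits / 5000
print(round(beta, 2), round(coverage, 2))
```

The update drifts toward the sharpest width that still meets the coverage target, which is the calibration-versus-sharpness trade-off the method manages with regret guarantees.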
-
In-Context Black-Box Optimization with Unreliable Feedback
FICBO pretrains a feedback-aware transformer with a structured prior on feedback distortion to adaptively exploit or ignore unreliable auxiliary signals during in-context black-box optimization.
-
Tabular foundation models for in-context prediction of molecular properties
Tabular foundation models achieve high accuracy in molecular property prediction through in-context learning, with up to 100% win rates on MoleculeACE tasks when paired with CheMeleon embeddings.
-
Benchmarking Optimizers for MLPs in Tabular Deep Learning
The Muon optimizer outperforms AdamW across 17 tabular datasets when training MLPs under a shared protocol.
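For context, Muon's distinguishing step replaces the raw momentum direction with an approximately orthogonalized one. A minimal numpy sketch of that step, using the quintic Newton-Schulz coefficients published for Muon (the benchmark's training protocol is not reproduced here):

```python
import numpy as np

def newton_schulz(G, steps=5):
    """Quintic Newton-Schulz iteration driving G toward the nearest
    orthogonal matrix (roughly U V^T from G's SVD); coefficients are
    those published for Muon."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + 1e-7)     # normalize so singular values <= 1
    transpose = X.shape[0] > X.shape[1]
    if transpose:                          # iterate on the short side
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transpose else X

def muon_step(W, grad, buf, lr=0.02, momentum=0.95):
    """One Muon update: momentum buffer, then orthogonalized direction."""
    buf = momentum * buf + grad
    return W - lr * newton_schulz(buf), buf

rng = np.random.default_rng(0)
G = rng.normal(size=(4, 6))
O = newton_schulz(G)
s = np.linalg.svd(O, compute_uv=False)
print(np.round(s, 2))    # singular values should all sit near 1
```

Orthogonalizing the update equalizes the scale of all directions in a weight matrix, which is the property the benchmark's shared protocol puts to the test against AdamW.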
-
KumoRFM-2: Scaling Foundation Models for Relational Learning
KumoRFM-2 pre-trains on synthetic and real relational data across row, column, foreign-key and cross-sample axes, injects task information early, and achieves up to 8% gains over supervised baselines on 41 benchmarks in few-shot and fine-tuned regimes while handling billion-scale datasets.
-
Tabular Foundation Model for Generative Modelling
TabFORGE generates high-quality synthetic tabular data by leveraging pretrained causality-aware representations in a two-stage diffusion-decoder architecture that mitigates latent distribution shifts.
-
TabCF: Distributional Control Function Estimation with Tabular Foundation Models
TabCF is a tuning-light method using tabular foundation models for control function regression to estimate distributional causal effects such as interventional means and quantiles.
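The control-function recipe itself can be sketched on synthetic data: stage one regresses the endogenous treatment on an instrument, and stage two adds the stage-one residual to the outcome regression to absorb confounding. Plain least squares stands in for the paper's tabular-foundation-model regressors, and this sketch recovers only the mean effect, not the full distributions TabCF targets.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
z = rng.normal(size=n)                               # instrument
u = rng.normal(size=n)                               # unobserved confounder
t = 0.8 * z + u + rng.normal(scale=0.5, size=n)      # endogenous treatment
y = 2.0 * t + 1.5 * u + rng.normal(scale=0.5, size=n)

# Stage 1: regress treatment on instrument; the residual is the control function.
Z = np.column_stack([np.ones(n), z])
vhat = t - Z @ np.linalg.lstsq(Z, t, rcond=None)[0]

# Stage 2: outcome on treatment plus control function removes the confounding.
X = np.column_stack([np.ones(n), t, vhat])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# Naive regression of y on t is biased by the confounder u.
naive = np.linalg.lstsq(np.column_stack([np.ones(n), t]), y, rcond=None)[0][1]
print(round(beta[1], 2), round(naive, 2))
```

The control-function coefficient recovers the true effect of 2.0, while the naive slope is inflated; TabCF swaps flexible foundation-model fits into both stages to estimate interventional means and quantiles.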