A new corpus of 108 mixed string-numeric tables shows that advanced tabular learners with basic string embeddings perform well on most real-world data, while large LLM encoders help on free-text-heavy tables.
Mixed citations
Accurate predictions on small data with a tabular foundation model. Nature, 637(8045):319–326
Mixed citation behavior. Most common role is background (50%).
Years: 2026 (14)
Verdicts: UNVERDICTED (14)
Representative citing papers
SurvivalPFN amortizes Bayesian survival analysis for right-censored data by pretraining a prior-data fitted network on synthetic identifiable DGPs and then performing in-context inference, achieving competitive results on 61 real datasets.
IV-ICL learns the marginal posterior of causal effects via in-context learning to derive bounds as quantiles, recovering the identified set more reliably than variational inference while running 20-500x faster.
PFN-TS converts PFN posterior predictives into mean-reward samples for Thompson sampling using a subsampled predictive CLT, with consistency proofs, regret bounds, and strong empirical performance on synthetic and real bandit benchmarks.
PIQL integrates privileged information to accelerate convergence, lower loss, and improve generalization in tabular foundation models.
Agentic-imodels evolves scikit-learn regressors via an autoresearch loop to jointly boost predictive performance and LLM-simulatability, improving downstream agentic data science tasks by up to 73% on the BLADE benchmark.
O'Prior, a compositional synthetic prior with hierarchical SCMs, realism engines, stress modules, and curriculum protocols, improves tabular foundation model accuracy and robustness on real benchmarks when architecture and compute are held fixed.
TipPFN uses prior-data fitted networks and in-context learning on synthetic bifurcation data to detect proximity to critical transitions in unseen dynamical systems and real observations.
DiffICL breaks the quality-privacy tradeoff in small-data tabular synthesis by using in-context learning on pretrained structural priors to generate data that is both higher in quality and less prone to memorizing training samples.
The work introduces uncertainty-aware foundation models for clinical data by learning set-valued patient representations that enforce consistency across partial observations and integrate multimodal self-supervised objectives.
TabH2O presents a unified tabular foundation model with dual-head architecture and single-stage pretraining that achieves an average rank of 2.55 on the TALENT benchmark, outperforming several established methods.
VIP-COP is a black-box method that optimizes context for tabular foundation models by ranking and selecting high-value samples and features via online KernelSHAP regression, outperforming baselines on large high-dimensional data.
TabFORGE generates high-quality synthetic tabular data by leveraging pretrained causality-aware representations in a two-stage diffusion-decoder architecture that mitigates latent distribution shifts.
TabCF is a tuning-light method using tabular foundation models for control function regression to estimate distributional causal effects such as interventional means and quantiles.
citing papers explorer
- PFN-TS: Thompson Sampling for Contextual Bandits via Prior-Data Fitted Networks
  PFN-TS converts PFN posterior predictives into mean-reward samples for Thompson sampling using a subsampled predictive CLT, with consistency proofs, regret bounds, and strong empirical performance on synthetic and real bandit benchmarks.
- Tabular Foundation Model for Generative Modelling
  TabFORGE generates high-quality synthetic tabular data by leveraging pretrained causality-aware representations in a two-stage diffusion-decoder architecture that mitigates latent distribution shifts.