{"total":25,"items":[{"citing_arxiv_id":"2606.31208","ref_index":170,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Probing Memorization of Tabular In-Context Learning","primary_cat":"cs.LG","submitted_at":"2026-06-30T06:40:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A new probing framework detects moderate parametric memorization signals in tabular in-context learning models under single-task fine-tuning, strongest on low-cardinality tasks, but signals largely disappear under realistic training.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.30410","ref_index":194,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Beyond IID: How General Are Tabular Foundation Models, Really?","primary_cat":"cs.LG","submitted_at":"2026-06-29T14:55:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Tabular foundation models excel on tiny- to medium-sized IID data but are outperformed by traditional tree-based and deep learning models on non-IID, large, and high-dimensional datasets, based on evaluations across 11 models and 142 datasets in the new BeyondArena benchmark.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.12006","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Tabular Foundation Models for Clinical Survival Analysis via Survival-Aware Adaptation","primary_cat":"cs.LG","submitted_at":"2026-06-10T12:28:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Adapting tabular foundation models with an MTLR survival head produces competitive or superior C-index scores on MIMIC-IV (0.856) and eICU (0.797) compared to 
DeepSurv and zero-shot baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.02384","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"TabPrep: Closing the Feature Engineering Gap in Tabular Benchmarks","primary_cat":"cs.LG","submitted_at":"2026-06-01T15:33:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TabPrep is a new feature engineering pipeline that targets three data patterns and improves performance of tree-based, neural, linear, and foundation models on tabular benchmarks, often more than model architecture changes.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.01990","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Testing for Single-Population Ancestry in the Admixture Model","primary_cat":"stat.ME","submitted_at":"2026-06-01T09:47:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A new constrained parametric bootstrap test for single-population ancestry in the supervised admixture model, proven to have asymptotic level alpha and consistency.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20674","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Modular Multimodal Classification Without Fine-Tuning: A Simple Compositional Approach","primary_cat":"cs.LG","submitted_at":"2026-05-20T03:43:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"CoMET achieves strong multimodal classification performance by composing frozen modality encoders, PCA compression, and tabular foundation models without any 
training, reaching state-of-the-art on diverse benchmarks including large-scale hierarchical tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18979","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"TabQL: In-Context Q-Learning with Tabular Foundation Models","primary_cat":"cs.LG","submitted_at":"2026-05-18T18:03:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TabQL is a reinforcement learning framework that substitutes a tabular foundation model with in-context capabilities for the parametric Q-network in DQN, with a warm-up phase and theoretical analysis claiming improved sample efficiency.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18971","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Shaping the Prior: How Synthetic Task Distributions Determine Tabular Foundation Model Quality","primary_cat":"cs.LG","submitted_at":"2026-05-18T18:00:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"O'Prior, a compositional synthetic prior with hierarchical SCMs, realism engines, stress modules, and curriculum protocols, improves tabular foundation model accuracy and robustness on real benchmarks when architecture and compute are held fixed.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18702","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Distilling Tabular Foundation Models for Structured Health 
Data","primary_cat":"cs.LG","submitted_at":"2026-05-18T17:37:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Leakage-aware distillation transfers at least 90% of tabular foundation model AUC to lightweight students across 19 health datasets, with 26x CPU speedup and preserved calibration/fairness.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18696","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Ensembling Tabular Foundation Models - A Diversity Ceiling And A Calibration Trap","primary_cat":"cs.LG","submitted_at":"2026-05-18T17:32:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Six modern tabular foundation models are near-redundant, limiting ensemble gains to +0.18% accuracy at high cost while some methods degrade calibration.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18654","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Pocket Foundation Models: Distilling TFMs into CPU-Ready Gradient-Boosted Trees","primary_cat":"cs.LG","submitted_at":"2026-05-18T17:00:20+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Distilling TabICLv2 into XGBoost via stratified OOF labeling yields 0.882 macro-mean AUC (96.5% of teacher) at 1.9 ms CPU across 153 datasets, with significant gains over tuned CatBoost on low-dimensional data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15488","ref_index":61,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SurvivalPFN: Amortizing Survival Prediction via In-Context Bayesian 
Inference","primary_cat":"cs.LG","submitted_at":"2026-05-15T00:13:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SurvivalPFN amortizes Bayesian survival analysis for right-censored data by pretraining a prior-data fitted network on synthetic identifiable DGPs and then performing in-context inference, achieving competitive results on 61 real datasets.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"Thus,eδ∗ = 1 asks the model for the event-time PPD, whose tail probability gives the PPSD, while eδ∗ = 0 asks for the posterior predictive censoring distribution (PPCD). This indicator is not an observed event label for the query point; rather, it specifies the prediction target. SurvivalPFN parameterizes qω using the PFN-style transformer architecture of TabDPT [ 61] and CausalPFN [2]; see Appendix D.2 for details. As shown in Figure 3, each context row (xi, ti, δi)θ ∈ Dtr θ is embedded as a context token, while each query token is formed from (x∗ θ,eδ∗). 
We use three query-indicator schedules during training: •Event-only:always sets eδ∗ = 1and trains the model directly for PPSD prediction; • Both:duplicates each query with eδ∗ ∈ {0,1} and trains both event- and censoring-time prediction;"},{"citing_arxiv_id":"2605.14764","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Compositional Sparsity as an Inductive Bias for Neural Architecture Design","primary_cat":"cs.LG","submitted_at":"2026-05-14T12:26:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"HNNs recover known sparse hierarchies on synthetic tasks and match or exceed dense DNNs on real datasets while using orders of magnitude fewer parameters and showing lower hyperparameter sensitivity.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13986","ref_index":45,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"TabPFN-3: Technical Report","primary_cat":"cs.LG","submitted_at":"2026-05-13T18:01:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TabPFN-3 scales tabular foundation models to 1M rows with synthetic pretraining, test-time compute, and benchmark-leading performance on tabular, relational, and tabular-text tasks while being up to 20x faster than TabPFN-2.5.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"See Appendix E.2.4 for the full results. list of recent models, including tree-based models like CatBoost [38], LightGBM [39] or XGBoost [40], as well as newer deep-learning models like RealMLP [32], TabM [41], ModernNCA [42] or xRFM [43], the AutoML system AutoGluon [2], and other Tabular Foundation Models like TabICL [29, 30], TabDPT [44], TabSTAR [37], LimiX [45], Mitra [46] or TabPFN v2 [17]. 
The benchmark contains a set of 51 datasets selected from 1053 to be representative of real-world tabular data. See Erickson et al.[1] for the list of datasets and Section E.2.1 for definitions of TabArena's Elo and Improvability metrics. Pushing the performance frontier on TabArena.Figure 10 shows the performance ofTabPFN-3"},{"citing_arxiv_id":"2605.12904","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"VIP-COP: Context Optimization for Tabular Foundation Models","primary_cat":"cs.LG","submitted_at":"2026-05-13T02:28:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"VIP-COP is a black-box method that optimizes context for tabular foundation models by ranking and selecting high-value samples and features via online KernelSHAP regression, outperforming baselines on large high-dimensional data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11408","ref_index":36,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MaskTab: Scalable Masked Tabular Pretraining with Scaling Laws and Distillation for Industrial Classification","primary_cat":"cs.LG","submitted_at":"2026-05-12T01:56:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"MaskTab is a masked pretraining method for industrial tabular data that delivers measurable gains in classification AUC and KS metrics while enabling effective distillation to smaller models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10616","ref_index":69,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MulTaBench: Benchmarking Multimodal Tabular Learning with Text and 
Image","primary_cat":"cs.LG","submitted_at":"2026-05-11T14:12:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"These findings suggest that designing novel architectures which contextualize the representations of unstructured modalities can push the boundaries of MMTL, and we believe that MulTaBench would be instrumental for developing true Multimodal TFMs. 2 Related Work Tabular Foundation Models.The landscape of tabular learning shifted with Prior-data Fitted Networks (PFNs) [69], which pretrain transformers over synthetic tabular datasets with in-context learning (ICL) [9]. The TabPFN family [40, 41, 34, 27] pioneered this direction. Multiple subsequent works [75, 76, 62, 103, 86, 102, 6] advanced the paradigm with improvements spanning synthetic data diversity, real-world data pretraining, and architectural scalability. 
Among these, ConTextTab"},{"citing_arxiv_id":"2605.06047","ref_index":20,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"TFM-Retouche: A Lightweight Input-Space Adapter for Tabular Foundation Models","primary_cat":"cs.LG","submitted_at":"2026-05-07T11:34:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TFM-Retouche is an architecture-agnostic input-space residual adapter that improves tabular foundation model accuracy on 51 datasets by learning input corrections through the frozen backbone, with an identity guard to fall back to the original model.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.04911","ref_index":23,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Breaking the Quality-Privacy Tradeoff in Tabular Data Generation via In-Context Learning","primary_cat":"cs.LG","submitted_at":"2026-05-06T13:38:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DiffICL breaks the quality-privacy tradeoff in small-data tabular synthesis by using in-context learning on pretrained structural priors to generate data that is both higher quality and less memorizing of training samples.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.02003","ref_index":64,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"RamanBench: A Large-Scale Benchmark for Machine Learning on Raman Spectroscopy","primary_cat":"cs.LG","submitted_at":"2026-05-03T18:12:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"RamanBench unifies 74 datasets into the first large-scale reproducible benchmark for ML on Raman spectra, finding tabular foundation models outperform 
baselines but no method generalizes across datasets.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"traditional ML models and (b) tree-based approaches as a reference, (c) Gradient Boosted Trees due to their tabular performance [ 59], (d) Deep Learning Models including the top-performing architectures from Lange et al. [21](ReZeroNet [60], FCResNeXt [61], and CoAtNet [62]), all recent (e) Tabular Foundation Model (TFM) (TabPFN [45, 46], TabICL [47, 48], MITRA [63], TabDPT [64] and TabM [65]), (f) Raman-specific architectures benchmarked in [26] (Deep CNN [39], SANet [40], RamanNet [41], RamanFormer [42], and RamanTransformer [66]), and (g) Time Series Classification (TSC) models (ROCKET [ 49] and Arsenal [ 50]). The two TSC models (classification-only) are providing the first direct comparison between time-series classifiers and Raman-specific architectures"},{"citing_arxiv_id":"2604.04868","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Noise Immunity in In-Context Tabular Learning: An Empirical Robustness Analysis of TabPFN's Attention Mechanisms","primary_cat":"cs.LG","submitted_at":"2026-04-06T17:16:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"TabPFN maintains high ROC-AUC and structured attention under controlled additions of irrelevant features, nonlinear correlations, and mislabeled targets in binary classification.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.16513","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"FEAT: A Linear-Complexity Foundation Model for Extremely Large Structured 
Data","primary_cat":"cs.LG","submitted_at":"2026-03-17T13:40:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"FEAT is a linear-complexity structured data foundation model using dual-axis encoding, AFBM state-space models, and Conv-GLA to achieve O(N) scaling and permutation invariance while outperforming prior SFMs on real-world benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.08667","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models","primary_cat":"cs.LG","submitted_at":"2025-11-11T18:57:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TabPFN-2.5 scales tabular foundation models to 20x larger datasets, outperforms tuned tree models on TabArena, achieves near-perfect win rates against default XGBoost, and adds a distillation engine for fast production deployment.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"the NeurIPS 2025 Datasets & Benchmarks track and is thus most up-to-date. In particular, it compares a large class of recent models, including tree-based models like CatBoost [3], LightGBM [4] or XGBoost [2], as well as newer deep-learning models like RealMLP [22], TabM [24], ModernNCA [25] or xRFM [26], and other Tabular Foundation Models like TabICL [27], TabDPT [28], LimiX [29], Mitra [30] or TabPFNv2 [7]. We follow the paper's recommendation to benchmark on \"TabArena-Lite\", which is a cheaper but representative version of the full benchmark using only one test fold. The benchmark contains a set of 51 datasets selected from 1053 to be representative of real-world tabular data. 
See Erickson et al.[1] for the list of datasets."},{"citing_arxiv_id":"2509.06806","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MachineLearningLM: Scaling Many-shot In-context Learning via Continued Pretraining","primary_cat":"cs.CL","submitted_at":"2025-09-08T15:38:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MachineLearningLM uses continued pretraining on SCM-synthesized ML tasks with random-forest distillation to give LLMs robust many-shot in-context learning on tabular classification, reaching random-forest accuracy levels while preserving general chat performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2506.16791","ref_index":23,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"TabArena: A Living Benchmark for Machine Learning on Tabular Data","primary_cat":"cs.LG","submitted_at":"2025-06-20T07:14:48+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"TabArena launches a dynamic, updatable benchmarking system for tabular ML that shows boosted trees remain competitive, deep learning matches them under larger budgets with ensembling, foundation models excel on small data, and cross-model ensembles advance SOTA while flagging validation overfitting.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"XGBoost [14] XGBoost Prior Work + Us LightGBM [15] LightGBM Prior Work + Us CatBoost [16] CatBoost Prior Work + Us Explainable Boosting Machine [17, 18] EBM Authors FastAI MLP [19] FastaiMLP Authors Torch MLP [19] TorchMLP Authors RealMLP [20] RealMLP Authors TabM† mini [9] TabM Authors ModernNCA [21] ModernNCA Authors TabPFNv2 [5] TabPFNv2 Authors TabICL [22] TabICL - TabDPT [23] TabDPT - Linear / Logistic Regression Linear 
Prior Work + Us K-Nearest Neighbors KNN Prior Work + Us Implementation Framework. For implementing models, we rely on functionalities from AutoGluon [19], an established machine learning framework used in practical applications. Each model is implemented within the standardized AbstractModel framework, which aligns with the"}],"limit":50,"offset":0}