Why tabular foundation models should be a research priority
9 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- Data Language Models: A New Foundation Model Class for Tabular Data
  Schema-1 is the first Data Language Model that natively understands raw tabular data and outperforms gradient-boosted ensembles, AutoML, and prior tabular foundation models on row-level prediction and imputation tasks.
- TFM-Retouche: A Lightweight Input-Space Adapter for Tabular Foundation Models
  TFM-Retouche is an architecture-agnostic input-space residual adapter that improves tabular foundation model accuracy on 51 datasets by learning input corrections through the frozen backbone, with an identity guard to fall back to the original model.
- Tables Guide Vision: Learning to See the Heart through Tabular Data
  Tabular clinical data guides contrastive learning on cardiac MR images to build better visual representations by identifying patient similarities, outperforming image-only augmentation on downstream disease prediction tasks.
- LLM-TabLogic: Preserving Inter-Column Logical Relationships in Synthetic Tabular Data via Prompt-Guided Latent Diffusion
  LLM-TabLogic extracts inter-column logical constraints using LLMs and conditions a score-based latent diffusion model on them to generate synthetic tabular data that preserves those relationships.
- SQuARE: Structured Query & Adaptive Retrieval Engine For Tabular Formats
  SQuARE is a hybrid retrieval system that uses a complexity score to route tabular queries between chunk-based and SQL-based paths, outperforming single-strategy baselines and GPT-4o on precision and accuracy for complex spreadsheets.
- Self-Improving Tabular Language Models via Iterative Reward-Guided Post-Training
  TabGRAA applies group-relative advantage alignment in an iterative reward-guided post-training loop to improve tabular language model generators on fidelity, utility, and privacy trade-offs across five benchmarks.
- TREASURE: The Visa Payment Foundation Model for High-Volume Transaction Understanding
  TREASURE is a transformer model for payment transactions that boosts abnormal behavior detection performance by 111% over production systems and improves recommendation models by 104% when used as an embedding provider.
- Noise Immunity in In-Context Tabular Learning: An Empirical Robustness Analysis of TabPFN's Attention Mechanisms
  TabPFN maintains high ROC-AUC and structured attention under controlled additions of irrelevant features, nonlinear correlations, and mislabeled targets in binary classification.
- Creating Artificial Students that Never Existed: Leveraging Large Language Models and CTGANs for Synthetic Data Generation
  CTGAN and LLMs generate synthetic student data that passes statistical and predictive utility checks for learning analytics.