Tabular Data: Deep Learning is Not All You Need
6 Pith papers cite this work.
Years: 2026 (6)
Verdicts: UNVERDICTED (6)
citing papers explorer
- Data Language Models: A New Foundation Model Class for Tabular Data
  Schema-1 is the first Data Language Model that natively understands raw tabular data and outperforms gradient-boosted ensembles, AutoML, and prior tabular foundation models on row-level prediction and imputation tasks.
- Metric-agnostic Learning-to-Rank via Boosting and Rank Approximation
  A new listwise learning-to-rank method uses smooth rank approximation and boosting to optimize without depending on a single metric.
- From Uniform to Learned Knots: A Study of Spline-Based Numerical Encodings for Tabular Deep Learning
  Spline encodings for numerical features show task-dependent performance in tabular deep learning, with piecewise-linear encoding robust for classification and variable results for regression depending on spline family, knot strategy, and backbone.
- ProfiliTable: Profiling-Driven Tabular Data Processing via Agentic Workflows
  ProfiliTable is a profiling-driven multi-agent system that builds semantic context through exploration and closed-loop refinement to produce more reliable tabular data transformations than prior LLM approaches.
- A Data-Centric Framework for Intraoperative Fluorescence Lifetime Imaging for Glioma Surgical Guidance
  A data-centric AI framework cleans FLIm labels via confident learning and achieves 96% accuracy classifying glioma infiltration into low, moderate, and high cellularity.
- Integrating SAINT with Tree-Based Models: A Case Study in Employee Attrition Prediction
  Standalone tree-based models outperform both SAINT and SAINT-embedding hybrids for employee attrition prediction on tabular HR data.