citation dossier
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second
why this work matters in Pith
Pith has found this work in 18 reviewed papers. Its strongest current cluster is cs.LG (9 papers). The largest review-status bucket among citing papers is UNVERDICTED (17 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.
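As a rough illustration of the rendering policy described above, the sketch below shows one way a "dossier first, bounded explorer second" page could be assembled: aggregate statistics over all citing papers, then a capped list for the explorer. Every name here is hypothetical (this is not Pith's actual code), and the cap value is assumed since the page does not state it.

```python
from dataclasses import dataclass

# Hypothetical sketch of a "dossier first, bounded explorer second" policy.
# None of these names come from Pith; they only illustrate summarising all
# citing papers while rendering only a bounded subset.

@dataclass
class CitingPaper:
    title: str
    cluster: str        # e.g. "cs.LG"
    review_status: str  # e.g. "UNVERDICTED"

EXPLORER_LIMIT = 25  # assumed cap; the real bound is not stated on this page

def build_page(citing: list[CitingPaper]) -> dict:
    # Dossier: aggregate statistics over *all* citing papers.
    clusters: dict[str, int] = {}
    statuses: dict[str, int] = {}
    for p in citing:
        clusters[p.cluster] = clusters.get(p.cluster, 0) + 1
        statuses[p.review_status] = statuses.get(p.review_status, 0) + 1
    dossier = {
        "total": len(citing),
        "strongest_cluster": max(clusters, key=clusters.get),
        "largest_status_bucket": max(statuses, key=statuses.get),
    }
    # Explorer: never render every citing paper at once.
    explorer = [p.title for p in citing[:EXPLORER_LIMIT]]
    return {"dossier": dossier, "explorer": explorer}
```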
years
2026 (18)
representative citing papers
Forecast loss differentials are reframed as returns and assessed with risk-adjusted finance metrics, showing professional forecasters are harder to beat on risk-adjusted performance than on raw accuracy in US macro forecasting.
Schema-1 is the first Data Language Model that natively understands raw tabular data and outperforms gradient-boosted ensembles, AutoML, and prior tabular foundation models on row-level prediction and imputation tasks.
TFM-Retouche is an architecture-agnostic input-space residual adapter that improves tabular foundation model accuracy on 51 datasets by learning input corrections through the frozen backbone, with an identity guard to fall back to the original model.
PHBench shows Product Hunt launch signals predict Series A funding with an ensemble model reaching AP 0.037 and F0.5 0.097 on blind test data, outperforming logistic regression and zero-shot LLMs.
Time series foundation models match the performance of specialized models for day-ahead load forecasting while providing explanations consistent with domain knowledge on weather and calendar effects.
TabDistill distills feature interactions from tabular foundation models via post-hoc attribution and inserts them into GAMs, yielding consistent predictive gains.
The authors release the first Slovene ESG sentiment dataset from news and report that large language models lead on environmental and social classification while fine-tuned SloBERTa performs best on governance.
LGB+ improves macroeconomic forecasts by letting linear basis functions compete with or alternate with tree updates inside gradient boosting, yielding a native linear/nonlinear decomposition of predictions.
CarCrashNet releases a large-scale open benchmark dataset of structural crash simulations and a hierarchical neural solver for data-driven full-vehicle crash prediction.
ModelLens learns a performance-aware latent space from 1.62M leaderboard records to rank unseen models on unseen datasets without forward passes on the target.
Decoupled PFNs use controllable synthetic priors to train separate latent-signal and noise heads, making epistemic-aleatoric decomposition identifiable and improving acquisition in noisy settings.
Tabular foundation models achieve high accuracy in molecular property prediction through in-context learning, with up to 100% win rates on MoleculeACE tasks when paired with CheMeleon embeddings.
ReSS uses decision-tree scaffolds to fine-tune LLMs for faithful tabular reasoning, reporting up to 10% gains over baselines on medical and financial data.
Spline encodings for numerical features show task-dependent performance in tabular deep learning, with piecewise-linear encoding robust for classification and variable results for regression depending on spline family, knot strategy, and backbone.
TabPFN reaches AUC 0.892 for 3-year MCI-to-AD conversion on TADPOLE data and holds performance at N=50 training samples where XGBoost, Random Forest, LightGBM, and logistic regression degrade.
TabPFNv2.5 delivers 40x faster inference than Random Forest at 97% binary accuracy on TON IoT data, enabling a hybrid pipeline for real-time IoT threat screening in smart cities.
TabPFN maintains high ROC-AUC and structured attention under controlled additions of irrelevant features, nonlinear correlations, and mislabeled targets in binary classification.
citing papers explorer
- FORGE: Fragment-Oriented Ranking and Generation for Context-Aware Molecular Optimization
  FORGE reformulates molecular optimization as context-aware fragment ranking and replacement using mined low-to-high edit pairs, outperforming larger language models and graph methods on standard benchmarks.
- Quantifying the Risk-Return Tradeoff in Forecasting
  Forecast loss differentials are reframed as returns and assessed with risk-adjusted finance metrics, showing professional forecasters are harder to beat on risk-adjusted performance than on raw accuracy in US macro forecasting.
- Data Language Models: A New Foundation Model Class for Tabular Data
  Schema-1 is the first Data Language Model that natively understands raw tabular data and outperforms gradient-boosted ensembles, AutoML, and prior tabular foundation models on row-level prediction and imputation tasks.
- TFM-Retouche: A Lightweight Input-Space Adapter for Tabular Foundation Models
  TFM-Retouche is an architecture-agnostic input-space residual adapter that improves tabular foundation model accuracy on 51 datasets by learning input corrections through the frozen backbone, with an identity guard to fall back to the original model.
- PHBench: A Benchmark for Predicting Startup Series A Funding from Product Hunt Launch Signals
  PHBench shows Product Hunt launch signals predict Series A funding with an ensemble model reaching AP 0.037 and F0.5 0.097 on blind test data, outperforming logistic regression and zero-shot LLMs.
- Explainable Load Forecasting with Covariate-Informed Time Series Foundation Models
  Time series foundation models match the performance of specialized models for day-ahead load forecasting while providing explanations consistent with domain knowledge on weather and calendar effects.
- Selecting Feature Interactions for Generalized Additive Models by Distilling Foundation Models
  TabDistill distills feature interactions from tabular foundation models via post-hoc attribution and inserts them into GAMs, yielding consistent predictive gains.
- Environmental, Social and Governance Sentiment Analysis on Slovene News: A Novel Dataset and Models
  The authors release the first Slovene ESG sentiment dataset from news and report that large language models lead on environmental and social classification while fine-tuned SloBERTa performs best on governance.
- LGB+: A Macroeconomic Forecasting Road Test
  LGB+ improves macroeconomic forecasts by letting linear basis functions compete with or alternate with tree updates inside gradient boosting, yielding a native linear/nonlinear decomposition of predictions.
- CarCrashNet: A Large-Scale Dataset and Hierarchical Neural Solver for Data-Driven Structural Crash Simulation
  CarCrashNet releases a large-scale open benchmark dataset of structural crash simulations and a hierarchical neural solver for data-driven full-vehicle crash prediction.
- ModelLens: Finding the Best for Your Task from Myriads of Models
  ModelLens learns a performance-aware latent space from 1.62M leaderboard records to rank unseen models on unseen datasets without forward passes on the target.
- Decoupled PFNs: Identifiable Epistemic-Aleatoric Decomposition via Structured Synthetic Priors
  Decoupled PFNs use controllable synthetic priors to train separate latent-signal and noise heads, making epistemic-aleatoric decomposition identifiable and improving acquisition in noisy settings.
- Tabular foundation models for in-context prediction of molecular properties
  Tabular foundation models achieve high accuracy in molecular property prediction through in-context learning, with up to 100% win rates on MoleculeACE tasks when paired with CheMeleon embeddings (a minimal sketch of this in-context usage pattern follows this list).
- ReSS: Learning Reasoning Models for Tabular Data Prediction via Symbolic Scaffold
  ReSS uses decision-tree scaffolds to fine-tune LLMs for faithful tabular reasoning, reporting up to 10% gains over baselines on medical and financial data.
- From Uniform to Learned Knots: A Study of Spline-Based Numerical Encodings for Tabular Deep Learning
  Spline encodings for numerical features show task-dependent performance in tabular deep learning, with piecewise-linear encoding robust for classification and variable results for regression depending on spline family, knot strategy, and backbone.
- Evaluating TabPFN for Mild Cognitive Impairment to Alzheimer's Disease Conversion in Data Limited Settings
  TabPFN reaches AUC 0.892 for 3-year MCI-to-AD conversion on TADPOLE data and holds performance at N=50 training samples where XGBoost, Random Forest, LightGBM, and logistic regression degrade.
- Optimizing IoT Intrusion Detection with Tabular Foundation Models for Smart City Forensics
  TabPFNv2.5 delivers 40x faster inference than Random Forest at 97% binary accuracy on TON IoT data, enabling a hybrid pipeline for real-time IoT threat screening in smart cities.
- Noise Immunity in In-Context Tabular Learning: An Empirical Robustness Analysis of TabPFN's Attention Mechanisms
  TabPFN maintains high ROC-AUC and structured attention under controlled additions of irrelevant features, nonlinear correlations, and mislabeled targets in binary classification.
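Several entries above (molecular property prediction, MCI-to-AD conversion, IoT intrusion detection) apply TabPFN as an in-context, sklearn-style classifier. The sketch below shows that usage pattern, assuming the open-source tabpfn package and its TabPFNClassifier interface; synthetic data stands in for the domain-specific feature tables those papers use, so the numbers it prints are illustrative only.

```python
# Minimal sketch of in-context tabular classification with TabPFN.
# Assumes the open-source `tabpfn` package (pip install tabpfn); synthetic data
# replaces the domain-specific feature tables used by the citing papers above.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from tabpfn import TabPFNClassifier

# Small-sample regime similar to the settings several citing papers study.
X, y = make_classification(n_samples=200, n_features=20, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=50, random_state=0)

clf = TabPFNClassifier()       # no gradient-based training on the task itself
clf.fit(X_train, y_train)      # "fit" stores the labeled context set
proba = clf.predict_proba(X_test)[:, 1]
print("ROC-AUC:", roc_auc_score(y_test, proba))
```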