Statlog (German Credit Data)
Citation behavior is mixed; the most common citation role is background (60%).
Citing papers
-
TabArena: A Living Benchmark for Machine Learning on Tabular Data
TabArena launches a dynamic, updatable benchmarking system for tabular ML that shows boosted trees remain competitive, deep learning matches them under larger budgets with ensembling, foundation models excel on small data, and cross-model ensembles advance SOTA while flagging validation overfitting.
-
Beyond IID: How General Are Tabular Foundation Models, Really?
Tabular foundation models excel on tiny- to medium-sized IID data but are outperformed by traditional tree-based and deep learning models on non-IID, large, and high-dimensional datasets, based on evaluations across 11 models and 142 datasets in the new BeyondArena benchmark.
-
Toward Calibrated, Fair, and accurate Deepfake Detection
Face-Feature Tuning is a label-free logit remapping method that reduces FPR/TPR gaps across groups in deepfake detection while preserving overall accuracy.
-
On Reliability of Efficient Membership Inference Vulnerability Evaluation
Evaluating membership inference attacks (MIAs) on concatenated scores is uncalibrated across per-sample FPRs, and the efficient LiRA variant suffers from a finite-population bias; the paper proposes a post-processing calibration.
-
Energy Shields for Fairness
Energy shields are adaptive probabilistic controllers using energy functions to ensure runtime fairness with short-term safety and long-term liveness guarantees.
-
Concordia: Self-Improving Synthetic Tables for Federated LLMs
Concordia aligns synthetic table generation with federated validation utility via client-side utility scorers and group-relative policy optimization to improve LLM adaptation on non-IID tabular tasks.
-
Optimal Recourse Summaries via Bi-Objective Decision Tree Learning
SOGAR learns Pareto-optimal recourse summaries by solving a bi-objective decision tree optimization that partitions populations and assigns shared low-cost actions per subgroup.
-
Learning-Augmented Robust Algorithmic Recourse
Introduces learning-augmented robust algorithmic recourse, with a new algorithm that trades off consistency when predictions of the future model are accurate against robustness when they are not.
-
The Unseen Hand: Manipulating Model Fairness and SHAP with Targeted Identity Re-Association Attacks
Targeted Identity Re-Association (TIRA) attacks, using PMiS and PRSMP, push fairness metrics to their ideal values and drive the SHAP attribution of protected features to zero in black-box settings.
-
P²CE: Model-Agnostic Plausible Pareto-Optimal Counterfactual Explanations
P²CE is a model-agnostic algorithm for plausible Pareto-optimal counterfactual explanations that uses isolation forest for plausibility and SHAP for efficiency, claiming better quality and speed on three datasets.
-
Beyond Independent Manipulation: Individual Fairness-aware Strategic Classification with Peer Imitation
Introduces the IFSC framework, which models peer imitation in individual fairness-aware strategic classification to improve fairness consistency under interdependent manipulations.
-
TabChange: Precise Attribute Changes in Tabular Data
Using either relationship-based flipping or adversarial latent-space attribute removal, TabChange produces more proximal and valid counterfactuals on tabular data than baselines across seven datasets.
-
Beyond Rational Illusion: Behaviorally Realistic Strategic Classification
Defines behaviorally realistic strategic classification and proposes Pro-SF, a prospect-theory-based framework for modeling non-rational agent manipulations.
-
Provable Fairness Repair for Deep Neural Networks
ProF repairs DNNs for individual fairness by using interval bound propagation to bound outputs over input sets and solving a MILP to adjust the model with guarantees on those sets.
-
BoostLLM: Boosting-inspired LLM Fine-tuning for Few-shot Tabular Classification
BoostLLM trains sequential PEFT adapters in a boosting framework with tree path inputs to improve LLM performance on few-shot tabular classification, matching or exceeding XGBoost.
-
Algebraic Machine Learning for Small-to-Medium Datasets Is Competitive against Strong Standard Baselines
Algebraic Machine Learning (AML) outperforms cross-validated baselines, including CNNs, on image datasets of 50 to 2,000 examples and is comparable to XGBoost/LightGBM on tabular data, using only training data and no task-dependent hyperparameters.
-
Why AI Harms Can't Be Fixed One Identity at a Time: What 5300 Incident Reports Reveal About Intersectionality
A large-scale review of 5,300 AI incident reports shows that harms are amplified up to threefold at specific intersections, including adolescent girls, lower-class people of color, and upper-class political elites.
-
CAFP: A Post-Processing Framework for Group Fairness via Counterfactual Model Averaging
CAFP averages a classifier's outputs on each input and its counterfactual with the protected attribute flipped, eliminating direct dependence on the attribute and achieving demographic parity under mild assumptions.
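A minimal sketch of the counterfactual-averaging idea described for CAFP, assuming a binary protected attribute stored in one numeric column and a "counterfactual" that simply flips that column (0 ↔ 1). The helper `counterfactual_average` and the toy scorer are hypothetical illustrations, not the paper's code; CAFP may construct counterfactuals differently, and this sketch only shows the removal of direct dependence on the attribute, not the demographic-parity guarantee.

```python
import numpy as np


def counterfactual_average(predict, X, attr_col):
    """Average predict() over each row and its copy with the binary
    protected attribute in column attr_col flipped (0 <-> 1).

    Hypothetical helper: the counterfactual here only flips the attribute
    column, which is an assumption, not necessarily CAFP's construction.
    """
    X = np.asarray(X, dtype=float)
    X_cf = X.copy()
    X_cf[:, attr_col] = 1.0 - X_cf[:, attr_col]  # flip the binary attribute
    # Each row and its counterfactual receive the same averaged score.
    return 0.5 * (predict(X) + predict(X_cf))


def toy_predict(X):
    """Toy logistic scorer that depends directly on column 0 (the attribute)."""
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] + X[:, 1])))


X = np.array([[0.0, 0.5],
              [1.0, 0.5],
              [1.0, -1.0]])
scores = counterfactual_average(toy_predict, X, attr_col=0)
print(scores)
```

Rows 0 and 1 differ only in the protected attribute: `toy_predict` scores them differently, but their averaged scores coincide, because the average runs over both attribute values.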
-
HNPclassifier: An R Package for Hierarchical Neyman-Pearson Classification
HNPclassifier is an R package that implements H-NP umbrella algorithms for high-probability control of under-classification errors in ordered multi-class classification tasks.