CatBoost: unbiased boosting with categorical features
This paper presents the key algorithmic techniques behind CatBoost, a new gradient boosting toolkit. Their combination leads to CatBoost outperforming other publicly available boosting implementations in terms of quality on a variety of datasets. Two critical algorithmic advances introduced in CatBoost are the implementation of ordered boosting, a permutation-driven alternative to the classic algorithm, and an innovative algorithm for processing categorical features. Both techniques were created to fight a prediction shift caused by a special kind of target leakage present in all currently existing implementations of gradient boosting algorithms. In this paper, we provide a detailed analysis of this problem and demonstrate that proposed algorithms solve it effectively, leading to excellent empirical results.
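The abstract names a permutation-driven way of handling categorical features without spelling it out. As a rough illustration only, the sketch below encodes each categorical value using the targets of examples that come earlier in a random permutation, so an example's own target never leaks into its own feature. The function name `ordered_target_stats` and the parameters `prior`, `prior_weight`, and `seed` are assumptions made for this example. They are not CatBoost's API, and the toolkit's actual implementation may differ in its details.

```python
import random
from collections import defaultdict


def ordered_target_stats(categories, targets, prior, prior_weight=1.0, seed=0):
    """Encode a categorical column with permutation-ordered target statistics.

    Hypothetical helper for illustration: each example is encoded from the
    targets of same-category examples that precede it in a random permutation,
    smoothed toward `prior`. The example's own target is never used.
    """
    n = len(categories)
    order = list(range(n))
    random.Random(seed).shuffle(order)  # random "arrival order" of examples

    sums = defaultdict(float)   # running sum of targets seen so far, per category
    counts = defaultdict(int)   # running count of examples seen so far, per category
    encoded = [0.0] * n

    for i in order:
        c = categories[i]
        # Use only the history accumulated before example i is visited.
        encoded[i] = (sums[c] + prior_weight * prior) / (counts[c] + prior_weight)
        # Only now add example i to the history for later examples.
        sums[c] += targets[i]
        counts[c] += 1

    return encoded


if __name__ == "__main__":
    cats = ["a", "a", "b", "a"]
    ys = [1, 0, 1, 1]
    print(ordered_target_stats(cats, ys, prior=0.5))
```

The first example visited in each category has no history, so its encoding falls back to `prior`. A plain per-category target mean would instead include each example's own label, which is the kind of target leakage the abstract refers to.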
Forward citations
Cited by 6 Pith papers
- TFM-Retouche: A Lightweight Input-Space Adapter for Tabular Foundation Models
  TFM-Retouche is an architecture-agnostic input-space residual adapter that improves tabular foundation model accuracy on 51 datasets by learning input corrections through the frozen backbone, with an identity guard to...
- WOODELF-HD: Efficient Background SHAP for High-Depth Decision Trees
  WoodelfHD reduces Background SHAP preprocessing for decision trees from 3^D to 2^D complexity, enabling exact computation on depths up to 21 with reported speedups of 33x to 162x.
- TFM-Retouche: A Lightweight Input-Space Adapter for Tabular Foundation Models
  TFM-Retouche is an input-space residual adapter that lifts TabICLv2 performance by 56 Elo points on 51 tabular datasets while remaining architecture-agnostic and computationally light.
- MuViS: Multimodal Virtual Sensing Benchmark
  MuViS is a new unified benchmark showing that neither gradient-boosted trees nor deep neural networks hold a universal advantage in multimodal virtual sensing.
- RelAgent: LLM Agents as Data Scientists for Relational Learning
  RelAgent uses an LLM agent to autonomously generate SQL feature programs paired with classical models for interpretable relational learning predictions that execute efficiently on standard databases.
- Accelerating the Design of Resorbable Magnesium Alloys: A Machine Learning Approach to Property Prediction
  CatBoost and other ensemble ML models achieve R² scores of 0.95, 0.916, and 0.903 on yield strength, ultimate tensile strength, and elongation for resorbable Mg alloys, with SHAP analysis highlighting processing condi...