CatBoost: gradient boosting with categorical features support

Dorogush, A · 2018 · cs.LG · arXiv 1810.11363

12 Pith papers cite this work.

abstract

In this paper we present CatBoost, a new open-source gradient boosting library that successfully handles categorical features and outperforms existing publicly available implementations of gradient boosting in terms of quality on a set of popular publicly available datasets. The library has a GPU implementation of the learning algorithm and a CPU implementation of the scoring algorithm, which are significantly faster than other gradient boosting libraries on ensembles of similar sizes.
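One ingredient of CatBoost's categorical-feature handling, as described in the CatBoost papers and documentation, is "ordered target statistics": rows are visited in a random permutation, and each row's category is replaced by a smoothed average of the labels of *earlier* rows with the same category, roughly (countInClass + prior) / (totalCount + 1) for binary labels. The sketch below is a simplified, pure-Python illustration under that assumption. The function name and its parameters (`prior`, `order`, `seed`) are hypothetical, and the library itself adds refinements (such as multiple permutations and combinations of categorical features) that are not shown here.

```python
import random
from typing import Dict, Hashable, List, Optional, Sequence


def ordered_target_statistics(
    categories: Sequence[Hashable],
    targets: Sequence[float],
    prior: float = 0.5,
    order: Optional[Sequence[int]] = None,
    seed: int = 0,
) -> List[float]:
    """Hypothetical sketch: encode one categorical column with ordered target statistics.

    Each row's value is (sum_of_earlier_targets + prior) / (earlier_count + 1),
    where "earlier" means rows with the same category that come before it in
    `order`. A row never sees its own label, which limits target leakage.
    """
    n = len(categories)
    if order is None:
        # Draw a random permutation of row indices (reproducible via `seed`).
        order = list(range(n))
        random.Random(seed).shuffle(order)

    target_sums: Dict[Hashable, float] = {}
    counts: Dict[Hashable, int] = {}
    encoded = [0.0] * n
    for i in order:
        c = categories[i]
        s = target_sums.get(c, 0.0)
        k = counts.get(c, 0)
        # Encode using statistics of *previous* rows only, then update them.
        encoded[i] = (s + prior) / (k + 1)
        target_sums[c] = s + targets[i]
        counts[c] = k + 1
    return encoded


if __name__ == "__main__":
    cats = ["a", "b", "a", "a"]
    labels = [1, 0, 1, 0]
    # Fixed visiting order so the result is easy to check by hand.
    print(ordered_target_statistics(cats, labels, order=[0, 1, 2, 3]))
```

With the fixed order above, the first "a" and the only "b" receive just the prior (0.5), while later "a" rows are pulled toward the labels seen so far. Using only earlier rows is the design choice that keeps a row's own label out of its encoding.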

citation-role summary: method (2)

citation-polarity summary: use method (2)

representative citing papers

TabICL: A Tabular Foundation Model for In-Context Learning on Large Data

cs.LG · 2025-02-08 · unverdicted · novelty 6.0

TabICL scales in-context learning to large tabular data via column-then-row attention for row embeddings followed by a transformer, matching TabPFNv2 speed and performance while outperforming it and CatBoost on datasets over 10K samples.

Context-Aware Web Attack Detection in Open-Source SIEM Systems via MITRE ATT&CK-Enriched Behavioral Profiling

cs.CR · 2026-05-13 · conditional · novelty 5.0

Smart-SIEM adds context-aware ML profiling to Wazuh SIEM, lifting binary attack detection F1 to 0.967 and six-class categorization to 0.914 while recovering from concept drift via retraining.

Comparative analysis of missing data imputation methods for CSST survey: Impact on photometric redshift estimation performance

astro-ph.GA · 2026-05-13 · conditional · novelty 5.0

KNN imputation gives highest photo-z accuracy under ideal random missingness with complete training data, while SAITS is more robust for incomplete training sets and realistic mixed missingness patterns in CSST data.

AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data

stat.ML · 2020-03-13 · unverdicted · novelty 5.0

AutoGluon-Tabular achieves superior accuracy on tabular classification and regression by multi-layer model ensembling and stacking, outperforming other AutoML frameworks on 50 benchmarks and Kaggle competitions.

Mind the Gap? A Distributional Comparison of Real and Synthetic Priors for Tabular Foundation Models

cs.AI · 2026-05-07 · unverdicted · novelty 5.0

The synthetic prior for tabular foundation models covers only a narrow part of real table distributions, but this mismatch does not degrade model generalization.

A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification

cs.LG · 2021-07-15 · unverdicted · novelty 5.0

A Value-added Physical Properties Catalog for Low-redshift Galaxies from DESI Legacy Imaging Surveys DR10

astro-ph.GA · 2026-05-19 · unverdicted · novelty 4.0

A multimodal neural network trained on MPA-JHU references produces SFR, stellar mass, and metallicity estimates for 547 million low-redshift galaxies in DESI LS DR10.

Community-Based Early-Stage Chronic Kidney Disease Screening using Explainable Machine Learning for Low-Resource Settings

cs.LG · 2026-01-03 · unverdicted · novelty 4.0

Machine learning models trained on Bangladeshi community data achieve 89-90% balanced accuracy for early CKD detection using few accessible features, outperforming traditional screening tools and generalizing across external datasets from India, UAE, and Bangladesh.

Search for quasar pairs with Gaia astrometric data. II. Photometric redshift prediction with machine learning for the MGQPC catalogue

astro-ph.GA · 2026-05-10 · conditional · novelty 4.0

Machine learning models achieve NMAD 0.036 and 5.6% outliers for quasar photometric redshifts, identifying 185 high-probability pair candidates in MGQPC with 20 spectroscopically confirmed as physical pairs.

Donor-Aware scRNA-seq Benchmarks for IBD Classification

q-bio.QM · 2026-05-05 · unverdicted · novelty 4.0

Donor-aware benchmarks show AUROCs up to 0.978 for IBD classification from scRNA-seq using CLR cell-type compositions and GatedStructuralCFN embeddings, with compartment stratification improving both performance and feature stability.

From Canopy to Collision: A Hybrid Predictive Framework for Identifying Risk Factors in Tree-Involved Traffic Crashes

cs.LG · 2026-04-25 · unverdicted · novelty 4.0

Hybrid predictive modeling of crash data identifies non-use of restraints as the primary risk factor for severe injury in collisions involving trees.

Fashion Retail: Forecasting Demand for New Items

cs.OH · 2019-06-27 · unverdicted · novelty 3.0

Generalized ML models trained on past sales data forecast demand for new fashion items from their attributes, with experiments across neural architectures and loss functions showing robust performance.

citing papers explorer

TabICL: A Tabular Foundation Model for In-Context Learning on Large Data · cs.LG · 2025-02-08 · unverdicted · none · ref 121
Context-Aware Web Attack Detection in Open-Source SIEM Systems via MITRE ATT&CK-Enriched Behavioral Profiling · cs.CR · 2026-05-13 · conditional · none · ref 23
Comparative analysis of missing data imputation methods for CSST survey: Impact on photometric redshift estimation performance · astro-ph.GA · 2026-05-13 · conditional · none · ref 47
AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data · stat.ML · 2020-03-13 · unverdicted · none · ref 1
Mind the Gap? A Distributional Comparison of Real and Synthetic Priors for Tabular Foundation Models · cs.AI · 2026-05-07 · unverdicted · none · ref 10
A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification · cs.LG · 2021-07-15 · unverdicted · none · ref 30
A Value-added Physical Properties Catalog for Low-redshift Galaxies from DESI Legacy Imaging Surveys DR10 · astro-ph.GA · 2026-05-19 · unverdicted · none · ref 116
Community-Based Early-Stage Chronic Kidney Disease Screening using Explainable Machine Learning for Low-Resource Settings · cs.LG · 2026-01-03 · unverdicted · none · ref 68
Search for quasar pairs with Gaia astrometric data. II. Photometric redshift prediction with machine learning for the MGQPC catalogue · astro-ph.GA · 2026-05-10 · conditional · none · ref 21
Donor-Aware scRNA-seq Benchmarks for IBD Classification · q-bio.QM · 2026-05-05 · unverdicted · none · ref 7
From Canopy to Collision: A Hybrid Predictive Framework for Identifying Risk Factors in Tree-Involved Traffic Crashes · cs.LG · 2026-04-25 · unverdicted · none · ref 48
Fashion Retail: Forecasting Demand for New Items · cs.OH · 2019-06-27 · unverdicted · none · ref 6