CleanPatrick: A Benchmark for Image Data Cleaning

Alexander A. Navarini; Alvaro Gonzalez-Jimenez; Arash Koochek; Elisabeth Victoria Goessinger; Fabian Gröger; Hanna Lindemann; Ludovic Amruthalingam; Marc Pouly; Marie Bargiela; Marie Hofbauer

arxiv: 2505.11034 · v2 · pith:RTFKQRIPnew · submitted 2025-05-16 · 💻 cs.CV · cs.AI · cs.LG

Fabian Gröger, Simone Lionetti, Philippe Gottfrois, Alvaro Gonzalez-Jimenez, Ludovic Amruthalingam, Elisabeth Victoria Goessinger, Hanna Lindemann, Marie Bargiela, Marie Hofbauer, Omar Badri, Philipp Tschandl, Arash Koochek, Matthew Groh, Alexander A. Navarini, Marc Pouly

keywords: cleanpatrick, data, benchmark, cleaning, detection, image, classical, comparison

Robust machine learning depends on clean data, yet current image data cleaning benchmarks rely on synthetic noise or narrow human studies, limiting comparison and real-world relevance. We introduce CleanPatrick, the first large-scale benchmark for data cleaning in the image domain, built upon the publicly available Fitzpatrick17k dermatology dataset. We collect 496,377 binary annotations from 933 medical crowd workers, identify off-topic samples (4%), near-duplicates (21%), and label errors (32%), and employ an aggregation model inspired by item-response theory followed by expert review to derive high-quality ground truth. CleanPatrick formalizes issue detection as a ranking task and employs standard ranking metrics that mirror real audit workflows. We benchmark classical anomaly detectors, perceptual hashing, SSIM, Confident Learning, NoiseRank, FINE, BHN, and SelfClean. On CleanPatrick, self-supervised representations excel at near-duplicate detection, classical methods achieve competitive off-topic detection under constrained review budgets, and detecting implausible labels under conservative human judgment remains challenging for fine-grained medical classification. By releasing both the dataset and the evaluation framework, CleanPatrick enables a systematic comparison of image-cleaning strategies.
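
To illustrate what framing issue detection as a ranking task can look like in practice, here is a minimal sketch in Python. It assumes a detector assigns each sample a suspicion score and that binary ground truth marks the samples with a real issue. The abstract says only that "standard ranking metrics" are used, so the two metrics shown (precision at a review budget k, and average precision) are illustrative assumptions, and the function names and toy data are hypothetical rather than taken from the CleanPatrick evaluation framework.

```python
from typing import List, Sequence


def _ranking(scores: Sequence[float]) -> List[int]:
    """Indices sorted from most to least suspicious (ties keep input order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def precision_at_k(scores: Sequence[float], is_issue: Sequence[bool], k: int) -> float:
    """Fraction of true issues among the k samples an auditor would review first."""
    if len(scores) != len(is_issue):
        raise ValueError("scores and is_issue must have the same length")
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of samples")
    top = _ranking(scores)[:k]
    return sum(bool(is_issue[i]) for i in top) / k


def average_precision(scores: Sequence[float], is_issue: Sequence[bool]) -> float:
    """Mean of the precision values measured at the rank of each true issue."""
    if len(scores) != len(is_issue):
        raise ValueError("scores and is_issue must have the same length")
    hits = 0
    total = 0.0
    for rank, i in enumerate(_ranking(scores), start=1):
        if is_issue[i]:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy data (hypothetical): five samples, two of which are real issues.
scores = [0.9, 0.1, 0.8, 0.3, 0.7]
is_issue = [True, False, False, False, True]

p_at_2 = precision_at_k(scores, is_issue, k=2)
ap = average_precision(scores, is_issue)
print(f"P@2 = {p_at_2:.3f}, AP = {ap:.3f}")
```

In this toy case the ranking places the true issues at positions 1 and 3, so a reviewer with a budget of two samples finds one of the two issues. Metrics of this kind reward detectors that surface problems early, which is how a constrained audit would use them.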
