pith. sign in

arxiv: 2505.11034 · v2 · pith:RTFKQRIPnew · submitted 2025-05-16 · 💻 cs.CV · cs.AI· cs.LG

CleanPatrick: A Benchmark for Image Data Cleaning

classification 💻 cs.CV cs.AIcs.LG
keywords cleanpatrickdatabenchmarkcleaningdetectionimageclassicalcomparison
0
0 comments X
read the original abstract

Robust machine learning depends on clean data, yet current image data cleaning benchmarks rely on synthetic noise or narrow human studies, limiting comparison and real-world relevance. We introduce CleanPatrick, the first large-scale benchmark for data cleaning in the image domain, built upon the publicly available Fitzpatrick17k dermatology dataset. We collect 496,377 binary annotations from 933 medical crowd workers, identify off-topic samples (4%), near-duplicates (21%), and label errors (32%), and employ an aggregation model inspired by item-response theory followed by expert review to derive high-quality ground truth. CleanPatrick formalizes issue detection as a ranking task and employs standard ranking metrics that mirror real audit workflows. We benchmark classical anomaly detectors, perceptual hashing, SSIM, Confident Learning, NoiseRank, FINE, BHN, and SelfClean. On CleanPatrick, self-supervised representations excel at near-duplicate detection, classical methods achieve competitive off-topic detection under constrained review budgets, and detecting implausible labels under conservative human judgment remains challenging for fine-grained medical classification. By releasing both the dataset and the evaluation framework, CleanPatrick enables a systematic comparison of image-cleaning strategies.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.