Automatic Dataset Construction (ADC): Sample Collection, Data Curation, and Beyond

Minghao Liu, Zonglin Di, Jiaheng Wei, Zhongruo Wang, Hengxiang Zhang, Ruixuan Xiao, Haoyu Wang, Jinlong Pang, Hao Chen, Ankit Shah, Hongxin Wei, Xinlei He, Zhaowei Zhao, Haobo Wang, Lei Feng, Jindong Wang, James Davis, Yang Liu

arxiv: 2408.11338 · v2 · submitted 2024-08-21 · cs.AI · cs.LG

Pith reviewed 2026-05-23 21:44 UTC · model grok-4.3

classification cs.AI · cs.LG

keywords automatic dataset construction · LLM · image classification · label noise · data curation · search engines · class imbalance · Clothing-ADC

The pith

LLMs design classes and generate code to collect and curate over a million images via search engines, reaching 79 percent agreement with humans while halving label noise.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Automatic Dataset Construction as a method that uses large language models to define fine-grained classes, write collection code, pull samples from search engines, and then curate the results automatically. This targets the high cost and error rates of manual dataset building for tasks such as image classification. The authors demonstrate the approach at scale by releasing Clothing-ADC, a dataset of more than one million images across twelve coarse classes and twelve thousand fine subclasses. Automated curation matches human labels 79 percent of the time and cuts label noise from 22.2 percent to 10.7 percent. The work also supplies software for handling remaining noise and bias plus three new benchmark datasets for studying label noise detection and class imbalance.

Core claim

ADC leverages LLMs for the detailed class design and code generation to collect relevant samples via search engines, significantly reducing the need for manual annotation and speeding up the data generation process. To demonstrate ADC at scale, we construct Clothing-ADC: a dataset of over 1 million images spanning 12 main classes and 12,000 fine-grained subclasses. Our automated curation achieves 79% agreement with human annotators and reduces label noise from 22.2% to 10.7%.
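The "halving" reading of these figures can be checked with a short calculation. The snippet below is only a worked piece of arithmetic on the two noise rates quoted above; the function and variable names are illustrative and do not come from the paper.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original rate that was removed (0.5 means halved)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before

# Label-noise rates quoted in the abstract.
noise_before = 0.222
noise_after = 0.107

absolute_drop = noise_before - noise_after
rel = relative_reduction(noise_before, noise_after)

print(f"absolute drop: {absolute_drop * 100:.1f} percentage points")  # 11.5
print(f"relative reduction: {rel * 100:.1f}%")                        # 51.8%
```

A relative reduction of about 52 percent is what justifies describing the result as roughly halving the label noise.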

What carries the argument

LLM-driven class design combined with generated code that queries search engines, followed by automated curation scripts.
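The three stages named above can be pictured as a small pipeline. This is a minimal sketch under stated assumptions, not the authors' code: every function name is hypothetical, and the LLM, the search engine, and the curation check are replaced by local stubs so the example runs without network access.

```python
from typing import Callable, Dict, List

def design_classes(llm: Callable[[str], List[str]],
                   main_classes: List[str]) -> Dict[str, List[str]]:
    """Stage 1: ask the (stubbed) LLM for fine-grained subclasses."""
    return {c: llm(f"List fine-grained subclasses of: {c}") for c in main_classes}

def collect(search: Callable[[str], List[str]],
            taxonomy: Dict[str, List[str]]) -> List[dict]:
    """Stage 2: search by subclass name, so the query doubles as the label."""
    samples = []
    for main, subclasses in taxonomy.items():
        for sub in subclasses:
            for url in search(sub):
                samples.append({"url": url, "label": main, "subclass": sub})
    return samples

def curate(samples: List[dict], keep: Callable[[dict], bool]) -> List[dict]:
    """Stage 3: drop samples rejected by an automated check."""
    return [s for s in samples if keep(s)]

# Hypothetical stand-ins for a real LLM client, image search, and filter.
def fake_llm(prompt: str) -> List[str]:
    return ["wool suit", "linen suit"]

def fake_search(query: str) -> List[str]:
    slug = query.replace(" ", "_")
    return [f"https://example.invalid/{slug}/{i}.jpg" for i in range(3)]

def fake_keep(sample: dict) -> bool:
    return not sample["url"].endswith("/2.jpg")  # pretend image 2 is off-topic

taxonomy = design_classes(fake_llm, ["suit"])
raw = collect(fake_search, taxonomy)
clean = curate(raw, fake_keep)
print(len(raw), len(clean))  # 6 4
```

Passing the LLM, search, and filter in as callables keeps the sketch testable; a real run would substitute actual clients for the three stubs.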

If this is right

Datasets of million-image scale can be produced for new fine-grained classification problems with far less human labeling effort.
Open-source tools for detecting label errors and training under noise or imbalance become directly usable on the generated data.
Three new public benchmarks allow standardized testing of label-noise detection, noise-robust learning, and class-imbalance methods.
Existing popular methods for these three problems can be re-evaluated on the released benchmarks to measure current performance.
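To make "detecting label errors" concrete, here is one simple family of detectors: flag a sample when its label disagrees with the majority label of its nearest neighbours in feature space. This is an illustrative sketch only and is not claimed to be the tool the paper releases; the function name, the toy features, and the choice of `k` are all assumptions.

```python
from collections import Counter
from math import dist
from typing import List, Sequence

def knn_flag_label_errors(features: Sequence[Sequence[float]],
                          labels: Sequence[int],
                          k: int = 3) -> List[int]:
    """Return indices whose label differs from the majority of their k nearest neighbours."""
    flagged = []
    for i, x in enumerate(features):
        # Rank every other point by Euclidean distance to x and keep the k closest.
        others = (j for j in range(len(features)) if j != i)
        neighbours = sorted(others, key=lambda j: dist(x, features[j]))[:k]
        majority, _ = Counter(labels[j] for j in neighbours).most_common(1)[0]
        if majority != labels[i]:
            flagged.append(i)
    return flagged

# Toy data: two well-separated clusters; index 3 sits in cluster 0 but is labelled 1.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
labels = [0, 0, 0, 1, 1, 1, 1, 1]

print(knn_flag_label_errors(features, labels))  # [3]
```

In practice the features would come from a pretrained image encoder, and the brute-force neighbour search would be replaced by an approximate index at million-image scale.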

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same pipeline could be applied to non-image modalities if suitable search APIs exist for text, audio, or video.
Faster dataset iteration might accelerate specialized model fine-tuning in domains where labeled data has historically been scarce.
Search-engine biases could still limit diversity even when label noise is reduced, suggesting a need for diversity audits on the collected samples.

Load-bearing premise

Search engine results supply samples that are relevant enough for LLM-generated curation to reach high agreement with humans without introducing large systematic biases.

What would settle it

Running the full ADC pipeline on a held-out visual domain and finding that human agreement falls below 60 percent or that label noise stays above 15 percent after curation.

Figures

Figures reproduced from arXiv: 2408.11338 by Ankit Shah, Haobo Wang, Hao Chen, Haoyu Wang, Hengxiang Zhang, Hongxin Wei, James Davis, Jiaheng Wei, Jindong Wang, Jinlong Pang, Lei Feng, Minghao Liu, Ruixuan Xiao, Xinlei He, Yang Liu, Zhaowei Zhao, Zhongruo Wang, Zonglin Di.

**Figure 1.** Comparisons of key steps in dataset construction. In Step 1: Dataset design, ADC utilizes LLMs to search the field and provide instant feedback, unlike traditional methods that rely on manual creation of class names and refine through crowdsourced worker feedback. In Step 2: Labeling, ADC reduces human workload by flipping the data collection process, using targets to search for samples. In Step 3: Cleanin…

**Figure 2.** Example of using LLM for iterative refinement of attribute designs.

**Figure 3.** Samples from the collected Clothing-ADC Dataset.

**Figure 4.** Long-tailed data distribution is a prevalent issue in many datasets. Searching “wool suit” in Google image

**Figure 5.** Collection of Clothing-ADC test set: A filtering task to the worker instead of annotation from scratch.

**Figure 6.** The behaviors of workers in the creation of test set.

**Figure 7.** Label noise evaluation worker page. (a) Distribution of the HITs completed per worker. (b) Distribution of work time in seconds per HIT.

**Figure 8.** The behaviors of workers in the collection of label noise evaluation.

Original abstract

Large-scale data collection is essential for developing personalized training data, mitigating the shortage of training data, and fine-tuning specialized models. However, creating high-quality datasets quickly and accurately remains a challenge due to annotation errors, the substantial time and costs associated with human labor. To address these issues, we propose Automatic Dataset Construction (ADC), an innovative methodology that automates dataset creation with negligible cost and high efficiency. Taking the image classification task as a starting point, ADC leverages LLMs for the detailed class design and code generation to collect relevant samples via search engines, significantly reducing the need for manual annotation and speeding up the data generation process. To demonstrate ADC at scale, we construct Clothing-ADC: a dataset of over 1 million images spanning 12 main classes and 12,000 fine-grained subclasses. Our automated curation achieves 79% agreement with human annotators and reduces label noise from 22.2% to 10.7%. Despite these advantages, ADC also encounters real-world challenges such as label errors (label noise) and imbalanced data distributions (label bias). We provide open-source software that incorporates existing methods for label error detection, robust learning under noisy and biased data, ensuring a higher-quality training data and more robust model training procedure. Furthermore, we design three benchmark datasets focused on label noise detection, label noise learning, and class-imbalanced learning. These datasets are vital because there are few existing datasets specifically for label noise detection, despite its importance. Finally, we evaluate the performance of existing popular methods on these datasets, thereby facilitating further research in the field.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ADC is a practical engineering pipeline for LLM-assisted construction of large fine-grained image datasets that ships code and three new benchmarks, but the quality evidence is moderate and the search-engine assumptions need tighter checks.

ADC uses LLMs to design class hierarchies and generate search and curation code, then pulls samples from search engines to build Clothing-ADC: over a million images across 12 main classes and 12,000 fine-grained ones. The paper also releases open-source tools for noise and bias handling plus three new benchmark datasets aimed at label noise detection, noisy learning, and class-imbalanced learning. Those benchmarks address a real gap, since dedicated resources for noise detection are scarce. The reported numbers are 79% agreement with human annotators and a noise drop from 22.2% to 10.7%. The work is straightforward about the remaining problems of label errors and imbalance. The pipeline and the released artifacts are the clearest additions here.

The main soft spot is the strength of the quality claims. An agreement rate of 79% is only moderate, and the abstract-level description does not spell out the size or sampling of the human validation set. Search results can carry class-specific biases or ranking artifacts that the generated curation code may not catch evenly at this scale. The authors note label bias as an open issue but do not show that the automated steps remove it without substantial further intervention. If the human checks were limited or post-selected, the headline efficiency story weakens.

This is aimed at researchers who build specialized vision datasets or study robustness under noise and imbalance. The concrete datasets and code give it standalone value even if the core method is an incremental extension of existing LLM-for-data ideas. It deserves peer review because the resources are usable now and the limitations are stated plainly rather than hidden.

Referee Report

3 major / 1 minor

Summary. The paper proposes Automatic Dataset Construction (ADC), a pipeline that uses LLMs to design detailed classes and generate code for collecting image samples via search engines, with the goal of automating large-scale dataset creation at negligible cost. It demonstrates the approach by constructing Clothing-ADC (>1M images, 12 main classes, 12k fine-grained subclasses), reports that automated curation achieves 79% agreement with human annotators and reduces label noise from 22.2% to 10.7%, releases open-source software for label-error detection and robust learning under noise/bias, and introduces three new benchmark datasets for label-noise detection, noise-robust learning, and class-imbalanced learning together with evaluations of existing methods on them.

Significance. If the reported agreement rates and noise reductions are shown to generalize under transparent validation protocols, ADC could meaningfully accelerate the creation of fine-grained labeled datasets and reduce reliance on manual annotation. The released benchmarks and tooling for noisy/imbalanced settings would also provide concrete resources for research on robust learning.

major comments (3)

[Abstract] The headline claims of 79% human agreement and noise reduction from 22.2% to 10.7% are presented without any description of the validation subset size, how the subset was sampled, how label noise was measured before versus after curation, inter-annotator agreement among humans, or comparison to any baseline curation procedure; these omissions make it impossible to judge whether the automated step scales reliably to 12k fine-grained classes.
[Abstract] The central efficiency claim rests on the premise that search-engine results plus LLM-generated curation code yield samples whose quality is close to human without large-scale manual intervention, yet no analysis is supplied of class-specific relevance rates, search-ranking biases, or failure modes across the 12k subclasses.
[Abstract] Although label bias is listed as a remaining challenge, the manuscript provides no quantitative evidence that the automated pipeline reduces distributional imbalance at the scale of Clothing-ADC.
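The first comment turns on validation subset size, and a quick calculation shows why that number matters. The sketch below computes a Wilson score interval for an observed 79% agreement rate at several sample sizes. The sample sizes are hypothetical (the abstract does not report one), and the choice of a Wilson interval at the 95% level is an assumption made here for illustration.

```python
from math import sqrt
from typing import Tuple

def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Hypothetical validation-set sizes; only the 0.79 rate comes from the abstract.
for n in (100, 1000, 10000):
    lo, hi = wilson_interval(0.79, n)
    print(f"n={n:>5}: 79% agreement is compatible with [{lo:.3f}, {hi:.3f}]")
```

With a hundred validated samples the interval spans several points on either side of 79%, while at ten thousand it narrows to under one point, which is why the referee asks for the subset size before accepting the headline figure.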

minor comments (1)

[Abstract] The three new benchmark datasets are mentioned but not named, sized, or contrasted with existing resources, which would help readers assess their novelty.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major comment point by point below. Where the abstract lacks necessary context, we agree that revisions are warranted to improve transparency.

Referee: [Abstract] The headline claims of 79% human agreement and noise reduction from 22.2% to 10.7% are presented without any description of the validation subset size, how the subset was sampled, how label noise was measured before versus after curation, inter-annotator agreement among humans, or comparison to any baseline curation procedure; these omissions make it impossible to judge whether the automated step scales reliably to 12k fine-grained classes.

Authors: The referee correctly identifies that the abstract omits these validation details. The main text describes the human study used to obtain the 79% agreement figure and the before/after noise rates, including sampling from the Clothing-ADC collection and the procedure for measuring label noise via human labels. We will revise the abstract to include a concise statement on subset size, sampling approach, and inter-annotator agreement. A comparison against a baseline curation procedure appears in the experimental results.
revision: yes

Referee: [Abstract] The central efficiency claim rests on the premise that search-engine results plus LLM-generated curation code yield samples whose quality is close to human without large-scale manual intervention, yet no analysis is supplied of class-specific relevance rates, search-ranking biases, or failure modes across the 12k subclasses.

Authors: We agree that class-specific relevance rates and explicit failure-mode analysis across all 12k subclasses are not reported. The paper supplies only aggregate relevance and noise statistics because per-subclass human labeling at this scale was not feasible. We will add a sentence to the abstract acknowledging that search-ranking biases and subclass-level variation remain unquantified and are discussed as limitations.
revision: partial

Referee: [Abstract] Although label bias is listed as a remaining challenge, the manuscript provides no quantitative evidence that the automated pipeline reduces distributional imbalance at the scale of Clothing-ADC.

Authors: The manuscript does not claim that ADC reduces distributional imbalance; it lists label bias as an open challenge. We will revise the abstract wording to make this explicit and to avoid any implication that quantitative evidence of bias reduction is provided.
revision: yes
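For readers wondering what "quantitative evidence" of distributional imbalance could look like, one common summary is the imbalance ratio: the size of the largest class divided by the size of the smallest. The sketch below is a generic illustration with made-up counts; neither the function name nor the numbers come from the paper.

```python
from collections import Counter
from typing import Hashable, Iterable

def imbalance_ratio(labels: Iterable[Hashable]) -> float:
    """Largest class count divided by smallest class count (1.0 means balanced)."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must be non-empty")
    return max(counts.values()) / min(counts.values())

# Made-up label lists standing in for a collected dataset before and after rebalancing.
long_tailed = ["suit"] * 900 + ["vest"] * 90 + ["cape"] * 10
rebalanced = ["suit"] * 100 + ["vest"] * 90 + ["cape"] * 50

print(imbalance_ratio(long_tailed))  # 90.0
print(imbalance_ratio(rebalanced))   # 2.0
```

Reporting a figure like this per main class, before and after curation, would be one way to show whether a pipeline changes the imbalance at all.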

Circularity Check

0 steps flagged

No circularity; empirical pipeline proposal with external human-agreement validation

The paper presents ADC as a methodological pipeline for automated data collection and curation using LLMs and search engines. It reports concrete empirical results (79% human agreement, noise drop from 22.2% to 10.7%) on Clothing-ADC without any equations, derivations, fitted parameters renamed as predictions, or self-citation chains that reduce claims to inputs by construction. The validation metrics are measured against independent human annotators, satisfying the criterion for non-circular external support. No load-bearing steps match any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach relies primarily on domain assumptions about LLM capabilities and web data quality rather than introducing new free parameters or invented entities; no explicit fitted values or new postulated objects are described.

axioms (2)

domain assumption: LLMs can generate accurate and relevant class designs and executable code for image collection via search engines.
This is invoked as the core mechanism for class design and sample collection in the abstract.

domain assumption: Search engine results provide samples amenable to automated curation that achieves high human agreement.
Required for the claims of negligible cost, high efficiency, and the reported 79% agreement and noise reduction.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

ADC leverages LLMs for the detailed class design and code generation to collect relevant samples via search engines... automated curation achieves 79% agreement... reduces label noise from 22.2% to 10.7%.

IndisputableMonolith/Foundation/DimensionForcing.lean · alexander_duality_circle_linking · unclear

We design three benchmark datasets focused on label noise detection, label noise learning, and class-imbalanced learning.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Supplement Generation Training for Enhancing Agentic Task Performance
cs.LG 2026-04 unverdicted novelty 4.0

SGT trains a lightweight model to generate task-specific supplemental text that improves performance of a larger frozen LLM on agentic tasks without modifying the large model.

Reference graph

Works this paper leans on

77 extracted references · 77 canonical work pages · cited by 1 Pith paper · 3 internal anchors

[1]

Leveraging large language models for decision support in personalized oncology

Manuela Benary, Xing David Wang, Max Schmidt, Dominik Soll, Georg Hilfenhaus, Mani Nassir, Christian Sigler, Maren Knödler, Ulrich Keller, Dieter Beule, et al. Leveraging large language models for decision support in personalized oncology. JAMA Network Open, 6(11):e2343689–e2343689, 2023

work page 2023
[2]

Autogen: A personalized large language model for academic enhancement—ethics and proof of principle

Sebastian Porsdam Mann, Brian D Earp, Nikolaj Møller, Suren Vynn, and Julian Savulescu. Autogen: A personalized large language model for academic enhancement—ethics and proof of principle. The American Journal of Bioethics, 23(10):28–41, 2023

work page 2023
[3]

Personalized large language models

Stanisław Wo´ zniak, Bartłomiej Koptyra, Arkadiusz Janz, Przemysław Kazienko, and Jan Koco´n. Personalized large language models. arXiv preprint arXiv:2402.09269, 2024

work page arXiv 2024
[4]

Tidybot: Personalized robot assistance with large language models

Jimmy Wu, Rika Antonova, Adam Kan, Marion Lepert, Andy Zeng, Shuran Song, Jeannette Bohg, Szymon Rusinkiewicz, and Thomas Funkhouser. Tidybot: Personalized robot assistance with large language models. Autonomous Robots, 47(8):1087–1102, 2023

work page 2023
[5]

Llm-rec: Personalized recommendation via prompting large language models

Hanjia Lyu, Song Jiang, Hanqing Zeng, Yinglong Xia, and Jiebo Luo. Llm-rec: Personalized recommendation via prompting large language models. arXiv preprint arXiv:2307.15780, 2023

work page arXiv 2023
[6]

Democratizing large language models via personalized parameter-efficient fine-tuning

Zhaoxuan Tan, Qingkai Zeng, Yijun Tian, Zheyuan Liu, Bing Yin, and Meng Jiang. Democratizing large language models via personalized parameter-efficient fine-tuning. arXiv preprint arXiv:2402.04401, 2024

work page arXiv 2024
[7]

Learning from massive noisy labeled data for image classification

Tong Xiao, Tian Xia, Yi Yang, Chang Huang, and Xiaogang Wang. Learning from massive noisy labeled data for image classification. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2691–2699, 2015

work page 2015
[8]

Learning multiple layers of features from tiny images

Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009

work page 2009
[9]

Learning with noisy labels revisited: A study using real-world human annotations

Jiaheng Wei, Zhaowei Zhu, Hao Cheng, Tongliang Liu, Gang Niu, and Yang Liu. Learning with noisy labels revisited: A study using real-world human annotations. arXiv preprint arXiv:2110.12088, 2021

work page arXiv 2021
[10]

Deep learning face attributes in the wild

Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), December 2015

work page 2015
[11]

Overlooked factors in concept-based explanations: Dataset choice, concept learnability, and human capability

Vikram V Ramaswamy, Sunnie SY Kim, Ruth Fong, and Olga Russakovsky. Overlooked factors in concept-based explanations: Dataset choice, concept learnability, and human capability. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10932–10941, 2023

work page 2023
[12]

Learning with noisy labels

Nagarajan Natarajan, Inderjit S Dhillon, Pradeep K Ravikumar, and Ambuj Tewari. Learning with noisy labels. Advances in neural information processing systems, 26, 2013

work page 2013
[13]

Classification with noisy labels by importance reweighting

Tongliang Liu and Dacheng Tao. Classification with noisy labels by importance reweighting. IEEE Transactions on pattern analysis and machine intelligence, 38(3):447–461, 2015

work page 2015
[14]

WebVision Database: Visual Learning and Understanding from Web Data

Wen Li, Limin Wang, Wei Li, Eirikur Agustsson, and Luc Van Gool. Webvision database: Visual learning and understanding from web data. arXiv preprint arXiv:1708.02862, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[15]

Learning with noisy labels revisited: A study using real-world human annotations

Jiaheng Wei, Zhaowei Zhu, Hao Cheng, Tongliang Liu, Gang Niu, and Yang Liu. Learning with noisy labels revisited: A study using real-world human annotations. In International Conference on Learning Representations, 2022

work page 2022
[16]

Revolt: Collaborative crowdsourcing for labeling machine learning datasets

Joseph Chee Chang, Saleema Amershi, and Ece Kamar. Revolt: Collaborative crowdsourcing for labeling machine learning datasets. In Proceedings of the 2017 CHI conference on human factors in computing systems , pages 2334–2346, 2017

work page 2017
[17]

Structured labeling for facilitating concept evolution in machine learning

Todd Kulesza, Saleema Amershi, Rich Caruana, Danyel Fisher, and Denis Charles. Structured labeling for facilitating concept evolution in machine learning. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 3075–3084, 2014

work page 2014
[18]

Does the whole exceed its parts? the effect of ai explanations on complementary team performance

Gagan Bansal, Tongshuang Wu, Joyce Zhou, Raymond Fok, Besmira Nushi, Ece Kamar, Marco Tulio Ribeiro, and Daniel Weld. Does the whole exceed its parts? the effect of ai explanations on complementary team performance. In Proceedings of the 2021 CHI conference on human factors in computing systems, pages 1–16, 2021

work page 2021
[19]

Is the most accurate ai the best teammate? optimizing ai for teamwork

Gagan Bansal, Besmira Nushi, Ece Kamar, Eric Horvitz, and Daniel S Weld. Is the most accurate ai the best teammate? optimizing ai for teamwork. In Proceedings of the AAAI Conference on Artificial Intelligence , volume 35, pages 11405–11414, 2021

work page 2021
[20]

Iterative human-in-the-loop discovery of unknown unknowns in image datasets

Lei Han, Xiao Dong, and Gianluca Demartini. Iterative human-in-the-loop discovery of unknown unknowns in image datasets. In Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, volume 9, pages 72–83, 2021. 10 PRIME AI paper

work page 2021
[21]

Get another label? improving data quality and data mining using multiple, noisy labelers

Victor S Sheng, Foster Provost, and Panagiotis G Ipeirotis. Get another label? improving data quality and data mining using multiple, noisy labelers. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 614–622, 2008

work page 2008
[22]

https://docta.ai/

Docta. https://docta.ai/

work page
[23]

https://cleanlab.ai/

Cleanlab. https://cleanlab.ai/

work page
[24]

https://snorkel.ai/

Snorkel. https://snorkel.ai/

work page
[25]

Unmasking and improving data credibility: A study with datasets for training harmless language models

Zhaowei Zhu, Jialu Wang, Hao Cheng, and Yang Liu. Unmasking and improving data credibility: A study with datasets for training harmless language models. arXiv preprint arXiv:2311.11202, 2023

work page arXiv 2023
[26]

Clusterability as an alternative to anchor points when learning with noisy labels

Zhaowei Zhu, Yiwen Song, and Yang Liu. Clusterability as an alternative to anchor points when learning with noisy labels. In International Conference on Machine Learning, pages 12912–12923. PMLR, 2021

work page 2021
[27]

Detecting corrupted labels without training a model to predict

Zhaowei Zhu, Zihao Dong, and Yang Liu. Detecting corrupted labels without training a model to predict. In International conference on machine learning, pages 27412–27427. PMLR, 2022

work page 2022
[28]

Updates in human-ai teams: Understanding and addressing the performance/compatibility tradeoff

Gagan Bansal, Besmira Nushi, Ece Kamar, Daniel S Weld, Walter S Lasecki, and Eric Horvitz. Updates in human-ai teams: Understanding and addressing the performance/compatibility tradeoff. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 2429–2437, 2019

work page 2019
[29]

Do humans and machines have the same eyes? human- machine perceptual differences on image classification

Minghao Liu, Jiaheng Wei, Yang Liu, and James Davis. Do humans and machines have the same eyes? human- machine perceptual differences on image classification. arXiv preprint arXiv:2304.08733, 2023

work page arXiv 2023
[30]

The inaturalist species classification and detection dataset

Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, and Serge Belongie. The inaturalist species classification and detection dataset. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8769–8778, 2018

work page 2018
[31]

SELFIE: Refurbishing unclean samples for robust deep learning

Hwanjun Song, Minseok Kim, and Jae-Gil Lee. SELFIE: Refurbishing unclean samples for robust deep learning. In ICML, 2019

work page 2019
[32]

Food-101 – mining discriminative components with random forests

Lukas Bossard, Matthieu Guillaumin, and Luc Van Gool. Food-101 – mining discriminative components with random forests. In European Conference on Computer Vision, 2014

work page 2014
[33]

Making deep neural networks robust to label noise: A loss correction approach

Giorgio Patrini, Alessandro Rozza, Aditya Krishna Menon, Richard Nock, and Lizhen Qu. Making deep neural networks robust to label noise: A loss correction approach. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1944–1952, 2017

work page 1944
[34]

Confident learning: Estimating uncertainty in dataset labels.Journal of Artificial Intelligence Research, 70:1373–1411, 2021

Curtis Northcutt, Lu Jiang, and Isaac Chuang. Confident learning: Estimating uncertainty in dataset labels.Journal of Artificial Intelligence Research, 70:1373–1411, 2021

work page 2021
[35]

Estimating training data influence by tracing gradient descent

Garima Pruthi, Frederick Liu, Satyen Kale, and Mukund Sundararajan. Estimating training data influence by tracing gradient descent. Advances in Neural Information Processing Systems, 33:19920–19930, 2020

work page 2020
[36]

Deep self-learning from noisy labels

Jiangfan Han, Ping Luo, and Xiaogang Wang. Deep self-learning from noisy labels. In Proceedings of the IEEE/CVF international conference on computer vision, pages 5138–5147, 2019

work page 2019
[37]

Early-learning regularization prevents memorization of noisy labels

Sheng Liu, Jonathan Niles-Weed, Narges Razavian, and Carlos Fernandez-Granda. Early-learning regularization prevents memorization of noisy labels. Advances in neural information processing systems, 33:20331–20342, 2020

work page 2020
[38]

Robust early- learning: Hindering the memorization of noisy labels

Xiaobo Xia, Tongliang Liu, Bo Han, Chen Gong, Nannan Wang, Zongyuan Ge, and Yi Chang. Robust early- learning: Hindering the memorization of noisy labels. In International conference on learning representations, 2020

work page 2020
[39]

Identifying mislabeled training data.Journal of artificial intelligence research, 11:131–167, 1999

Carla E Brodley and Mark A Friedl. Identifying mislabeled training data.Journal of artificial intelligence research, 11:131–167, 1999

work page 1999
[40]

Improving identification of difficult small classes by balancing class distribution

Jorma Laurikkala. Improving identification of difficult small classes by balancing class distribution. In Artificial Intelligence in Medicine: 8th Conference on Artificial Intelligence in Medicine in Europe, AIME 2001 Cascais, Portugal, July 1–4, 2001, Proceedings 8, pages 63–66. Springer, 2001

work page 2001
[41]

Learning with instance-dependent label noise: A sample sieve approach

Hao Cheng, Zhaowei Zhu, Xingyu Li, Yifei Gong, Xing Sun, and Yang Liu. Learning with instance-dependent label noise: A sample sieve approach. arXiv preprint arXiv:2010.02347, 2020

work page arXiv 2010
[42]

Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning

Nicolas Papernot and Patrick McDaniel. Deep k-nearest neighbors: Towards confident, interpretable and robust deep learning. arXiv preprint arXiv:1803.04765, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[43]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016. 11 PRIME AI paper

work page 2016
[44]

Learning to reweight examples for robust deep learning

Mengye Ren, Wenyuan Zeng, Bin Yang, and Raquel Urtasun. Learning to reweight examples for robust deep learning. In International conference on machine learning, pages 4334–4343. PMLR, 2018

work page 2018
[45]

Symmetric cross entropy for robust learning with noisy labels

Yisen Wang, Xingjun Ma, Zaiyi Chen, Yuan Luo, Jinfeng Yi, and James Bailey. Symmetric cross entropy for robust learning with noisy labels. In Proceedings of the IEEE/CVF international conference on computer vision, pages 322–330, 2019

[46]

Generalized cross entropy loss for training deep neural networks with noisy labels

Zhilu Zhang and Mert Sabuncu. Generalized cross entropy loss for training deep neural networks with noisy labels. Advances in neural information processing systems, 31, 2018

[47]

Peer loss functions: Learning from noisy labels without knowing noise rates

Yang Liu and Hongyi Guo. Peer loss functions: Learning from noisy labels without knowing noise rates. In International conference on machine learning, pages 6226–6236. PMLR, 2020

[48]

When optimizing f-divergence is robust with label noise

Jiaheng Wei and Yang Liu. When optimizing f-divergence is robust with label noise. arXiv preprint arXiv:2011.03687, 2020

[49]

mixup: Beyond Empirical Risk Minimization

Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412, 2017

[50]

When does label smoothing help?

Rafael Müller, Simon Kornblith, and Geoffrey E Hinton. When does label smoothing help? Advances in neural information processing systems, 32, 2019

[51]

Does label smoothing mitigate label noise?

Michal Lukasik, Srinadh Bhojanapalli, Aditya Menon, and Sanjiv Kumar. Does label smoothing mitigate label noise? In International Conference on Machine Learning, pages 6448–6458. PMLR, 2020

[52]

To smooth or not? when label smoothing meets noisy labels

Jiaheng Wei, Hangyu Liu, Tongliang Liu, Gang Niu, Masashi Sugiyama, and Yang Liu. To smooth or not? when label smoothing meets noisy labels. In International Conference on Machine Learning, pages 23589–23614. PMLR, 2022

[53]

Co-teaching: Robust training of deep neural networks with extremely noisy labels

Bo Han, Quanming Yao, Xingrui Yu, Gang Niu, Miao Xu, Weihua Hu, Ivor Tsang, and Masashi Sugiyama. Co-teaching: Robust training of deep neural networks with extremely noisy labels. Advances in neural information processing systems, 31, 2018

[54]

Mentornet: Learning data-driven curriculum for very deep neural networks on corrupted labels

Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, and Li Fei-Fei. Mentornet: Learning data-driven curriculum for very deep neural networks on corrupted labels. In International conference on machine learning, pages 2304–2313. PMLR, 2018

[55]

Dividemix: Learning with noisy labels as semi-supervised learning

Junnan Li, Richard Socher, and Steven CH Hoi. Dividemix: Learning with noisy labels as semi-supervised learning. arXiv preprint arXiv:2002.07394, 2020

[56]

Combating noisy labels by agreement: A joint training method with co-regularization

Hongxin Wei, Lei Feng, Xiangyu Chen, and Bo An. Combating noisy labels by agreement: A joint training method with co-regularization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 13726–13735, 2020

[57]

Mitigating memorization of noisy labels by clipping the model prediction

Hongxin Wei, Huiping Zhuang, Renchunzi Xie, Lei Feng, Gang Niu, Bo An, and Yixuan Li. Mitigating memorization of noisy labels by clipping the model prediction. In International Conference on Machine Learning, pages 36868–36886. PMLR, 2023

[58]

Improved cross entropy loss for noisy labels in vision leaf disease classification

Yipeng Chen, Ke Xu, Peng Zhou, Xiaojuan Ban, and Di He. Improved cross entropy loss for noisy labels in vision leaf disease classification. IET Image Processing, 16(6):1511–1519, 2022

[59]

Class imbalances versus small disjuncts

Taeho Jo and Nathalie Japkowicz. Class imbalances versus small disjuncts. ACM Sigkdd Explorations Newsletter, 6(1):40–49, 2004

[60]

Smote: synthetic minority over-sampling technique

Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. Smote: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16:321–357, 2002

[61]

Borderline-smote: a new over-sampling method in imbalanced data sets learning

Hui Han, Wen-Yuan Wang, and Bing-Huan Mao. Borderline-smote: a new over-sampling method in imbalanced data sets learning. In International conference on intelligent computing, pages 878–887. Springer, 2005

[62]

Safe-level-smote: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem

Chumphol Bunkhumpornpat, Krung Sinapiromsaran, and Chidchanok Lursinsap. Safe-level-smote: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem. In Advances in Knowledge Discovery and Data Mining: 13th Pacific-Asia Conference, PAKDD 2009 Bangkok, Thailand, April 27-30, 2009 Proceedings 13, pages 475–482. Springer, 2009

[63]

Adasyn: Adaptive synthetic sampling approach for imbalanced learning

Haibo He, Yang Bai, Edwardo A Garcia, and Shutao Li. Adasyn: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence), pages 1322–1328. IEEE, 2008

[64]

knn approach to unbalanced data distributions: a case study involving information extraction

Inderjeet Mani and I Zhang. knn approach to unbalanced data distributions: a case study involving information extraction. In Proceedings of workshop on learning from imbalanced datasets, volume 126, pages 1–7. ICML, 2003

[65]

Addressing the curse of imbalanced training sets: one-sided selection

Miroslav Kubat, Stan Matwin, et al. Addressing the curse of imbalanced training sets: one-sided selection. In ICML, volume 97, page 179. Citeseer, 1997

[66]

Two modifications of CNN

I Tomek. Two modifications of CNN. IEEE Trans. Systems, Man and Cybernetics, 6:769–772, 1976

[67]

The foundations of cost-sensitive learning

Charles Elkan. The foundations of cost-sensitive learning. In International joint conference on artificial intelligence, volume 17, pages 973–978. Lawrence Erlbaum Associates Ltd, 2001

[68]

Cost-sensitive learning with neural networks

Matjaz Kukar, Igor Kononenko, et al. Cost-sensitive learning with neural networks. In ECAI, volume 15, pages 88–94. Citeseer, 1998

[69]

Training cost-sensitive neural networks with methods addressing the class imbalance problem

Zhi-Hua Zhou and Xu-Ying Liu. Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Transactions on knowledge and data engineering, 18(1):63–77, 2005

[70]

Neural network classification and prior class probabilities

Steve Lawrence, Ian Burns, Andrew Back, Ah Chung Tsoi, and C Lee Giles. Neural network classification and prior class probabilities. In Neural networks: tricks of the trade, pages 299–313. Springer, 2002

[71]

Neural network classifiers estimate bayesian a posteriori probabilities

Michael D Richard and Richard P Lippmann. Neural network classifiers estimate bayesian a posteriori probabilities. Neural computation, 3(4):461–483, 1991

[72]

Distributionally robust post-hoc classifiers under prior shifts

Jiaheng Wei, Harikrishna Narasimhan, Ehsan Amid, Wen-Sheng Chu, Yang Liu, and Abhishek Kumar. Distributionally robust post-hoc classifiers under prior shifts. In The Eleventh International Conference on Learning Representations, 2023

[73]

Focal loss for dense object detection

Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision, pages 2980–2988, 2017

[74]

Learning imbalanced datasets with label-distribution-aware margin loss

Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, and Tengyu Ma. Learning imbalanced datasets with label-distribution-aware margin loss. Advances in neural information processing systems, 32, 2019

[75]

Balanced meta-softmax for long-tailed visual recognition

Jiawei Ren, Cunjun Yu, Xiao Ma, Haiyu Zhao, Shuai Yi, et al. Balanced meta-softmax for long-tailed visual recognition. Advances in neural information processing systems, 33:4175–4186, 2020

[76]

Long-tail learning via logit adjustment

Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, and Sanjiv Kumar. Long-tail learning via logit adjustment. arXiv preprint arXiv:2007.07314, 2020

Appendix. The appendix is organized as follows: • Appendix A includes additional detailed algorithms in the Automatic-Dataset-Construction pipeline. • Ap...

[77]

For backward and forward correction, we train the model using cross-entropy (CE) loss for the first 10 epochs

The hyper-parameters for each baseline method are as follows. For backward and forward correction, we train the model using cross-entropy (CE) loss for the first 10 epochs. We estimate the transition matrix every epoch from the 10th to the 20th epoch. For the positive and negative label smoothing, the smoothed labels are used at the 10th epoch. The smooth...

