pith. machine review for the scientific record.

arxiv: 1311.6510 · v1 · submitted 2013-11-25 · 💻 cs.CV · cs.LG · stat.ML

Recognition: unknown

Are all training examples equally valuable?

Authors on Pith: no claims yet
classification: 💻 cs.CV · cs.LG · stat.ML
keywords: training, examples, classifiers, some, better, detectors, equally, others
read the original abstract

When learning a new concept, not all training examples may prove equally useful for training: some may have higher or lower training value than others. The goal of this paper is to bring to the attention of the vision community the following considerations: (1) some examples are better than others for training detectors or classifiers, and (2) in the presence of better examples, some examples may negatively impact performance and removing them may be beneficial. In this paper, we propose an approach for measuring the training value of an example, and use it for ranking and greedily sorting examples. We test our methods on different vision tasks, models, datasets and classifiers. Our experiments show that the performance of current state-of-the-art detectors and classifiers can be improved when training on a subset, rather than the whole training set.
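The abstract's core idea of scoring examples by training value and greedily selecting a subset can be sketched as follows. This is a minimal illustrative sketch, not the paper's method: it assumes validation accuracy as the value measure, a toy one-dimensional nearest-centroid classifier, and a forward-selection loop that stops when no remaining example improves accuracy.

```python
# Hedged sketch of greedy training-example selection by validation value.
# The scoring rule (validation accuracy) and the toy nearest-centroid
# classifier are illustrative assumptions, not the paper's exact approach.

def centroid_classifier(train):
    """Fit a 1-D nearest-centroid classifier on (x, label) pairs."""
    sums, counts = {}, {}
    for x, y in train:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: sums[y] / counts[y] for y in sums}

    def predict(x):
        return min(centroids, key=lambda y: abs(x - centroids[y]))
    return predict

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

def greedy_select(pool, val):
    """Greedily add the example that most improves validation accuracy;
    stop (and implicitly drop harmful examples) once no candidate helps."""
    selected, remaining, best_acc = [], list(pool), 0.0
    while remaining:
        scored = [(accuracy(centroid_classifier(selected + [ex]), val), ex)
                  for ex in remaining]
        acc, ex = max(scored, key=lambda t: t[0])
        if selected and acc <= best_acc:
            break  # remaining examples would not improve performance
        selected.append(ex)
        remaining.remove(ex)
        best_acc = acc
    return selected, best_acc

# Toy pool with one mislabeled outlier (-3.0 labeled "pos"): the greedy
# loop leaves it out, matching the abstract's claim that training on a
# subset can beat training on the whole set.
pool = [(-1.0, "neg"), (-1.2, "neg"), (1.0, "pos"), (1.1, "pos"), (-3.0, "pos")]
val = [(-1.1, "neg"), (0.9, "pos"), (-0.8, "neg"), (1.2, "pos")]
subset, acc = greedy_select(pool, val)
```

On this toy data the selected subset excludes the mislabeled outlier and reaches perfect validation accuracy, whereas training on the full pool would pull the "pos" centroid toward the outlier.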

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Scaling-Aware Data Selection for End-to-End Autonomous Driving Systems

    cs.LG · 2026-04 · unverdicted · novelty 6.0

    MOSAIC is a scaling-aware data selection framework that outperforms baselines in training end-to-end autonomous driving planners, achieving comparable or better EPDMS scores with up to 80% less data.