pith. machine review for the scientific record.

arxiv: 1902.10811 · v2 · submitted 2019-02-13 · 💻 cs.CV · cs.LG · stat.ML

Recognition: unknown

Do ImageNet Classifiers Generalize to ImageNet?

Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, Vaishaal Shankar

Authors on Pith: no claims yet
classification 💻 cs.CV · cs.LG · stat.ML
keywords: test sets · imagenet · accuracy · generalize · models · original · cifar-10
original abstract

We build new test sets for the CIFAR-10 and ImageNet datasets. Both benchmarks have been the focus of intense research for almost a decade, raising the danger of overfitting to excessively re-used test sets. By closely following the original dataset creation processes, we test to what extent current classification models generalize to new data. We evaluate a broad range of models and find accuracy drops of 3% - 15% on CIFAR-10 and 11% - 14% on ImageNet. However, accuracy gains on the original test sets translate to larger gains on the new test sets. Our results suggest that the accuracy drops are not caused by adaptivity, but by the models' inability to generalize to slightly "harder" images than those found in the original test sets.
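A minimal sketch of the evaluation protocol the abstract describes: score one pretrained classifier on the original test set and on a reproduced one, then compare top-1 accuracies. This is not the authors' code; the dataset paths imagenet/val and imagenetv2 are hypothetical placeholders assuming ImageFolder-style directory layouts with identical class subfolders, and a torchvision ResNet-50 stands in for the paper's broad range of models.

```python
# Illustrative sketch only; dataset paths are placeholders, not the paper's code.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Standard ImageNet preprocessing for torchvision classifiers.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def top1_accuracy(model, root):
    """Top-1 accuracy of `model` on an ImageFolder-style dataset at `root`."""
    loader = DataLoader(datasets.ImageFolder(root, preprocess),
                        batch_size=64, num_workers=4)
    correct = total = 0
    for images, labels in loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()
orig = top1_accuracy(model, "imagenet/val")  # original test set (placeholder path)
new = top1_accuracy(model, "imagenetv2")     # reproduced test set (placeholder path)
print(f"original {orig:.3f}  new {new:.3f}  drop {orig - new:.3f}")
```

Repeating this comparison over many models and plotting new-test accuracy against original-test accuracy gives the relationship the abstract points to: gains on the original test set translate to larger gains on the new one.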

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. LAION-5B: An open large-scale dataset for training next generation image-text models

    cs.CV · 2022-10 · accept · novelty 7.0

    LAION-5B is an openly released dataset of 5.85 billion CLIP-filtered image-text pairs that enables replication of foundational vision-language models.

  2. Medical Model Synthesis Architectures: A Case Study

    cs.AI · 2026-05 · unverdicted · novelty 5.0

    The MedMSA framework retrieves knowledge via language models, then builds formal probabilistic models to produce uncertainty-weighted differential diagnoses from symptoms.

  3. Debunking Grad-ECLIP: A Comprehensive Study on Its Incorrectness and Fundamental Principles for Model Interpretation

    cs.CV · 2026-05 · unverdicted · novelty 4.0

    Grad-ECLIP is argued to be an equivalent but flawed variant of attention-based interpretation, and two principles are proposed to ensure model explanations reflect the original model.