hub Mixed citations

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Mixed citation behavior. Most common role is background (67%).

44 Pith papers citing it · background: 67% of classified citations
abstract

Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for better accuracy if more resources are available. In this paper, we systematically study model scaling and identify that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, we propose a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. We demonstrate the effectiveness of this method on scaling up MobileNets and ResNet. To go even further, we use neural architecture search to design a new baseline network and scale it up to obtain a family of models, called EfficientNets, which achieve much better accuracy and efficiency than previous ConvNets. In particular, our EfficientNet-B7 achieves state-of-the-art 84.3% top-1 accuracy on ImageNet, while being 8.4x smaller and 6.1x faster on inference than the best existing ConvNet. Our EfficientNets also transfer well and achieve state-of-the-art accuracy on CIFAR-100 (91.7%), Flowers (98.8%), and 3 other transfer learning datasets, with an order of magnitude fewer parameters. Source code is at https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet.
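
The compound scaling idea in the abstract can be illustrated with a minimal sketch. It assumes the per-dimension bases reported in the EfficientNet paper (1.2 for depth, 1.1 for width, 1.15 for resolution), which are not stated in the abstract itself. The function names and the base resolution of 224 are hypothetical choices made for illustration, and the output is not meant to reproduce the released B0 to B7 configurations.

```python
# Illustrative sketch of compound scaling; not the authors' implementation.
# ALPHA, BETA, GAMMA are the depth/width/resolution bases reported in the
# EfficientNet paper (an assumption here, since the abstract omits them).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi, base_depth=1.0, base_width=1.0, base_resolution=224):
    """Scale depth, width, and resolution together with one coefficient phi.

    base_depth and base_width are multipliers on a baseline network;
    base_resolution is a hypothetical baseline input size in pixels.
    """
    depth = base_depth * ALPHA ** phi
    width = base_width * BETA ** phi
    resolution = int(round(base_resolution * GAMMA ** phi))
    return depth, width, resolution


def flops_multiplier(phi):
    """Approximate FLOPs growth: linear in depth, quadratic in width and resolution."""
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi


if __name__ == "__main__":
    for phi in range(4):
        d, w, r = compound_scale(phi)
        print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
              f"resolution {r}px, FLOPs x{flops_multiplier(phi):.2f}")
```

Because the product of the depth base, the squared width base, and the squared resolution base is close to 2, each unit increase of the coefficient roughly doubles the FLOPs budget in this sketch.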

citation-role summary

background: 4 · dataset: 1 · method: 1

representative citing papers

Scaling Laws for Neural Language Models

cs.LG · 2020-01-23 · unverdicted · novelty 8.0

Empirical power-law scaling governs language model loss versus model size, data size, and compute, enabling optimal allocation of training compute.

The Regularizing Power of Language-Training Deepfake Detectors

cs.CV · 2026-05-29 · unverdicted · novelty 7.0

A dual-encoder deepfake detector pairs a frozen specialist with a LoRA-tuned MLLM, trained first via binary alignment then via RL to reward explain-then-classify behavior, yielding improved cross-dataset performance and interpretability.

Probabilistic Inversion with Flow Matching

cs.LG · 2026-06-30 · unverdicted · novelty 6.0

Adapts Flow Matching from generative AI to probabilistic inversion, evaluated on a simple 2D velocity model and the OpenFWI seismic dataset.

Vision Transformers Need Registers

cs.CV · 2023-09-28 · unverdicted · novelty 6.0

Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.

Language Models (Mostly) Know What They Know

cs.CL · 2022-07-11 · unverdicted · novelty 6.0

Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

Scaling Laws for Transfer

cs.LG · 2021-02-02 · unverdicted · novelty 6.0

Effective data transferred from pre-training to fine-tuning is described by a power law in model parameter count and fine-tuning dataset size, acting like a multiplier on the fine-tuning data.

citing papers explorer

Showing 7 of 7 citing papers after filters.

  • Scaling Laws for Neural Language Models cs.LG · 2020-01-23 · unverdicted · none · ref 12 · internal anchor

    Empirical power-law scaling governs language model loss versus model size, data size, and compute, enabling optimal allocation of training compute.

  • Characterizing Learning in Deep Neural Networks using Tractable Algorithmic Complexity Analysis cs.LG · 2026-05-15 · unverdicted · none · ref 51 · internal anchor

    QuBD extends algorithmic complexity estimation to quantized DNN weights, revealing that complexity decreases during learning, increases with overfitting, follows grokking patterns, and correlates with generalization.

  • Probabilistic Inversion with Flow Matching cs.LG · 2026-06-30 · unverdicted · none · ref 3 · internal anchor

    Adapts Flow Matching from generative AI to probabilistic inversion, evaluated on a simple 2D velocity model and the OpenFWI seismic dataset.

  • Scaling Laws for Transfer cs.LG · 2021-02-02 · unverdicted · none · ref 80 · internal anchor

    Effective data transferred from pre-training to fine-tuning is described by a power law in model parameter count and fine-tuning dataset size, acting like a multiplier on the fine-tuning data.

  • Sharpness-Aware Minimization for Efficiently Improving Generalization cs.LG · 2020-10-03 · conditional · none · ref 42 · internal anchor

    SAM solves a min-max problem to locate flat low-loss regions, improving generalization on CIFAR, ImageNet and label-noise tasks.

  • A combination of noise and bilateral filters achieve supralinear and scalable adversarial robustness in CNNs cs.LG · 2026-06-01 · unverdicted · none · ref 49 · internal anchor

    A preprocessor of Gaussian noise plus bilateral filtering yields supralinear adversarial robustness in CNNs and, when paired with adversarial training, ranks near the top of RobustBench while using far less compute, parameters, epochs, and data than prior defenses.

  • DBLP: Phase-Aware Bounded-Loss Transport for Burst-Resilient Distributed ML Training cs.LG · 2026-05-03 · unverdicted · none · ref 36 · internal anchor

    DBLP is a training-phase-aware bounded-loss transport protocol that reduces end-to-end distributed ML training time by 24.4% on average (up to 33.9%) and achieves up to 5.88x communication speedup during microbursts while maintaining comparable test accuracy.