EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Mingxing Tan, Quoc V. Le · 2019 · cs.LG · arXiv 1905.11946

Mixed citation behavior: the most common role is background (67% of classified citations).

30 Pith papers cite it.

abstract

Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for better accuracy if more resources are available. In this paper, we systematically study model scaling and identify that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, we propose a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. We demonstrate the effectiveness of this method on scaling up MobileNets and ResNet. To go even further, we use neural architecture search to design a new baseline network and scale it up to obtain a family of models, called EfficientNets, which achieve much better accuracy and efficiency than previous ConvNets. In particular, our EfficientNet-B7 achieves state-of-the-art 84.3% top-1 accuracy on ImageNet, while being 8.4x smaller and 6.1x faster on inference than the best existing ConvNet. Our EfficientNets also transfer well and achieve state-of-the-art accuracy on CIFAR-100 (91.7%), Flowers (98.8%), and 3 other transfer learning datasets, with an order of magnitude fewer parameters. Source code is at https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet.
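
The compound-coefficient idea in the abstract can be illustrated with a short sketch. The abstract does not give the constants; the values below (1.2 for depth, 1.1 for width, 1.15 for resolution) and the 224-pixel base resolution are taken from the paper body as assumptions, and the function names are hypothetical. The released EfficientNet models also round and hand-tune the resulting sizes, so treat this as an illustration of the scaling rule rather than the authors' implementation.

```python
# Assumed per-dimension bases (not stated in the abstract): depth, width, resolution.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi, base_depth=1.0, base_width=1.0, base_resolution=224):
    """Scale depth, width, and resolution together from a single coefficient phi.

    base_resolution=224 is an assumed baseline input size; the depth and
    width bases are expressed as multipliers on the baseline network.
    """
    depth = base_depth * ALPHA ** phi
    width = base_width * BETA ** phi
    resolution = base_resolution * GAMMA ** phi
    return depth, width, resolution


def flops_multiplier(phi):
    """Approximate FLOPs growth for a ConvNet under compound scaling.

    FLOPs grow roughly linearly with depth and quadratically with width
    and resolution, so each unit of phi multiplies FLOPs by
    ALPHA * BETA**2 * GAMMA**2 (about 1.92 with the assumed constants).
    """
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi


if __name__ == "__main__":
    for phi in range(4):
        d, w, r = compound_scale(phi)
        print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
              f"resolution {r:.0f}px, FLOPs x{flops_multiplier(phi):.2f}")
```

With these assumed constants, each increment of the coefficient roughly doubles the compute budget while growing all three dimensions together, which is the balance the abstract describes.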

citation-role summary

background: 4 · dataset: 1 · method: 1

citation-polarity summary

background: 4 · use dataset: 1 · use method: 1

representative citing papers

Scaling Laws for Neural Language Models

cs.LG · 2020-01-23 · unverdicted · novelty 8.0

Empirical power-law scaling governs language model loss versus model size, data size, and compute, enabling optimal allocation of training compute.

Patch Hierarchical Attention Transformer for Efficient Particle Jet Tagging

hep-ex · 2026-05-20 · unverdicted · novelty 7.0

PHAT-JeT combines geometric message-passing with hierarchical patch attention to reach state-of-the-art accuracy and background rejection among resource-constrained jet tagging models on four benchmarks.

Characterizing Learning in Deep Neural Networks using Tractable Algorithmic Complexity Analysis

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

QuBD extends algorithmic complexity estimation to quantized DNN weights, revealing that complexity decreases during learning, increases with overfitting, follows grokking patterns, and correlates with generalization.

Quantifying Explanation Consistency: The C-Score Metric for CAM-Based Explainability in Medical Image Classification

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

The C-Score quantifies intra-class explanation consistency for CAM methods via confidence-weighted pairwise soft IoU and detects AUC-consistency dissociation as an early warning for model instability on chest X-ray classification.

SMCNet: Supervised Surface Material Classification Using mmWave Radar IQ Signals and Complex-valued CNNs

eess.SP · 2026-04-08 · unverdicted · novelty 7.0

SMCNet applies a complex-valued CNN to mmWave radar IQ data for high-accuracy surface material classification across multiple and unseen sensing distances.

Event-based Civil Infrastructure Visual Defect Detection: ev-CIVIL Dataset and Benchmark

cs.CV · 2025-04-08 · unverdicted · novelty 7.0

Presents the ev-CIVIL dataset and benchmark showing that event-based cameras can support real-time detection of cracks and spalling in civil infrastructure under challenging lighting.

The DeepFake Detection Challenge (DFDC) Dataset

cs.CV · 2020-06-12 · accept · novelty 7.0

The DFDC dataset is the largest public collection of face-swapped videos and supports detectors that generalize to in-the-wild deepfakes.

LAA-X: Unified Localized Artifact Attention for Quality-Agnostic and Generalizable Face Forgery Detection

cs.CV · 2026-04-05 · unverdicted · novelty 6.0

LAA-X uses multi-task learning with explicit localized artifact attention and blending synthesis to build a deepfake detector that generalizes to high-quality and unseen manipulations after training only on real and pseudo-fake samples.

Navigating the Challenges of AI-Generated Image Detection in the Wild: What Truly Matters?

cs.CV · 2025-07-14 · conditional · novelty 6.0

The ITW-SM dataset and targeted optimization of detector design choices yield a 26.87% average AUC improvement for state-of-the-art AI-generated image detectors under real-world social media conditions.

Vision Transformers Need Registers

cs.CV · 2023-09-28 · unverdicted · novelty 6.0

Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.

Language Models (Mostly) Know What They Know

cs.CL · 2022-07-11 · unverdicted · novelty 6.0

Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

A General Language Assistant as a Laboratory for Alignment

cs.CL · 2021-12-01 · conditional · novelty 6.0

Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

Scaling Laws for Transfer

cs.LG · 2021-02-02 · unverdicted · novelty 6.0

Effective data transferred from pre-training to fine-tuning is described by a power law in model parameter count and fine-tuning dataset size, acting like a multiplier on the fine-tuning data.

Sharpness-Aware Minimization for Efficiently Improving Generalization

cs.LG · 2020-10-03 · conditional · novelty 6.0

SAM solves a min-max problem to locate flat low-loss regions, improving generalization on CIFAR, ImageNet and label-noise tasks.

When Does Sparse MoE Help in Vision? The Role of Backbone Compute Leverage in Sparse Routing

cs.CV · 2026-05-15 · unverdicted · novelty 5.0

Sparse MoE vision models show positive accuracy gaps only when routing a substantial compute fraction ρ and using k≥2 experts at large scale; batch-axis dispatch is identified as a key failure mode.

Exploring Clustering Capability of Inpainting Model Embeddings for Pattern-based Individual Identification

cs.CV · 2026-05-06 · unverdicted · novelty 5.0

Inpainting auxiliary task improves clustering of embeddings for individual zebrafish identification based on skin patterns.

DBLP: Phase-Aware Bounded-Loss Transport for Burst-Resilient Distributed ML Training

cs.LG · 2026-05-03 · unverdicted · novelty 5.0

DBLP is a training-phase-aware bounded-loss transport protocol that reduces end-to-end distributed ML training time by 24.4% on average (up to 33.9%) and achieves up to 5.88x communication speedup during microbursts while maintaining comparable test accuracy.

Equinox: Decentralized Scheduling for Hardware-Aware Orbital Intelligence

cs.DC · 2026-04-21 · unverdicted · novelty 5.0

Equinox uses a barrier-function-derived marginal cost to enable value-based adaptive scheduling and neighbor offloading in energy-constrained satellite constellations, yielding 20-31% throughput gains and higher battery reserves in simulation.

Non-identifiability of Explanations from Model Behavior in Deep Networks of Image Authenticity Judgments

cs.CV · 2026-04-08 · unverdicted · novelty 5.0

Models predicting human authenticity judgments produce inconsistent attribution maps across architectures, showing that explanations are non-identifiable.

Generalizable Deepfake Detection Based on Forgery-aware Layer Masking and Multi-artifact Subspace Decomposition

cs.CV · 2026-01-03 · unverdicted · novelty 5.0

FMSD improves cross-dataset generalization in deepfake detection by using gradient-based layer masking to select forgery-sensitive weights and SVD to split them into preserved semantic and multiple learnable artifact subspaces with orthogonality constraints.

Where Do Tokens Go? Understanding Pruning Behaviors in STEP at High Resolutions

cs.CV · 2025-09-17 · unverdicted · novelty 5.0

STEP uses dynamic superpatch merging via dCTS and early token exits to cut token count by 2.5x and computational complexity by up to 4x on ViT-Large for high-res segmentation, with at most 2% accuracy drop and 40% tokens halted early.

A Value-added Physical Properties Catalog for Low-redshift Galaxies from DESI Legacy Imaging Surveys DR10

astro-ph.GA · 2026-05-19 · unverdicted · novelty 4.0

A multimodal neural network trained on MPA-JHU references produces SFR, stellar mass, and metallicity estimates for 547 million low-redshift galaxies in DESI LS DR10.

Multi-Dataset Cross-Domain Knowledge Distillation for Unified Medical Image Segmentation, Classification, and Detection

cs.CV · 2026-05-02 · unverdicted · novelty 4.0

A multi-dataset cross-domain knowledge distillation approach improves unified performance on medical image segmentation, classification, and detection by transferring domain-invariant features from a joint teacher model to task-specific students.

DYMAPIA: A Multi-Domain Framework for Detecting AI-based Video Manipulation

cs.CV · 2026-04-27 · unverdicted · novelty 4.0

DYMAPIA builds dynamic anomaly masks from Fourier spectra, texture, edges, and optical flow to guide a lightweight DistXCNet classifier, reporting over 99% accuracy and F1 on FF++, Celeb-DF, and VDFD.

citing papers explorer

Showing 30 of 30 citing papers.

Scaling Laws for Neural Language Models

cs.LG · 2020-01-23 · unverdicted · none · ref 12 · internal anchor

Empirical power-law scaling governs language model loss versus model size, data size, and compute, enabling optimal allocation of training compute.

Patch Hierarchical Attention Transformer for Efficient Particle Jet Tagging

hep-ex · 2026-05-20 · unverdicted · none · ref 14 · internal anchor

PHAT-JeT combines geometric message-passing with hierarchical patch attention to reach state-of-the-art accuracy and background rejection among resource-constrained jet tagging models on four benchmarks.

Characterizing Learning in Deep Neural Networks using Tractable Algorithmic Complexity Analysis

cs.LG · 2026-05-15 · unverdicted · none · ref 51 · internal anchor

QuBD extends algorithmic complexity estimation to quantized DNN weights, revealing that complexity decreases during learning, increases with overfitting, follows grokking patterns, and correlates with generalization.

Quantifying Explanation Consistency: The C-Score Metric for CAM-Based Explainability in Medical Image Classification

cs.CV · 2026-04-09 · unverdicted · none · ref 53 · internal anchor

The C-Score quantifies intra-class explanation consistency for CAM methods via confidence-weighted pairwise soft IoU and detects AUC-consistency dissociation as an early warning for model instability on chest X-ray classification.

SMCNet: Supervised Surface Material Classification Using mmWave Radar IQ Signals and Complex-valued CNNs

eess.SP · 2026-04-08 · unverdicted · none · ref 17 · internal anchor

SMCNet applies a complex-valued CNN to mmWave radar IQ data for high-accuracy surface material classification across multiple and unseen sensing distances.

Event-based Civil Infrastructure Visual Defect Detection: ev-CIVIL Dataset and Benchmark

cs.CV · 2025-04-08 · unverdicted · none · ref 54 · internal anchor

Presents the ev-CIVIL dataset and benchmark showing that event-based cameras can support real-time detection of cracks and spalling in civil infrastructure under challenging lighting.

The DeepFake Detection Challenge (DFDC) Dataset

cs.CV · 2020-06-12 · accept · none · ref 28 · internal anchor

The DFDC dataset is the largest public collection of face-swapped videos and supports detectors that generalize to in-the-wild deepfakes.

LAA-X: Unified Localized Artifact Attention for Quality-Agnostic and Generalizable Face Forgery Detection

cs.CV · 2026-04-05 · unverdicted · none · ref 27 · internal anchor

LAA-X uses multi-task learning with explicit localized artifact attention and blending synthesis to build a deepfake detector that generalizes to high-quality and unseen manipulations after training only on real and pseudo-fake samples.

Navigating the Challenges of AI-Generated Image Detection in the Wild: What Truly Matters?

cs.CV · 2025-07-14 · conditional · none · ref 42 · internal anchor

The ITW-SM dataset and targeted optimization of detector design choices yield a 26.87% average AUC improvement for state-of-the-art AI-generated image detectors under real-world social media conditions.

Vision Transformers Need Registers

cs.CV · 2023-09-28 · unverdicted · none · ref 90 · internal anchor

Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.

Language Models (Mostly) Know What They Know

cs.CL · 2022-07-11 · unverdicted · none · ref 168 · internal anchor

Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

A General Language Assistant as a Laboratory for Alignment

cs.CL · 2021-12-01 · conditional · none · ref 110 · internal anchor

Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

Scaling Laws for Transfer

cs.LG · 2021-02-02 · unverdicted · none · ref 80 · internal anchor

Effective data transferred from pre-training to fine-tuning is described by a power law in model parameter count and fine-tuning dataset size, acting like a multiplier on the fine-tuning data.

Sharpness-Aware Minimization for Efficiently Improving Generalization

cs.LG · 2020-10-03 · conditional · none · ref 42 · internal anchor

SAM solves a min-max problem to locate flat low-loss regions, improving generalization on CIFAR, ImageNet and label-noise tasks.

When Does Sparse MoE Help in Vision? The Role of Backbone Compute Leverage in Sparse Routing

cs.CV · 2026-05-15 · unverdicted · none · ref 14 · internal anchor

Sparse MoE vision models show positive accuracy gaps only when routing a substantial compute fraction ρ and using k≥2 experts at large scale; batch-axis dispatch is identified as a key failure mode.

Exploring Clustering Capability of Inpainting Model Embeddings for Pattern-based Individual Identification

cs.CV · 2026-05-06 · unverdicted · none · ref 78 · internal anchor

Inpainting auxiliary task improves clustering of embeddings for individual zebrafish identification based on skin patterns.

DBLP: Phase-Aware Bounded-Loss Transport for Burst-Resilient Distributed ML Training

cs.LG · 2026-05-03 · unverdicted · none · ref 36 · internal anchor

DBLP is a training-phase-aware bounded-loss transport protocol that reduces end-to-end distributed ML training time by 24.4% on average (up to 33.9%) and achieves up to 5.88x communication speedup during microbursts while maintaining comparable test accuracy.

Equinox: Decentralized Scheduling for Hardware-Aware Orbital Intelligence

cs.DC · 2026-04-21 · unverdicted · none · ref 30 · internal anchor

Equinox uses a barrier-function-derived marginal cost to enable value-based adaptive scheduling and neighbor offloading in energy-constrained satellite constellations, yielding 20-31% throughput gains and higher battery reserves in simulation.

Non-identifiability of Explanations from Model Behavior in Deep Networks of Image Authenticity Judgments

cs.CV · 2026-04-08 · unverdicted · none · ref 8 · internal anchor

Models predicting human authenticity judgments produce inconsistent attribution maps across architectures, showing that explanations are non-identifiable.

Generalizable Deepfake Detection Based on Forgery-aware Layer Masking and Multi-artifact Subspace Decomposition

cs.CV · 2026-01-03 · unverdicted · none · ref 12 · internal anchor

FMSD improves cross-dataset generalization in deepfake detection by using gradient-based layer masking to select forgery-sensitive weights and SVD to split them into preserved semantic and multiple learnable artifact subspaces with orthogonality constraints.

Where Do Tokens Go? Understanding Pruning Behaviors in STEP at High Resolutions

cs.CV · 2025-09-17 · unverdicted · none · ref 61 · internal anchor

STEP uses dynamic superpatch merging via dCTS and early token exits to cut token count by 2.5x and computational complexity by up to 4x on ViT-Large for high-res segmentation, with at most 2% accuracy drop and 40% tokens halted early.

A Value-added Physical Properties Catalog for Low-redshift Galaxies from DESI Legacy Imaging Surveys DR10

astro-ph.GA · 2026-05-19 · unverdicted · none · ref 110 · internal anchor

A multimodal neural network trained on MPA-JHU references produces SFR, stellar mass, and metallicity estimates for 547 million low-redshift galaxies in DESI LS DR10.

Multi-Dataset Cross-Domain Knowledge Distillation for Unified Medical Image Segmentation, Classification, and Detection

cs.CV · 2026-05-02 · unverdicted · none · ref 79 · internal anchor

A multi-dataset cross-domain knowledge distillation approach improves unified performance on medical image segmentation, classification, and detection by transferring domain-invariant features from a joint teacher model to task-specific students.

DYMAPIA: A Multi-Domain Framework for Detecting AI-based Video Manipulation

cs.CV · 2026-04-27 · unverdicted · none · ref 35 · internal anchor

DYMAPIA builds dynamic anomaly masks from Fourier spectra, texture, edges, and optical flow to guide a lightweight DistXCNet classifier, reporting over 99% accuracy and F1 on FF++, Celeb-DF, and VDFD.

A Resource Efficient Fusion Network for Object Detection in Bird's-Eye View using Camera and Raw Radar Data

cs.CV · 2024-11-20 · unverdicted · none · ref 57 · internal anchor

Describes a camera-radar fusion network that uses raw RD spectra and BEV-polar camera features for BEV object detection, evaluated for accuracy and compute on the RADIal dataset.

Image Classification via Random Dilated Convolution with Multi-Branch Feature Extraction and Context Excitation

cs.CV · 2026-04-28 · unverdicted · none · ref 41 · internal anchor

RDCNet reports state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, Imagenette, and Imagewoof by combining random dilated convolutions with multi-branch and attention modules.

Real-Time Cellist Postural Evaluation With On-Device Computer Vision

cs.HC · 2026-04-19 · unverdicted · none · ref 16 · internal anchor

Cello Evaluator is a real-time postural feedback system for cellists running on current Android phones via on-device computer vision, validated as user-friendly by experts.

Towards Accurate and Efficient Waste Image Classification: A Hybrid Deep Learning and Machine Learning Approach

cs.CV · 2025-10-22 · unverdicted · none · ref 36 · internal anchor

A hybrid deep learning plus classical ML pipeline for waste image classification reaches up to 100% accuracy on TrashNet and a corrected household dataset while cutting feature dimensionality by over 95%.

Robust Deepfake Detection, NTIRE 2026 Challenge: Report

cs.CV · 2026-04-27 · unverdicted · none · ref 71 · internal anchor

The NTIRE 2026 challenge finds that large foundation models combined with ensembles and degradation-aware training produce the most robust deepfake detectors.

Introduction to Camera Pose Estimation with Deep Learning

cs.CV · 2019-07-08 · unverdicted · none · ref 2 · internal anchor

A survey of deep learning approaches for regressing absolute camera pose from single RGB images, covering key methods, trends, cross-comparisons, and reproducibility notes.