Rethinking the Inception Architecture for Computer Vision

Christian Szegedy, Jonathon Shlens, Sergey Ioffe, Vincent Vanhoucke, Zbigniew Wojna

classification 💻 cs.CV

keywords errorcomputationalgainsnetworksvalidationvisioncomputerconvolutional

read the original abstract

Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014 very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most tasks (as long as enough labeled data is provided for training), computational efficiency and low parameter count are still enabling factors for various use cases such as mobile vision and big-data scenarios. Here we explore ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization. We benchmark our methods on the ILSVRC 2012 classification challenge validation set demonstrate substantial gains over the state of the art: 21.2% top-1 and 5.6% top-5 error for single frame evaluation using a network with a computational cost of 5 billion multiply-adds per inference and with using less than 25 million parameters. With an ensemble of 4 models and multi-crop evaluation, we report 3.5% top-5 error on the validation set (3.6% error on the test set) and 17.3% top-1 error on the validation set.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 13 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Mind the Gap: Geometrically Accurate Generative Reconstruction from Disjoint Views
cs.CV 2026-05 unverdicted novelty 8.0

GLADOS reconstructs 3D geometry from disjoint views by generating intermediate perspectives, performing robust coarse alignment that tolerates generative inconsistencies, and iteratively expanding context for consistency.
Categorical Reparameterization with Gumbel-Softmax
stat.ML 2016-11 unverdicted novelty 8.0

Gumbel-Softmax provides a continuous relaxation of categorical sampling that anneals to discrete samples for gradient-based optimization.
Transition-Matrix Regularization for Next Dialogue Act Prediction in Counselling Conversations
cs.CL 2026-04 unverdicted novelty 7.0

KL regularization aligning model predictions with empirical transition patterns improves macro-F1 by 9-42% in next dialogue act prediction on German counselling data and transfers to other datasets.
Privatar: Scalable Privacy-preserving Multi-user VR via Secure Offloading
cs.CR 2026-04 unverdicted novelty 7.0

Privatar uses horizontal frequency partitioning and distribution-aware minimal perturbation to enable private offloading of VR avatar reconstruction, supporting 2.37x more users with modest overhead.
Physics-informed, Generative Adversarial Design of Funicular Shells
cs.CE 2026-04 unverdicted novelty 7.0

A modified DCGAN with an auxiliary discriminator using the membrane factor generates stable, previously unseen funicular shells optimized for pure compression in three dimensions.
Diffusion Models Beat GANs on Image Synthesis
cs.LG 2021-05 accept novelty 7.0

Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
cs.CV 2017-04 accept novelty 7.0

MobileNets introduce depthwise separable convolutions plus width and resolution multipliers to produce efficient CNNs that trade off latency and accuracy for mobile and embedded vision applications.
From Clever Hans to Scientific Discovery: Interpreting EEG Foundational Transformers with LRP
cs.AI 2026-05 unverdicted novelty 6.0

LRP on EEG transformers reveals Clever Hans artifacts in motor imagery tasks and a recurring central electrode cluster as a candidate sensorimotor signature of arousal.
CASCADE: Context-Aware Relaxation for Speculative Image Decoding
cs.CV 2026-05 unverdicted novelty 6.0

CASCADE formalizes semantic interchangeability and convergence in target model representations to enable context-aware acceptance relaxation in tree-based speculative decoding, delivering up to 3.6x speedup on text-to...
Exploring Clustering Capability of Inpainting Model Embeddings for Pattern-based Individual Identification
cs.CV 2026-05 unverdicted novelty 5.0

Inpainting auxiliary task improves clustering of embeddings for individual zebrafish identification based on skin patterns.
From Spherical to Gaussian: A Comparative Analysis of Point Cloud Cropping Strategies in Large-Scale 3D Environments
cs.CV 2026-05 unverdicted novelty 5.0

Gaussian and linear cropping strategies for large point clouds improve 3D neural network performance over spherical crops, especially in outdoor scenes, and achieve new state-of-the-art results.
Attention Is All You Need
cs.CL 2017-06 unverdicted novelty 5.0

Pith review generated a malformed one-line summary.
Diabetic Retinopathy Classification using Downscaling Algorithms and Deep Learning
cs.CV 2026-05 unverdicted novelty 3.0

A multi-channel Inception V3 model with custom downscaling preprocessing outperforms prior methods on accuracy, sensitivity, and specificity when trained on the combined Kaggle and IDRiD diabetic retinopathy datasets.