pith: machine review for the scientific record


Deep Residual Learning for Image Recognition

64 Pith papers cite this work. Polarity classification is still indexing.

abstract

Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
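
The core idea is compact enough to sketch: instead of asking a stack of layers to fit a desired mapping H(x) directly, the block fits the residual F(x) = H(x) - x and recovers H(x) by adding the input back through an identity shortcut. A minimal PyTorch-style sketch of the basic two-convolution block, under the assumption of matching input/output dimensions (names and sizes are illustrative, not the authors' code):

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """Basic residual block: output = relu(F(x) + x), F = two 3x3 convs."""
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            identity = x                          # the shortcut adds no parameters
            out = self.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))       # F(x), the residual function
            return self.relu(out + identity)      # H(x) = F(x) + x

Because the shortcut is an identity, added layers can at worst learn F(x) = 0 and copy their input, which is the intuition behind why the very deep variants remain optimizable.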



representative citing papers

WaveNet: A Generative Model for Raw Audio

cs.SD · 2016-09-12 · accept · novelty 9.0

WaveNet generates realistic raw audio using an autoregressive neural network with dilated convolutions, achieving state-of-the-art naturalness in speech synthesis for English and Mandarin.
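
A rough sketch of the dilated causal convolution stack behind that summary, assuming a PyTorch-style implementation (channel count and depth are illustrative):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DilatedCausalStack(nn.Module):
        """1-D convolutions with exponentially growing dilation. Left-padding by
        (kernel_size - 1) * dilation keeps each layer causal: output sample t
        depends only on inputs <= t. The receptive field doubles per layer."""
        def __init__(self, channels=32, layers=8, kernel_size=2):
            super().__init__()
            self.convs = nn.ModuleList(
                nn.Conv1d(channels, channels, kernel_size, dilation=2 ** i)
                for i in range(layers))
            self.pads = [(kernel_size - 1) * (2 ** i) for i in range(layers)]

        def forward(self, x):                     # x: (batch, channels, time)
            for pad, conv in zip(self.pads, self.convs):
                x = conv(F.pad(x, (pad, 0)))      # pad on the left only
            return x

The real model adds gated activations, residual and skip connections, and an autoregressive softmax over quantized samples; this sketch shows only the causal dilation pattern.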

Density estimation using Real NVP

cs.LG · 2016-05-27 · accept · novelty 8.0

Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.
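
The coupling mechanism is simple enough to sketch: half the dimensions pass through unchanged and parameterize an elementwise affine transform of the other half, so the inverse and the Jacobian log-determinant come in closed form. A minimal PyTorch-style sketch (network sizes and the tanh bounding are illustrative choices):

    import torch
    import torch.nn as nn

    class AffineCoupling(nn.Module):
        """y1 = x1; y2 = x2 * exp(s(x1)) + t(x1). The Jacobian is triangular,
        so log|det J| = sum(s(x1)), making exact density evaluation cheap."""
        def __init__(self, dim, hidden=64):
            super().__init__()
            self.half = dim // 2
            self.net = nn.Sequential(
                nn.Linear(self.half, hidden), nn.ReLU(),
                nn.Linear(hidden, 2 * (dim - self.half)))

        def forward(self, x):
            x1, x2 = x[:, :self.half], x[:, self.half:]
            log_s, t = self.net(x1).chunk(2, dim=1)
            log_s = torch.tanh(log_s)             # bound the scales for stability
            y2 = x2 * torch.exp(log_s) + t
            return torch.cat([x1, y2], dim=1), log_s.sum(dim=1)

        def inverse(self, y):
            y1, y2 = y[:, :self.half], y[:, self.half:]
            log_s, t = self.net(y1).chunk(2, dim=1)
            log_s = torch.tanh(log_s)
            return torch.cat([y1, (y2 - t) * torch.exp(-log_s)], dim=1)

Stacking couplings with alternating splits lets every dimension eventually be transformed.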

Replica Theory of Spherical Boltzmann Machine Ensembles

cond-mat.dis-nn · 2026-04-20 · unverdicted · novelty 7.0

Replica calculations fully solve spherical Boltzmann machine ensembles and identify regimes where ensemble learning outperforms standard training, particularly for nearly finite-dimensional data.

Grokking of Diffusion Models: Case Study on Modular Addition

cs.LG · 2026-04-20 · unverdicted · novelty 7.0

Diffusion models show grokking on modular addition by composing periodic operand representations in simple data regimes, or by separating arithmetic computation from visual denoising across timesteps in varied regimes.

MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

cs.CL · 2016-11-28 · accept · novelty 7.0

MS MARCO is a new large-scale machine reading comprehension dataset built from real Bing search queries, human-generated answers, and web passages, supporting three tasks including answer synthesis and passage ranking.

Wide Residual Networks

cs.CV · 2016-05-23 · accept · novelty 7.0

Wide residual networks achieve higher accuracy and faster training than very deep thin residual networks by increasing width and decreasing depth, setting new state-of-the-art results on CIFAR, SVHN, and ImageNet.
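
The trade-off reduces to two hyperparameters, depth and a widening factor k. A small sketch of how a WRN-depth-k configuration expands into per-group widths and block counts, assuming the paper's basic-block layout (the helper name is hypothetical):

    def wrn_config(depth: int, k: int):
        """3 groups of basic blocks (2 convs each) plus 4 fixed layers,
        so depth = 6 * n_blocks + 4; k scales every group's channel width."""
        assert (depth - 4) % 6 == 0, "depth must be of the form 6n + 4"
        n_blocks = (depth - 4) // 6
        widths = [16, 16 * k, 32 * k, 64 * k]    # stem, then the three groups
        return n_blocks, widths

    # WRN-28-10: 4 blocks per group, widths [16, 160, 320, 640]
    print(wrn_config(28, 10))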

Training Deep Nets with Sublinear Memory Cost

cs.LG · 2016-04-21 · accept · novelty 7.0

An algorithm trains n-layer networks with O(sqrt(n)) memory via selective recomputation of activations, at the cost of one extra forward pass.
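
The O(sqrt(n)) figure comes from storing activations only at roughly sqrt(n) checkpoints and recomputing each segment in between during the backward pass, which costs about one extra forward pass in total. Recent PyTorch exposes this pattern directly; a sketch with an illustrative toy model:

    import math
    import torch
    import torch.nn as nn
    from torch.utils.checkpoint import checkpoint_sequential

    n_layers = 64
    model = nn.Sequential(*[nn.Sequential(nn.Linear(256, 256), nn.ReLU())
                            for _ in range(n_layers)])
    x = torch.randn(8, 256, requires_grad=True)

    # Keep ~sqrt(n) checkpointed activations; segments between checkpoints
    # are recomputed on the backward pass instead of being stored.
    segments = int(math.sqrt(n_layers))
    out = checkpoint_sequential(model, segments, x, use_reentrant=False)
    out.sum().backward()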

It Just Takes Two: Scaling Amortized Inference to Large Sets

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

A mean-pool deep set trained on sets of size at most two produces an encoder that generalizes to arbitrary sizes, decoupling representation learning from posterior modeling and making training cost independent of deployment set size N.
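
The decoupling rests on a permutation-invariant mean pool, which is defined for any set size. A minimal sketch of such an encoder (sizes illustrative), showing that a model trained on pairs accepts arbitrary N at deployment:

    import torch
    import torch.nn as nn

    class MeanPoolSetEncoder(nn.Module):
        """Embed each element independently, then average over the set axis.
        The mean is permutation-invariant and defined for any set size N."""
        def __init__(self, in_dim, hidden=128, out_dim=64):
            super().__init__()
            self.phi = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, out_dim))

        def forward(self, x):                     # x: (batch, set_size, in_dim)
            return self.phi(x).mean(dim=1)

    enc = MeanPoolSetEncoder(in_dim=5)
    print(enc(torch.randn(4, 2, 5)).shape)        # training regime: pairs
    print(enc(torch.randn(4, 1000, 5)).shape)     # deployment: N = 1000, same encoder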

citing papers explorer

Showing 10 of 10 citing papers.

  • WaveNet: A Generative Model for Raw Audio cs.SD · 2016-09-12 · accept · none · ref 11

    WaveNet generates realistic raw audio using an autoregressive neural network with dilated convolutions, achieving state-of-the-art naturalness in speech synthesis for English and Mandarin.

  • Density estimation using Real NVP cs.LG · 2016-05-27 · accept · none · ref 24

    Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.

  • MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications cs.CV · 2017-04-17 · accept · none · ref 8

    MobileNets introduce depthwise separable convolutions plus width and resolution multipliers to produce efficient CNNs that trade off latency and accuracy for mobile and embedded vision applications (a cost sketch for the separable factorization follows this list).

  • MS MARCO: A Human Generated MAchine Reading COmprehension Dataset cs.CL · 2016-11-28 · accept · none · ref 8

    MS MARCO is a new large-scale machine reading comprehension dataset built from real Bing search queries, human-generated answers, and web passages, supporting three tasks including answer synthesis and passage ranking.

  • Wide Residual Networks cs.CV · 2016-05-23 · accept · none · ref 11

    Wide residual networks achieve higher accuracy and faster training than very deep thin residual networks by increasing width and decreasing depth, setting new state-of-the-art results on CIFAR, SVHN, and ImageNet.

  • Training Deep Nets with Sublinear Memory Cost cs.LG · 2016-04-21 · accept · none · ref 10

    An algorithm trains n-layer networks with O(sqrt(n)) memory via selective recomputation of activations, at the cost of one extra forward pass.

  • ICDAR 2026 Competition on Writer Identification and Pen Classification from Hand-Drawn Circles cs.CV · 2026-05-08 · accept · none · ref 10

    A new dataset of hand-drawn circles from 66 writers and 8 pens yields competition results of 64.8% top-1 accuracy for open-set writer identification and 92.7% for pen classification.

  • VideoGPT: Video Generation using VQ-VAE and Transformers cs.CV · 2021-04-20 · accept · none · ref 15

    VideoGPT generates competitive natural videos by learning discrete latents with VQ-VAE and modeling them autoregressively with a transformer.

  • SGDR: Stochastic Gradient Descent with Warm Restarts cs.LG · 2016-08-13 · accept · none · ref 7

    SGDR uses periodic warm restarts of the learning rate in SGD to reach new state-of-the-art error rates of 3.14% on CIFAR-10 and 16.21% on CIFAR-100 (the restart schedule is sketched after this list).

  • Virtual KITTI 2 cs.CV · 2020-01-29 · accept · none · ref 22

    Virtual KITTI 2 supplies synthetic clones of real KITTI driving sequences with added weather and camera variants and multi-modal ground-truth annotations for autonomous driving vision research.
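
For the MobileNets entry above, the saving from depthwise separable convolutions is simple arithmetic: a standard DK x DK convolution over M input and N output channels on a DF x DF feature map costs DK^2 * M * N * DF^2 multiply-adds, while the depthwise-plus-pointwise factorization costs DK^2 * M * DF^2 + M * N * DF^2, a ratio of 1/N + 1/DK^2. A sketch with illustrative values:

    def standard_cost(dk, m, n, df):
        """Multiply-adds for a standard dk x dk convolution."""
        return dk * dk * m * n * df * df

    def separable_cost(dk, m, n, df):
        """Depthwise (per-channel dk x dk) plus pointwise (1x1) convolution."""
        return dk * dk * m * df * df + m * n * df * df

    dk, m, n, df = 3, 256, 256, 14
    ratio = separable_cost(dk, m, n, df) / standard_cost(dk, m, n, df)
    print(ratio)                                  # ~0.115, i.e. roughly 8.7x cheaper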
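
For the SGDR entry, the schedule itself is one formula: within a run of length T_i the learning rate follows a half cosine from eta_max down to eta_min, then resets to eta_max, with run lengths growing by a factor T_mult after each restart. A sketch with illustrative hyperparameters:

    import math

    def sgdr_lr(epoch, eta_min=0.0, eta_max=0.1, t0=10, t_mult=2):
        """lr = eta_min + 0.5*(eta_max - eta_min)*(1 + cos(pi * T_cur / T_i)),
        where T_cur counts epochs since the last restart."""
        t_i, t_cur = t0, epoch
        while t_cur >= t_i:                       # locate the current restart cycle
            t_cur -= t_i
            t_i *= t_mult
        return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))

    print([round(sgdr_lr(e), 4) for e in range(12)])  # rate resets upward at epoch 10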