{"work":{"id":"ae9e5671-23e8-4853-82a4-699b5b8dd639","openalex_id":null,"doi":null,"arxiv_id":"1512.03385","raw_key":null,"title":"Deep Residual Learning for Image Recognition","authors":null,"authors_text":"Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun","year":2015,"venue":"cs.CV","abstract":"Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers.\n  The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.","external_url":"https://arxiv.org/abs/1512.03385","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-07-03T16:58:43.340304+00:00","pith_arxiv_id":"1512.03385","created_at":"2026-05-08T18:23:55.757371+00:00","updated_at":"2026-07-03T16:58:43.340304+00:00","title_quality_ok":true,"display_title":"Deep Residual Learning for Image Recognition","render_title":"Deep Residual Learning for Image Recognition"},"hub":{"state":{"work_id":"ae9e5671-23e8-4853-82a4-699b5b8dd639","tier":"super_hub","tier_reason":"100+ Pith inbound or 10,000+ external citations","pith_inbound_count":194,"external_cited_by_count":null,"distinct_field_count":31,"first_pith_cited_at":"2016-04-21T04:15:27+00:00","last_pith_cited_at":"2026-07-01T17:45:51+00:00","author_build_status":"needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-07-03T17:34:49.653422+00:00","tier_text":"super_hub"},"tier":"super_hub","role_counts":[{"context_role":"background","n":18},{"context_role":"method","n":12},{"context_role":"baseline","n":2}],"polarity_counts":[{"context_polarity":"background","n":16},{"context_polarity":"use_method","n":12},{"context_polarity":"baseline","n":2},{"context_polarity":"unclear","n":2}],"runs":{"ask_index":{"job_type":"ask_index","status":"succeeded","result":{"title":"Deep Residual Learning for Image Recognition","claims":[{"claim_text":"Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG ne","claim_type":"abstract","evidence_strength":"source_metadata"},{"claim_text":"To validate whether the quality advantage of GS scenes translates to stronger navigation agents, we train five agent groups under differ- ent scene-domain mixtures, with training budget fixed at5×107 steps:A: 100 mesh scenes,B: 100 GS scenes,C: 80M + 20G,D: 50M + 50G, andE: 20M + 80G. All agents share a unified DD-PPO [30] architecture with a ResNet [6] visual encoder and a GRU [4] policy head, receiving256×256RGB and depth observations, with only training scene composition varying. Each agent i","claim_type":"method","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"deep networks, achieving new state-of-the-art results on CIFAR, SVHN, COCO, and signiﬁcant improvements on ImageNet. Our code and models are available at https: //github.com/szagoruyko/wide-residual-networks. 1 Introduction Convolutional neural networks have seen a gradual increase of the number of layers in the last few years, starting from AlexNet [16], VGG [26], Inception [30] to Residual [11] net- works, corresponding to improvements in many image recognition tasks. The superiority of deep n","claim_type":"baseline","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"The release of additional training data enabled participants to optionally incorporate these samples to develop more advanced methods leveraging an unseen writer set. 2.3 Baseline To provide participants with a starting point, we supplied a minimal, extensible deep learning baseline for both tasks, available on GitHub2 and Kaggle.3 We utilized a ResNet-18 [10] model with weights pretrained on ImageNet [16] weights. To suit the specific requirements of the Circleidcompetition, the output layer wa","claim_type":"baseline","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Requires two label columns, e.g.X_eventandX_{unit}. Models CNN Generic convolutional backbone for volumetric data. Number of layers and feature maps can be defined in the config. ResNet Residual network for efficient gradient flow [14]. Sizes: ResNet-10, 18, 34, 50, 101, 152, and 200. DenseNet Dense connectivity for compact yet expressive feature extraction [15]. Sizes: 121, 169, 201, and 264. EfficientNetV2 Compact, high-performance CNN architecture using progressive scaling and fused convoluti","claim_type":"method","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Avanzi, M. Lindholm, M. Maggi, M. Mayer, J. Schelldor- fer, and S. Scognamiglio, 2026. Available at SSRN:https://ssrn.com/abstract=5162304 or http://dx.doi.org/10.2139/ssrn.5162304. S. Xue and G. Wu. A study of the dependence between soil moisture and precipitation in different ecoregions of the northern hemisphere. Hydrology and Earth System Sciences , 29 (20):5575-5591, 2025. Z. Xueping, S. M. Samuri, Q. Lin, and M. H. M. Adnan. Integrating generative adversarial networks and stacked autoencod","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"characteristics based on their design, which is discussed in Section 3.1. 4 of 21 Table 1.The tested models with their main characteristics, where * refers to features specially designed for the model. Model Year Depth Main Design Characteristics Reference AlexNet 2012 8 Spatial [29] VGG-16 2014 16 Spatial and depth [30] GoogLeNet 2014 22 Depth and width [31] ResNet-18 2015 18 Skip connection [32] SqueezeNet 2016 18 Channels [33] ResNext 2016 101 Skip connection [34] DenseNet 2017 201 Skip conne","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Deep Residual Learning for Image Recognition because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (17 contexts).","role_counts":[{"n":17,"context_role":"background"},{"n":11,"context_role":"method"},{"n":2,"context_role":"baseline"}]},"error":null,"updated_at":"2026-05-20T12:51:58.537486+00:00"},"author_expand":{"job_type":"author_expand","status":"succeeded","result":{"authors_linked":[{"id":"02ad2b6c-8a3d-4309-ba96-5181bc91718e","orcid":null,"display_name":"Kaiming He"},{"id":"90e0e192-6197-4ef4-b9c2-386dbfd79fad","orcid":null,"display_name":"Xiangyu Zhang"},{"id":"c49ca12d-98c9-4fc2-a95f-a30b92a41773","orcid":null,"display_name":"Shaoqing Ren"},{"id":"fa0012a3-358f-4383-8457-2763cccd76e7","orcid":null,"display_name":"Jian Sun"}]},"error":null,"updated_at":"2026-05-20T12:51:58.815183+00:00"},"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-14T10:59:36.667775+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale","work_id":"e96730e3-129b-4db6-b981-15ab7932e297","shared_citers":12},{"title":"Adam: A Method for Stochastic Optimization","work_id":"1910796d-9b52-4683-bf5c-de9632c1028b","shared_citers":10},{"title":"Decoupled Weight Decay Regularization","work_id":"07ef7360-d385-4033-83f7-8384a6325204","shared_citers":8},{"title":"Attention Is All You Need","work_id":"baafb5a2-5272-43bc-932f-09fa9ffe5316","shared_citers":7},{"title":"Very Deep Convolutional Networks for Large-Scale Image Recognition","work_id":"1c4b4409-c14b-488b-a086-c57a5aab8a29","shared_citers":7},{"title":"DINOv2: Learning Robust Visual Features without Supervision","work_id":"26b304e5-b54a-4f26-be7e-83299eca52e4","shared_citers":5},{"title":"Identity mappings in deep residual networks","work_id":"98f21757-763d-4985-a2f7-88fe55813938","shared_citers":5},{"title":"Language Models are Few-Shot Learners","work_id":"214732c0-2edd-44a0-af9e-28184a2b8279","shared_citers":5},{"title":"U-Net: Convolutional Networks for Biomedical Image Segmentation","work_id":"5c6b13d6-e704-4bf4-9df7-3a3a4d3b6950","shared_citers":5},{"title":"Wide Residual Networks","work_id":"1b918c80-6bca-4d06-8019-569626fb1cf2","shared_citers":5},{"title":"Auto-Encoding Variational Bayes","work_id":"97d95295-30e1-42b4-bbf6-85f0fa4edb44","shared_citers":4},{"title":"Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift","work_id":"05484516-8937-4cdf-9176-7f8329ef0221","shared_citers":4},{"title":"DINOv3","work_id":"c8b07deb-8fe7-4e18-9620-f3569d3529ce","shared_citers":4},{"title":"Generating Long Sequences with Sparse Transformers","work_id":"c5b81688-45ee-4a9a-b095-e6290f45cb6c","shared_citers":4},{"title":"Girshick, and Jian Sun","work_id":"0d7ba565-3e2a-4dff-aa24-13faf1f6e69e","shared_citers":4},{"title":"Layer Normalization","work_id":"20a2d720-0046-4c7c-bcd6-327ec8143f69","shared_citers":4},{"title":"Pixel recurrent neural networks","work_id":"69d61439-cf3d-4fbe-a48c-fbfddf5ca9ed","shared_citers":4},{"title":"preprint arXiv:1905.11946 , year=","work_id":"c2e77a93-b2ac-42ab-8fb4-2427017f7e7c","shared_citers":4},{"title":"a ckstr \\","work_id":"f1a25110-82fe-413c-866c-927d93520b88","shared_citers":3},{"title":"arXiv:1608.06993 [cs]","work_id":"c5239b52-732f-4827-b3d3-018425f71005","shared_citers":3},{"title":"Deep Learning using Rectified Linear Units (ReLU)","work_id":"1348fc83-94e6-4a01-b6c2-0568c6def951","shared_citers":3},{"title":"Deep networks with stochastic depth","work_id":"e74b4d3a-80aa-4530-90af-215206de46eb","shared_citers":3},{"title":"Gaussian Error Linear Units (GELUs)","work_id":"0466fd22-03a1-4a61-af0a-a900e77bb023","shared_citers":3},{"title":"Girshick, Kaiming He, Bharath Hariharan, and Serge J","work_id":"38f3557c-52ff-472c-a63f-819d0b810a12","shared_citers":3}],"time_series":[{"n":6,"year":2016},{"n":2,"year":2017},{"n":1,"year":2020},{"n":2,"year":2021},{"n":1,"year":2022},{"n":1,"year":2023},{"n":1,"year":2024},{"n":49,"year":2026}],"dependency_candidates":[]},"error":null,"updated_at":"2026-05-14T10:59:28.524834+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-14T10:59:28.489768+00:00"},"role_polarity":{"job_type":"role_polarity","status":"succeeded","result":{"title":"Deep Residual Learning for Image Recognition","claims":[{"claim_text":"Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG ne","claim_type":"abstract","evidence_strength":"source_metadata"},{"claim_text":"To validate whether the quality advantage of GS scenes translates to stronger navigation agents, we train five agent groups under differ- ent scene-domain mixtures, with training budget fixed at5×107 steps:A: 100 mesh scenes,B: 100 GS scenes,C: 80M + 20G,D: 50M + 50G, andE: 20M + 80G. All agents share a unified DD-PPO [30] architecture with a ResNet [6] visual encoder and a GRU [4] policy head, receiving256×256RGB and depth observations, with only training scene composition varying. Each agent i","claim_type":"method","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"deep networks, achieving new state-of-the-art results on CIFAR, SVHN, COCO, and signiﬁcant improvements on ImageNet. Our code and models are available at https: //github.com/szagoruyko/wide-residual-networks. 1 Introduction Convolutional neural networks have seen a gradual increase of the number of layers in the last few years, starting from AlexNet [16], VGG [26], Inception [30] to Residual [11] net- works, corresponding to improvements in many image recognition tasks. The superiority of deep n","claim_type":"baseline","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"The release of additional training data enabled participants to optionally incorporate these samples to develop more advanced methods leveraging an unseen writer set. 2.3 Baseline To provide participants with a starting point, we supplied a minimal, extensible deep learning baseline for both tasks, available on GitHub2 and Kaggle.3 We utilized a ResNet-18 [10] model with weights pretrained on ImageNet [16] weights. To suit the specific requirements of the Circleidcompetition, the output layer wa","claim_type":"baseline","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Requires two label columns, e.g.X_eventandX_{unit}. Models CNN Generic convolutional backbone for volumetric data. Number of layers and feature maps can be defined in the config. ResNet Residual network for efficient gradient flow [14]. Sizes: ResNet-10, 18, 34, 50, 101, 152, and 200. DenseNet Dense connectivity for compact yet expressive feature extraction [15]. Sizes: 121, 169, 201, and 264. EfficientNetV2 Compact, high-performance CNN architecture using progressive scaling and fused convoluti","claim_type":"method","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Avanzi, M. Lindholm, M. Maggi, M. Mayer, J. Schelldor- fer, and S. Scognamiglio, 2026. Available at SSRN:https://ssrn.com/abstract=5162304 or http://dx.doi.org/10.2139/ssrn.5162304. S. Xue and G. Wu. A study of the dependence between soil moisture and precipitation in different ecoregions of the northern hemisphere. Hydrology and Earth System Sciences , 29 (20):5575-5591, 2025. Z. Xueping, S. M. Samuri, Q. Lin, and M. H. M. Adnan. Integrating generative adversarial networks and stacked autoencod","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"characteristics based on their design, which is discussed in Section 3.1. 4 of 21 Table 1.The tested models with their main characteristics, where * refers to features specially designed for the model. Model Year Depth Main Design Characteristics Reference AlexNet 2012 8 Spatial [29] VGG-16 2014 16 Spatial and depth [30] GoogLeNet 2014 22 Depth and width [31] ResNet-18 2015 18 Skip connection [32] SqueezeNet 2016 18 Channels [33] ResNext 2016 101 Skip connection [34] DenseNet 2017 201 Skip conne","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Deep Residual Learning for Image Recognition because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (17 contexts).","role_counts":[{"n":17,"context_role":"background"},{"n":11,"context_role":"method"},{"n":2,"context_role":"baseline"}]},"error":null,"updated_at":"2026-05-20T12:51:58.817504+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"Deep Residual Learning for Image Recognition","claims":[{"claim_text":"Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG ne","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks Deep Residual Learning for Image Recognition because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-14T10:59:34.726884+00:00"}},"summary":{"title":"Deep Residual Learning for Image Recognition","claims":[{"claim_text":"Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG ne","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks Deep Residual Learning for Image Recognition because it crossed a citation-hub threshold.","role_counts":[]},"graph":{"co_cited":[{"title":"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale","work_id":"e96730e3-129b-4db6-b981-15ab7932e297","shared_citers":12},{"title":"Adam: A Method for Stochastic Optimization","work_id":"1910796d-9b52-4683-bf5c-de9632c1028b","shared_citers":10},{"title":"Decoupled Weight Decay Regularization","work_id":"07ef7360-d385-4033-83f7-8384a6325204","shared_citers":8},{"title":"Attention Is All You Need","work_id":"baafb5a2-5272-43bc-932f-09fa9ffe5316","shared_citers":7},{"title":"Very Deep Convolutional Networks for Large-Scale Image Recognition","work_id":"1c4b4409-c14b-488b-a086-c57a5aab8a29","shared_citers":7},{"title":"DINOv2: Learning Robust Visual Features without Supervision","work_id":"26b304e5-b54a-4f26-be7e-83299eca52e4","shared_citers":5},{"title":"Identity mappings in deep residual networks","work_id":"98f21757-763d-4985-a2f7-88fe55813938","shared_citers":5},{"title":"Language Models are Few-Shot Learners","work_id":"214732c0-2edd-44a0-af9e-28184a2b8279","shared_citers":5},{"title":"U-Net: Convolutional Networks for Biomedical Image Segmentation","work_id":"5c6b13d6-e704-4bf4-9df7-3a3a4d3b6950","shared_citers":5},{"title":"Wide Residual Networks","work_id":"1b918c80-6bca-4d06-8019-569626fb1cf2","shared_citers":5},{"title":"Auto-Encoding Variational Bayes","work_id":"97d95295-30e1-42b4-bbf6-85f0fa4edb44","shared_citers":4},{"title":"Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift","work_id":"05484516-8937-4cdf-9176-7f8329ef0221","shared_citers":4},{"title":"DINOv3","work_id":"c8b07deb-8fe7-4e18-9620-f3569d3529ce","shared_citers":4},{"title":"Generating Long Sequences with Sparse Transformers","work_id":"c5b81688-45ee-4a9a-b095-e6290f45cb6c","shared_citers":4},{"title":"Girshick, and Jian Sun","work_id":"0d7ba565-3e2a-4dff-aa24-13faf1f6e69e","shared_citers":4},{"title":"Layer Normalization","work_id":"20a2d720-0046-4c7c-bcd6-327ec8143f69","shared_citers":4},{"title":"Pixel recurrent neural networks","work_id":"69d61439-cf3d-4fbe-a48c-fbfddf5ca9ed","shared_citers":4},{"title":"preprint arXiv:1905.11946 , year=","work_id":"c2e77a93-b2ac-42ab-8fb4-2427017f7e7c","shared_citers":4},{"title":"a ckstr \\","work_id":"f1a25110-82fe-413c-866c-927d93520b88","shared_citers":3},{"title":"arXiv:1608.06993 [cs]","work_id":"c5239b52-732f-4827-b3d3-018425f71005","shared_citers":3},{"title":"Deep Learning using Rectified Linear Units (ReLU)","work_id":"1348fc83-94e6-4a01-b6c2-0568c6def951","shared_citers":3},{"title":"Deep networks with stochastic depth","work_id":"e74b4d3a-80aa-4530-90af-215206de46eb","shared_citers":3},{"title":"Gaussian Error Linear Units (GELUs)","work_id":"0466fd22-03a1-4a61-af0a-a900e77bb023","shared_citers":3},{"title":"Girshick, Kaiming He, Bharath Hariharan, and Serge J","work_id":"38f3557c-52ff-472c-a63f-819d0b810a12","shared_citers":3}],"time_series":[{"n":6,"year":2016},{"n":2,"year":2017},{"n":1,"year":2020},{"n":2,"year":2021},{"n":1,"year":2022},{"n":1,"year":2023},{"n":1,"year":2024},{"n":49,"year":2026}],"dependency_candidates":[]},"authors":[{"id":"fa0012a3-358f-4383-8457-2763cccd76e7","orcid":null,"display_name":"Jian Sun","source":"manual","import_confidence":0.72},{"id":"02ad2b6c-8a3d-4309-ba96-5181bc91718e","orcid":null,"display_name":"Kaiming He","source":"manual","import_confidence":0.72},{"id":"c49ca12d-98c9-4fc2-a95f-a30b92a41773","orcid":null,"display_name":"Shaoqing Ren","source":"manual","import_confidence":0.72},{"id":"90e0e192-6197-4ef4-b9c2-386dbfd79fad","orcid":null,"display_name":"Xiangyu Zhang","source":"manual","import_confidence":0.72}]}}