pith · machine review for the scientific record

arxiv: 1605.07146 · v4 · submitted 2016-05-23 · 💻 cs.CV · cs.LG · cs.NE

Recognition: 1 theorem link

Wide Residual Networks

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 01:20 UTC · model grok-4.3

classification 💻 cs.CV · cs.LG · cs.NE
keywords residual networks · wide residual networks · image classification · neural network architecture · CIFAR · deep learning · feature reuse

The pith

Wide residual networks with reduced depth and increased width outperform much deeper thin residual networks in accuracy and training speed.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the diminishing feature reuse problem in very deep residual networks, where adding layers yields smaller gains at high computational cost. It proposes wide residual networks that instead reduce the number of layers while expanding the width of each layer through more channels. Experiments demonstrate that even a basic 16-layer wide network surpasses the accuracy of prior thousand-layer deep residual networks while training more efficiently. This matters because it points to a practical way to build stronger image recognition models without the slowdowns of extreme depth on benchmarks like CIFAR, SVHN, and ImageNet.

Core claim

Residual networks improve more effectively when made wider rather than deeper; the resulting wide residual networks achieve new state-of-the-art accuracy on CIFAR, SVHN, and COCO while delivering significant gains on ImageNet, all with far fewer layers than the thin deep baselines they replace.

What carries the argument

The wide residual block, formed by decreasing overall network depth and increasing the number of feature channels per layer while retaining the residual shortcut connections.
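A minimal sketch helps make the widening concrete. The PyTorch block below is our own illustration, not the authors' original Torch code (the class name, defaults, and shortcut handling are assumptions); it keeps the pre-activation BN-ReLU-conv ordering and the residual shortcut, and simply multiplies the number of 3×3 filters by a width factor k:

```python
import torch
import torch.nn as nn

class WideBasicBlock(nn.Module):
    """Pre-activation basic residual block (BN-ReLU-conv) widened by a factor k.

    A hedged sketch of the idea, not the authors' Torch implementation.
    """

    def __init__(self, in_planes, planes, stride=1, dropout=0.0):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.dropout = nn.Dropout(dropout)
        # 1x1 projection on the shortcut when the shape changes
        self.shortcut = nn.Identity()
        if stride != 1 or in_planes != planes:
            self.shortcut = nn.Conv2d(in_planes, planes, kernel_size=1,
                                      stride=stride, bias=False)

    def forward(self, x):
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.dropout(out)
        out = self.conv2(torch.relu(self.bn2(out)))
        return out + self.shortcut(x)

# A thin baseline block at the first stage has 16 filters; widening by k = 8
# gives 16 * 8 = 128 channels while the depth stays the same.
k = 8
block = WideBasicBlock(in_planes=16, planes=16 * k)
out = block(torch.randn(2, 16, 32, 32))
```

With k = 1 this reduces to the thin baseline block; the paper's 16-layer wide networks simply use larger k at much smaller depth.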

If this is right

  • Training time and memory use drop because shallower networks avoid the slowdown from excessive layers.
  • Accuracy improves on CIFAR, SVHN, COCO, and ImageNet without needing thousand-layer depths.
  • Feature reuse becomes more effective, allowing simpler networks to reach higher performance.
  • The architecture change applies across multiple datasets without requiring entirely new block designs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Architectures in other domains might also gain more from width scaling than from depth scaling when feature reuse is the bottleneck.
  • Model design could shift toward finding optimal width-to-depth ratios instead of always maximizing depth.
  • Similar width-focused adjustments might improve efficiency in non-residual networks facing training slowdowns.

Load-bearing premise

The performance gains arise primarily from the width increase and depth reduction rather than from training schedule, data augmentation, or hyperparameter differences that might favor the new models.

What would settle it

Re-train the original thousand-layer thin ResNet under the exact same training schedule, data augmentation, and hyperparameters as the 16-layer wide network and measure whether the accuracy gap disappears or reverses.

read the original abstract

Deep residual networks were shown to be able to scale up to thousands of layers and still have improving performance. However, each fraction of a percent of improved accuracy costs nearly doubling the number of layers, and so training very deep residual networks has a problem of diminishing feature reuse, which makes these networks very slow to train. To tackle these problems, in this paper we conduct a detailed experimental study on the architecture of ResNet blocks, based on which we propose a novel architecture where we decrease depth and increase width of residual networks. We call the resulting network structures wide residual networks (WRNs) and show that these are far superior over their commonly used thin and very deep counterparts. For example, we demonstrate that even a simple 16-layer-deep wide residual network outperforms in accuracy and efficiency all previous deep residual networks, including thousand-layer-deep networks, achieving new state-of-the-art results on CIFAR, SVHN, COCO, and significant improvements on ImageNet. Our code and models are available at https://github.com/szagoruyko/wide-residual-networks

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 4 minor

Summary. The manuscript introduces Wide Residual Networks (WRNs) by decreasing the depth of residual blocks while increasing their width via a width multiplier k. It reports that a simple 16-layer WRN outperforms all prior deep residual networks (including 1000-layer variants) in accuracy and training speed on CIFAR-10/100 and SVHN, with additional gains on ImageNet classification and COCO detection. The authors provide controlled ablations under fixed parameter budgets and release code and models.

Significance. The work is significant because it supplies reproducible empirical evidence that width can be more effective than extreme depth for residual networks, yielding faster convergence and better accuracy under matched training protocols. The public release of code and models, together with the use of re-implemented baselines, strengthens the reliability of the performance claims and their utility for the community.

minor comments (4)
  1. [§3.1] The description of the basic block could include an explicit equation or diagram showing how the width multiplier k scales the number of filters in the 3×3 convolutions.
  2. [Table 1] Adding a column for total parameters and training time per epoch would make the efficiency claims easier to verify at a glance.
  3. [§4.2] The SVHN results mention a specific dropout placement; a brief note on whether the same schedule was used for all baseline re-implementations would improve clarity.
  4. [Figure 3] The learning curves are informative, but axis labels could specify the exact metric (e.g., top-1 error) and include a legend for the different k values.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive review, accurate summary of our contributions, and recommendation to accept the manuscript. We appreciate the recognition of the significance of our empirical results on width versus depth in residual networks, as well as the value placed on our code and model releases.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper is an empirical architecture study. It conducts controlled ablations on ResNet blocks, proposes wider-shallower variants, and validates via accuracy/efficiency comparisons on fixed public benchmarks (CIFAR, SVHN, ImageNet, COCO). No equations, fitted parameters renamed as predictions, or self-referential derivations appear. Baselines are re-implemented under the authors' protocol rather than taken verbatim. Central claims rest on experimental outcomes independent of prior self-citations or definitional loops.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

Claims rest on empirical validation using standard deep learning training; no new theoretical entities or derivations are introduced beyond the architectural modification.

free parameters (2)
  • width multiplier k
    Controls the number of channels in residual blocks; values like 2, 4, 8, 10 are tested and selected for best accuracy-parameter trade-off.
  • dropout rate
    Added between convolutions in wide blocks for regularization; rates such as 0.3-0.4 are tuned per dataset.
axioms (2)
  • domain assumption Residual skip connections mitigate vanishing gradients and enable training of deep networks.
    Directly adopted from the original ResNet work without re-derivation.
  • standard math Stochastic gradient descent with momentum and standard learning rate decay trains the networks to convergence.
    Common optimization practice assumed to be effective across compared architectures.
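For scale, the two free parameters above pin down an entire CIFAR-style network once the depth is fixed. The back-of-the-envelope helper below is our own sketch (assuming the standard WRN layout: an initial 16-filter convolution followed by three groups of two-conv basic blocks at widths 16k, 32k, and 64k); its rough 3×3-conv count lands near the ~36M parameters reported for WRN-28-10:

```python
def wrn_config(depth, k):
    """Channel widths and blocks-per-group for a CIFAR-style WRN-depth-k.

    Assumes the standard layout: an initial 16-filter conv, then three
    groups of basic blocks at widths 16k, 32k, 64k, with (depth - 4) / 6
    two-conv blocks per group.
    """
    assert (depth - 4) % 6 == 0, "depth must be of the form 6*l + 4"
    blocks_per_group = (depth - 4) // 6
    widths = [16, 16 * k, 32 * k, 64 * k]
    return widths, blocks_per_group

def approx_conv_params(depth, k):
    """Rough 3x3-conv parameter count, ignoring BN, shortcuts, and the classifier."""
    widths, l = wrn_config(depth, k)
    params = 3 * 3 * 3 * widths[0]  # initial conv from RGB
    for w_in, w_out in zip(widths, widths[1:]):
        params += 3 * 3 * w_in * w_out                  # first conv of the group
        params += 3 * 3 * w_out * w_out                 # its paired second conv
        params += (l - 1) * 2 * 3 * 3 * w_out * w_out   # remaining blocks
    return params
```

For instance, wrn_config(16, 8) yields widths [16, 128, 256, 512] with two blocks per group, and approx_conv_params(28, 10) comes out a little above 36 million, illustrating how quickly width, rather than depth, dominates the parameter budget.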

pith-pipeline@v0.9.0 · 5482 in / 1282 out tokens · 52257 ms · 2026-05-13T01:20:31.195097+00:00 · methodology

discussion (0)


Forward citations

Cited by 30 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Denoising Diffusion Probabilistic Models

    cs.LG 2020-06 accept novelty 8.0

    Denoising diffusion probabilistic models generate high-quality images by learning to reverse a fixed forward diffusion process, achieving FID 3.17 on CIFAR10.

  2. The Geometric Structure of Models Learning Sparse Data

    cs.LG 2026-05 unverdicted novelty 7.0

    In sparse regimes, models exploit normal alignment of Jacobians to minimize loss and maximize robustness; GrokAlign induces this alignment to accelerate training and RFAMs improve adversarial robustness.

  3. Low Rank Adaptation for Adversarial Perturbation

    cs.LG 2026-04 unverdicted novelty 7.0

    Adversarial perturbations possess an inherently low-rank structure that enables more efficient and effective black-box adversarial attacks via subspace projection.

  4. Concept Inconsistency in Dermoscopic Concept Bottleneck Models: A Rough-Set Analysis of the Derm7pt Dataset

    cs.LG 2026-04 conditional novelty 7.0

    Rough-set analysis finds 16.4% of 305 concept profiles in Derm7pt inconsistent (306 images), capping hard CBM accuracy at 92.1%; symmetric filtering produces a 705-image consistent benchmark where EfficientNet-B5 reac...

  5. Momentum Further Constrains Sharpness at the Edge of Stochastic Stability

    cs.LG 2026-04 unverdicted novelty 7.0

    Momentum SGD exhibits two distinct EoSS regimes for batch sharpness, stabilizing at 2(1-β)/η for small batches and 2(1+β)/η for large batches, aligning with linear stability thresholds.

  6. Learning Robustness at Test-Time from a Non-Robust Teacher

    cs.CV 2026-04 unverdicted novelty 7.0

    A test-time adaptation framework anchors adversarial training to a non-robust teacher's predictions, yielding more stable optimization and better robustness-accuracy trade-offs than standard self-consistency methods.

  7. Novel Anomaly Detection Scenarios and Evaluation Metrics to Address the Ambiguity in the Definition of Normal Samples

    cs.CV 2026-04 unverdicted novelty 7.0

    Introduces scenarios and metrics for ambiguous normal samples in anomaly detection plus RePaste method achieving SOTA on the new metric on MVTec AD while retaining high AUROC and PRO.

  8. LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

    cs.LG 2022-08 conditional novelty 7.0

    LLM.int8() performs 8-bit inference for transformers up to 175B parameters with no accuracy loss by combining vector-wise quantization for most features with 16-bit mixed-precision handling of systematic outlier dimensions.

  9. Video Diffusion Models

    cs.CV 2022-04 unverdicted novelty 7.0

    A diffusion model for video generation extends image architectures with joint image-video training and improved conditional sampling, delivering first large-scale text-to-video results and state-of-the-art performance...

  10. Venus-DeFakerOne: Unified Fake Image Detection & Localization

    cs.CV 2026-05 unverdicted novelty 6.0

    DeFakerOne integrates InternVL2 and SAM2 into a single model that achieves state-of-the-art results on 39 detection and 9 localization benchmarks for unified fake image detection and localization.

  11. FedVSSAM: Mitigating Flatness Incompatibility in Sharpness-Aware Federated Learning

    cs.LG 2026-05 unverdicted novelty 6.0

    FedVSSAM mitigates flatness incompatibility in SAM-based federated learning by consistently using a variance-suppressed adjusted direction for local perturbation, descent, and global updates, with non-convex convergen...

  12. Direct-to-Event Spiking Neural Network Transfer

    cs.NE 2026-05 unverdicted novelty 6.0

    This work provides the first systematic study of transferring direct-coded spiking neural networks to event-based representations while aiming to preserve accuracy and reduce energy use.

  13. Deep Wave Network for Modeling Multi-Scale Physical Dynamics

    cs.LG 2026-05 unverdicted novelty 6.0

    DW-Net improves the accuracy versus computational cost Pareto front over standard U-Nets for 2D and 3D multi-scale flow benchmarks by stacking multiple waves while keeping training settings identical.

  14. Detecting Adversarial Data via Provable Adversarial Noise Amplification

    cs.LG 2026-05 unverdicted novelty 6.0

    A provable adversarial noise amplification theorem under sufficient conditions enables a custom-trained detector that identifies adversarial examples at inference time using enhanced layer-wise noise signals.

  15. Learning to Reason: Targeted Knowledge Discovery and Fuzzy Logic Update for Robust Image Recognition

    cs.CV 2026-04 unverdicted novelty 6.0

    A differentiable fuzzy logic module called DKU discovers implicit concepts from image classification supervision and applies logical adjustments to improve class probabilities on PASCAL-VOC, COCO, and MedMNIST.

  16. FastAT Benchmark: A Comprehensive Framework for Fair Evaluation of Fast Adversarial Training Methods

    cs.CV 2026-04 conditional novelty 6.0

    The FastAT Benchmark standardizes evaluation of over twenty fast adversarial training methods under unified conditions, showing that well-designed single-step approaches can match or exceed PGD-AT robustness at lower ...

  17. Generative Cross-Entropy: A Strictly Proper Loss for Data-Efficient Classification

    cs.LG 2026-04 unverdicted novelty 6.0

    GenCE is a strictly proper loss obtained by normalizing each sample's softmax against the batch predictions, outperforming cross-entropy in low-data and imbalanced regimes with better calibration and OOD detection.

  18. StableTTA: Improving Vision Model Performance by Training-free Test-Time Adaptation Methods

    cs.CV 2026-04 unverdicted novelty 6.0

    StableTTA improves ImageNet-1K accuracy across 71 vision models by stabilizing logit aggregation under coherent-batch inference and enabling efficient single-forward-pass adaptation.

  19. Revisiting Feature Prediction for Learning Visual Representations from Video

    cs.CV 2024-02 conditional novelty 6.0

    V-JEPA models trained only on feature prediction from 2 million public videos achieve 81.9% on Kinetics-400, 72.2% on Something-Something-v2, and 77.9% on ImageNet-1K using frozen ViT-H/16 backbones.

  20. Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

    cs.LG 2024-01 unverdicted novelty 6.0

    SPIN lets weak LLMs become strong by self-generating training data from previous model versions and training to prefer human-annotated responses over its own outputs, outperforming DPO even with extra GPT-4 data on be...

  21. Rethinking Atrous Convolution for Semantic Image Segmentation

    cs.CV 2017-06 unverdicted novelty 6.0

    DeepLabv3 improves semantic segmentation by capturing multi-scale context with cascaded or parallel atrous convolutions and adding global context to ASPP, achieving better results on PASCAL VOC 2012 without DenseCRF p...

  22. SGDR: Stochastic Gradient Descent with Warm Restarts

    cs.LG 2016-08 accept novelty 6.0

    SGDR uses periodic warm restarts of the learning rate in SGD to reach new state-of-the-art error rates of 3.14% on CIFAR-10 and 16.21% on CIFAR-100.

  23. Taming the Long Tail: Rebalancing Adversarial Training via Adaptive Perturbation

    cs.LG 2026-05 unverdicted novelty 5.0

    RobustLT adaptively adjusts perturbations in adversarial training to simultaneously improve robustness and class balance on long-tailed datasets.

  24. A Composite Activation Function for Learning Stable Binary Representations

    cs.LG 2026-05 unverdicted novelty 5.0

    HTAF is a sigmoid-tanh composite that approximates the Heaviside function to allow stable gradient training of binary activation networks, yielding ICBMs with stable discretization and competitive performance on image tasks.

  25. Memory Efficient Full-gradient Attacks (MEFA) Framework for Adversarial Defense Evaluations

    cs.LG 2026-05 unverdicted novelty 5.0

    MEFA enables exact full-gradient white-box attacks on iterative stochastic purification defenses like diffusion and Langevin EBMs by trading recomputation for lower memory, revealing vulnerabilities missed by approxim...

  26. Generative Cross-Entropy: A Strictly Proper Loss for Data-Efficient Classification

    cs.LG 2026-04 unverdicted novelty 5.0

    Generative Cross-Entropy loss improves both accuracy and calibration over standard cross-entropy by augmenting it with a generative p(x|y) term, especially on long-tailed data, and pairs with adaptive temperature scal...

  27. Foundations of Reliable Inference: Reliability-Efficiency Co-Design

    cs.LG 2026-05 unverdicted novelty 4.0

    A unified framework is developed for co-designing reliability and efficiency to enable efficient reliable inference with trustworthy uncertainty quantification in AI models.

  28. JEPAMatch: Geometric Representation Shaping for Semi-Supervised Learning

    cs.LG 2026-04 unverdicted novelty 4.0

    JEPAMatch augments FlexMatch with LeJEPA-derived latent regularization to produce better-structured representations, yielding higher accuracy and faster convergence on CIFAR-100, STL-10, and Tiny-ImageNet.

  29. Image Classification via Random Dilated Convolution with Multi-Branch Feature Extraction and Context Excitation

    cs.CV 2026-04 unverdicted novelty 3.0

    RDCNet reports state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, Imagenette, and Imagewoof by combining random dilated convolutions with multi-branch and attention modules.

  30. A Transfer Learning Evaluation of Deep Neural Networks for Image Classification

    cs.CV 2026-05 unverdicted novelty 2.0

    Empirical comparison of transfer learning performance across eleven pre-trained models on five image datasets using accuracy, time, and size metrics.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · cited by 29 Pith papers · 2 internal anchors

  1. [1]

    Understanding the difficulty of training deep feedforward neural networks

    Yoshua Bengio and Xavier Glorot. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of AISTATS 2010, volume 9, pages 249–256, May 2010

  2. [2]

    Scaling learning algorithms towards AI

    Yoshua Bengio and Yann LeCun. Scaling learning algorithms towards AI. In Léon Bottou, Olivier Chapelle, D. DeCoste, and J. Weston, editors, Large Scale Kernel Machines. MIT Press, 2007

  3. [3]

    On the complexity of shallow and deep neural network classifiers

    Monica Bianchini and Franco Scarselli. On the complexity of shallow and deep neural network classifiers. In 22nd European Symposium on Artificial Neural Networks, ESANN 2014, Bruges, Belgium, April 23-25, 2014

  4. [4]

    Net2net: Accelerating learning via knowledge transfer

    T. Chen, I. Goodfellow, and J. Shlens. Net2net: Accelerating learning via knowledge transfer. In International Conference on Learning Representations, 2016

  5. [5]

    Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)

    Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. Fast and accurate deep network learning by exponential linear units (elus). CoRR, abs/1511.07289, 2015

  6. [6]

    Torch7: A matlab-like environment for machine learning

    R. Collobert, K. Kavukcuoglu, and C. Farabet. Torch7: A matlab-like environment for machine learning. In BigLearn, NIPS Workshop, 2011

  7. [7]

    Locnet: Improving localization accuracy for object detection

    Spyros Gidaris and Nikos Komodakis. Locnet: Improving localization accuracy for object detection. In Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on, 2016

  8. [8]

    Maxout networks

    Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, and Yoshua Bengio. Maxout networks. In Sanjoy Dasgupta and David McAllester, editors, Proceedings of the 30th International Conference on Machine Learning (ICML'13), pages 1319–1327, 2013

  9. [9]

    Fractional max-pooling

    Benjamin Graham. Fractional max-pooling. arXiv:1412.6071, 2014

  10. [10]

    Training and investigating residual nets

    Sam Gross and Michael Wilber. Training and investigating residual nets, 2016. URL https://github.com/facebook/fb.resnet.torch

  11. [11]

    Deep Residual Learning for Image Recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015

  12. [12]

    Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. CoRR, abs/1502.01852, 2015

  13. [13]

    Identity mappings in deep residual networks

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. CoRR, abs/1603.05027, 2016

  14. [14]

    Deep networks with stochastic depth

    Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q. Weinberger. Deep networks with stochastic depth. CoRR, abs/1603.09382, 2016

  15. [15]

    Batch normalization: Accelerating deep network training by reducing internal covariate shift

    Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In David Blei and Francis Bach, editors, Proceedings of the 32nd International Conference on Machine Learning (ICML-15) , pages 448–456. JMLR Workshop and Conference Proceedings, 2015

  16. [16]

    Imagenet classification with deep convolutional neural networks

    A. Krizhevsky, I. Sutskever, and G. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012

  17. [17]

    Cifar-10 (canadian institute for advanced research)

    Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. Cifar-10 (canadian institute for advanced research). 2012. URL http://www.cs.toronto.edu/~kriz/cifar.html

  18. [18]

    An empirical evaluation of deep architectures on problems with many factors of variation

    Hugo Larochelle, Dumitru Erhan, Aaron Courville, James Bergstra, and Yoshua Bengio. An empirical evaluation of deep architectures on problems with many factors of variation. In Zoubin Ghahramani, editor, Proceedings of the 24th International Conference on Machine Learning (ICML'07), pages 473–480. ACM, 2007

  19. [19]

    Deeply-Supervised Nets

    C.-Y. Lee, S. Xie, P. Gallagher, Z. Zhang, and Z. Tu. Deeply-Supervised Nets. 2014

  20. [20]

    Network in network

    Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. CoRR, abs/1312.4400, 2013

  21. [21]

    Optnet - reducing memory usage in torch neural networks

    Francisco Massa. Optnet - reducing memory usage in torch neural networks, 2016. URL https://github.com/fmassa/optimize-net

  22. [22]

    On the number of linear regions of deep neural networks

    Guido F. Montúfar, Razvan Pascanu, KyungHyun Cho, and Yoshua Bengio. On the number of linear regions of deep neural networks. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pages 2924–2932, 2014

  23. [23]

    Deep learning made easier by linear transformations in perceptrons

    Tapani Raiko, Harri Valpola, and Yann Lecun. Deep learning made easier by linear transformations in perceptrons. In Neil D. Lawrence and Mark A. Girolami, editors, Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics (AISTATS-12), volume 22, pages 924–932, 2012

  24. [24]

    FitNets: Hints for Thin Deep Nets

    Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. FitNets: Hints for thin deep nets. arXiv:1412.6550, 2014

  25. [25]

    Learning complex, extended sequences using the principle of history compression

    J. Schmidhuber. Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2):234–242, 1992

  26. [26]

    Very deep convolutional networks for large-scale image recognition

    K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015

  27. [27]

    Dropout: A simple way to prevent neural networks from overfitting

    N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. JMLR, 2014

  28. [28]

    Highway Networks

    Rupesh Kumar Srivastava, Klaus Greff, and Jürgen Schmidhuber. Highway networks. CoRR, abs/1505.00387, 2015

  29. [29]

    On the importance of initialization and momentum in deep learning

    Ilya Sutskever, James Martens, George E. Dahl, and Geoffrey E. Hinton. On the importance of initialization and momentum in deep learning. In Sanjoy Dasgupta and David McAllester, editors, Proceedings of the 30th International Conference on Machine Learning (ICML-13), volume 28, pages 1139–1147. JMLR Workshop and Conference Proceedings, May 2013

  30. [30]

    Going deeper with convolutions

    C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015

  31. [31]

    Inception-v4, Inception-ResNet and the impact of residual connections on learning

    Christian Szegedy, Sergey Ioffe, and Vincent Vanhoucke. Inception-v4, inception-resnet and the impact of residual connections on learning. abs/1602.07261, 2016

  32. [32]

    A multipath network for object detection

    S. Zagoruyko, A. Lerer, T.-Y. Lin, P. O. Pinheiro, S. Gross, S. Chintala, and P. Dollár. A multipath network for object detection. In BMVC, 2016