Learning Data Augmentation Strategies for Object Detection

Barret Zoph; Ekin D. Cubuk; Golnaz Ghiasi; Jonathon Shlens; Quoc V. Le; Tsung-Yi Lin

arxiv: 1906.11172 · v1 · pith:LONYYBLWnew · submitted 2019-06-26 · 💻 cs.CV · cs.LG

Learning Data Augmentation Strategies for Object Detection

Barret Zoph , Ekin D. Cubuk , Golnaz Ghiasi , Tsung-Yi Lin , Jonathon Shlens , Quoc V. Le This is my paper

Pith reviewed 2026-05-25 15:34 UTC · model grok-4.3

classification 💻 cs.CV cs.LG

keywords data augmentationobject detectionCOCO datasetpolicy learningtransfer learningPASCAL-VOCimage classification

0 comments

The pith

A learned data augmentation policy improves object detection accuracy by more than 2.3 mAP on COCO and transfers unchanged to other datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper first shows that data augmentation operations taken from image classification give only modest help when training detection models. It then learns specialized augmentation policies through search, keeping the policy active only during training. On the COCO dataset these policies raise accuracy by more than 2.3 mAP and let one model reach 50.7 mAP. The same policy improves a strong baseline on PASCAL-VOC by 2.7 mAP without any changes. The learned policies also outperform architecture regularization methods for detection.

Core claim

Experiments on the COCO dataset indicate that an optimized data augmentation policy improves detection accuracy by more than +2.3 mAP, and allow a single inference model to achieve a state-of-the-art accuracy of 50.7 mAP. Importantly, the best policy found on COCO may be transferred unchanged to other detection datasets and models to improve predictive accuracy. For example, the best augmentation policy identified with COCO improves a strong baseline on PASCAL-VOC by +2.7 mAP. Our results also reveal that a learned augmentation policy is superior to state-of-the-art architecture regularization methods for object detection.

What carries the argument

A learned augmentation policy: a collection of image transformations and their parameters discovered by automated search to maximize detection performance on a given dataset.

If this is right

A single inference model reaches 50.7 mAP on COCO.
The identical policy raises accuracy by 2.7 mAP on PASCAL-VOC without retraining or retuning.
Learned policies outperform architecture regularization methods on detection tasks.
Policies discovered on one detection dataset improve performance on other detection datasets and models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The transfer result suggests that the policy captures dataset-agnostic invariances that could apply to additional detection benchmarks.
Because the policy changes only training data, it can be combined with any existing detection architecture without altering inference.
The approach may lower the annotation cost for new detection tasks by extracting more signal from existing labeled images.

Load-bearing premise

The reported accuracy gains are produced by the learned augmentation policy itself rather than by differences in training schedule, optimizer, or random seed.

What would settle it

Reproduce the COCO experiments while holding training schedule, optimizer, and random seed fixed and differ only in the use of the learned policy versus the baseline augmentations; the mAP gap must disappear if the policy is not the cause.

Figures

Figures reproduced from arXiv: 1906.11172 by Barret Zoph, Ekin D. Cubuk, Golnaz Ghiasi, Jonathon Shlens, Quoc V. Le, Tsung-Yi Lin.

**Figure 1.** Figure 1: Learned augmentation policy systematically improves object detection performance. Left: Learned augmentation policy applied to example from COCO dataset [25]. Right: Mean average precision for RetinaNet [24] with a ResNet-50 backbone on COCO [25] with and without learned augmentation policy (red and black, respectively). augmentation strategies [3, 42, 21]. In the image domain, common augmentations includ… view at source ↗

**Figure 2.** Figure 2: Examples of learned augmentation sub-policies. 5 examples of learned sub-policies applied to one example image. Each column corresponds to a different random sample of the corresponding sub-policy. Each step of an augmentation sub-policy consists of a triplet corresponding to the operation, the probability of application and a magnitude measure. The bounding box is adjusted to maintain consistency with the… view at source ↗

**Figure 3.** Figure 3: Percentage improvement in mAP for objects of [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: Percentage improvement due to the learned aug [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Training loss vs. number of training examples for [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

read the original abstract

Data augmentation is a critical component of training deep learning models. Although data augmentation has been shown to significantly improve image classification, its potential has not been thoroughly investigated for object detection. Given the additional cost for annotating images for object detection, data augmentation may be of even greater importance for this computer vision task. In this work, we study the impact of data augmentation on object detection. We first demonstrate that data augmentation operations borrowed from image classification may be helpful for training detection models, but the improvement is limited. Thus, we investigate how learned, specialized data augmentation policies improve generalization performance for detection models. Importantly, these augmentation policies only affect training and leave a trained model unchanged during evaluation. Experiments on the COCO dataset indicate that an optimized data augmentation policy improves detection accuracy by more than +2.3 mAP, and allow a single inference model to achieve a state-of-the-art accuracy of 50.7 mAP. Importantly, the best policy found on COCO may be transferred unchanged to other detection datasets and models to improve predictive accuracy. For example, the best augmentation policy identified with COCO improves a strong baseline on PASCAL-VOC by +2.7 mAP. Our results also reveal that a learned augmentation policy is superior to state-of-the-art architecture regularization methods for object detection, even when considering strong baselines. Code for training with the learned policy is available online at https://github.com/tensorflow/tpu/tree/master/models/official/detection

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper studies data augmentation for object detection, first showing that classification-derived operations give limited gains, then proposing to learn specialized augmentation policies via search. On COCO this yields >+2.3 mAP (reaching 50.7 mAP SOTA with one model); the best COCO policy transfers unchanged to improve a strong PASCAL-VOC baseline by +2.7 mAP and outperforms architecture regularization methods.

Significance. If the reported mAP gains are shown to be caused by the learned policy rather than differences in training schedule, optimizer, or seeds, the work would be significant: it supplies the first demonstration of transferable, detection-specific augmentation policies and supplies reproducible code, strengthening the case that learned augmentation is a high-leverage, low-cost lever for detection.

major comments (2)

[Abstract / Experimental results] The abstract and experimental description supply no explicit statement that the reported baseline and policy-augmented runs share identical optimizer, learning-rate schedule, number of epochs, data-loader settings, and random seeds. Without this control the +2.3 mAP delta cannot be isolated to the augmentation policy.
[Methods / Policy search] No information is given on the search procedure itself (search space size, number of trials, validation-set usage during policy search, or whether the final reported numbers use the same validation split). This information is load-bearing for assessing whether the headline gains are statistically reliable.

minor comments (1)

[Abstract] The GitHub link is provided but the manuscript does not state the exact commit or configuration files used to reproduce the 50.7 mAP result.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight important aspects of experimental clarity and reproducibility. We respond to each major comment below and will revise the manuscript to incorporate clarifications.

read point-by-point responses

Referee: [Abstract / Experimental results] The abstract and experimental description supply no explicit statement that the reported baseline and policy-augmented runs share identical optimizer, learning-rate schedule, number of epochs, data-loader settings, and random seeds. Without this control the +2.3 mAP delta cannot be isolated to the augmentation policy.

Authors: We agree that an explicit statement would improve clarity. All experiments in the paper use identical training configurations (optimizer, learning-rate schedule, number of epochs, data-loader settings, and random seeds), with the only difference being the data augmentation policy. This is reflected in the released code at the provided GitHub link. We will add an explicit statement to the abstract and experimental results section confirming that all other factors are held constant across baseline and policy-augmented runs. revision: yes
Referee: [Methods / Policy search] No information is given on the search procedure itself (search space size, number of trials, validation-set usage during policy search, or whether the final reported numbers use the same validation split). This information is load-bearing for assessing whether the headline gains are statistically reliable.

Authors: We acknowledge that the methods section would benefit from expanded details on the search procedure. The policy search is conducted on a held-out validation split distinct from the test set used for final COCO and transfer results. We will revise the methods section to specify the search space size, number of trials performed, and confirmation that the validation split for search is separate from the evaluation test split. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical gains validated on held-out COCO test set and transferred to independent VOC dataset

full rationale

The paper reports empirical improvements from searched augmentation policies, measured on held-out COCO test data (+2.3 mAP) and transferred unchanged to PASCAL-VOC (+2.7 mAP). No derivation chain, equations, or self-citations reduce any claimed result to a fitted parameter or input by construction. The central claims rest on standard train/test splits and cross-dataset transfer rather than self-definition or load-bearing self-citation.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The paper introduces no new mathematical entities or axioms. It relies on the standard supervised-learning assumption that training and test distributions are similar enough for augmentation policies discovered on one to remain useful on the other, and on the empirical effectiveness of the (unspecified) policy-search algorithm.

free parameters (1)

augmentation policy
The policy itself is the output of a search procedure run on COCO; its concrete parameters are therefore fitted to that dataset.

axioms (1)

domain assumption Standard i.i.d. supervised learning assumptions hold for the COCO and PASCAL-VOC image distributions.
All reported mAP numbers presuppose that the train and test splits are representative samples from the same underlying distribution.

pith-pipeline@v0.9.0 · 5814 in / 1427 out tokens · 41074 ms · 2026-05-25T15:34:36.524221+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We treat data augmentation search as a discrete optimization problem... RNN controller... PPO... 22 operations... K=5 sub-policies, N=2 operations...
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Experiments on the COCO dataset indicate that an optimized data augmentation policy improves detection accuracy by more than +2.3 mAP

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · 19 internal anchors

[1]

Abadi, P

M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Is- ard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V . Vasudevan, P. Warden, M. Wicke, Y . Yu, and X. Zheng. Tensor- ﬂow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Conference on Oper- ati...

work page 2016
[2]

Data Augmentation Generative Adversarial Networks

A. Antoniou, A. Storkey, and H. Edwards. Data augmentation generative adversarial networks. arXiv preprint arXiv:1711.04340, 2017. 1

work page internal anchor Pith review Pith/arXiv arXiv 2017
[3]

H. S. Baird. Document image defect models. In Structured Document Image Analysis, pages 546–556. Springer, 1992. 1

work page 1992
[4]

Ciregan, U

D. Ciregan, U. Meier, and J. Schmidhuber. Multi- column deep neural networks for image classiﬁca- tion. In Proceedings of IEEE Conference on Com- puter Vision and Pattern Recognition , pages 3642–

work page
[5]

E. D. Cubuk, B. Zoph, D. Mane, V . Vasudevan, and Q. V . Le. Autoaugment: Learning augmentation poli- cies from data. arXiv preprint arXiv:1805.09501 ,

work page internal anchor Pith review Pith/arXiv arXiv
[6]

E. D. Cubuk, B. Zoph, S. S. Schoenholz, and Q. V . Le. Intriguing properties of adversarial examples. arXiv preprint arXiv:1711.02846, 2017. 4

work page internal anchor Pith review Pith/arXiv arXiv 2017
[7]

J. Dai, K. He, and J. Sun. Instance-aware semantic segmentation via multi-task network cascades. InPro- ceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3150–3158, 2016. 8

work page 2016
[8]

Dataset Augmentation in Feature Space

T. DeVries and G. W. Taylor. Dataset augmentation in feature space. arXiv preprint arXiv:1702.05538, 2017. 1, 2

work page internal anchor Pith review Pith/arXiv arXiv 2017
[9]

Improved Regularization of Convolutional Neural Networks with Cutout

T. DeVries and G. W. Taylor. Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552, 2017. 2, 13

work page internal anchor Pith review Pith/arXiv arXiv 2017
[10]

Dwibedi, I

D. Dwibedi, I. Misra, and M. Hebert. Cut, paste and learn: Surprisingly easy synthesis for instance detec- tion. In Proceedings of the IEEE International Con- ference on Computer Vision, pages 1301–1310, 2017. 2

work page 2017
[11]

Everingham, L

M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The pascal visual object classes (voc) challenge. International journal of computer vi- sion, 88(2):303–338, 2010. 6, 7

work page 2010
[12]

N. Ford, J. Gilmer, N. Carlini, and D. Cubuk. Adver- sarial examples are a natural consequence of test error in noise. arXiv preprint arXiv:1901.10513, 2019. 2

work page internal anchor Pith review Pith/arXiv arXiv 1901
[13]

Ghiasi, T.-Y

G. Ghiasi, T.-Y . Lin, and Q. V . Le. DropBlock: A regularization method for convolutional networks. In Advances in Neural Information Processing Systems , pages 10750–10760, 2018. 2, 5, 7

work page 2018
[14]

Ghiasi, T.-Y

G. Ghiasi, T.-Y . Lin, R. Pang, and Q. V . Le. NAS-FPN: Learning scalable feature pyramid architecture for ob- ject detection. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019. 5, 7

work page 2019
[15]

Girshick, I

R. Girshick, I. Radosavovic, G. Gkioxari, P. Doll ´ar, and K. He. Detectron, 2018. 2

work page 2018
[16]

K. He, X. Zhang, S. Ren, and J. Sun. Deep resid- ual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016. 1, 4

work page 2016
[17]

D. Ho, E. Liang, I. Stoica, P. Abbeel, and X. Chen. Population based augmentation: Efﬁcient learning of augmentation policy schedules. arXiv preprint arXiv:1905.05393, 2019. 2, 4

work page internal anchor Pith review Pith/arXiv arXiv 1905
[18]

J. Hu, L. Shen, and G. Sun. Squeeze-and-excitation networks. arXiv preprint arXiv:1709.01507, 2017. 1

work page internal anchor Pith review Pith/arXiv arXiv 2017
[19]

Huang, V

J. Huang, V . Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y . Song, S. Guadar- rama, et al. Speed/accuracy trade-offs for modern convolutional object detectors. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7310–7311, 2017. 6

work page 2017
[20]

N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, et al. In-datacenter performance analysis of a tensor processing unit. In 2017 ACM/IEEE 44th Annual International Symposium on Computer Archi- tecture (ISCA), pages 1–12. IEEE, 2017. 4, 5

work page 2017
[21]

Krizhevsky, I

A. Krizhevsky, I. Sutskever, and G. E. Hinton. Im- agenet classiﬁcation with deep convolutional neural networks. In Advances in Neural Information Process- ing Systems, 2012. 1, 2

work page 2012
[22]

Lemley, S

J. Lemley, S. Bazrafkan, and P. Corcoran. Smart augmentation learning an optimal data augmentation strategy. IEEE Access, 5:5858–5869, 2017. 1, 2

work page 2017
[23]

S. Lim, I. Kim, T. Kim, C. Kim, and S. Kim. Fast autoaugment. arXiv preprint arXiv:1905.00397, 2019. 2, 4

work page arXiv 1905
[24]

T.-Y . Lin, P. Goyal, R. Girshick, K. He, and P. Doll´ar. Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vi- sion, pages 2980–2988, 2017. 1, 4, 5

work page 2017
[25]

T.-Y . Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Doll ´ar, and C. L. Zitnick. Microsoft coco: Common objects in context. In European con- ference on computer vision, pages 740–755. Springer,

work page
[26]

C. Liu, B. Zoph, J. Shlens, W. Hua, L.-J. Li, L. Fei- Fei, A. Yuille, J. Huang, and K. Murphy. Pro- gressive neural architecture search. arXiv preprint arXiv:1712.00559, 2017. 4

work page internal anchor Pith review Pith/arXiv arXiv 2017
[27]

W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y . Fu, and A. C. Berg. Ssd: Single shot multibox detector. In European conference on computer vision, pages 21–37. Springer, 2016. 2

work page 2016
[28]

J. Long, E. Shelhamer, and T. Darrell. Fully convo- lutional networks for semantic segmentation. In Pro- ceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440, 2015. 8

work page 2015
[29]

R. G. Lopes, D. Yin, B. Poole, J. Gilmer, and E. D. Cubuk. Improving robustness without sacriﬁcing accuracy with patch gaussian augmentation. arXiv preprint arXiv:1906.02611, 2019. 2

work page internal anchor Pith review Pith/arXiv arXiv 1906
[30]

SGDR: Stochastic Gradient Descent with Warm Restarts

I. Loshchilov and F. Hutter. SGDR: Stochastic gra- dient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016. 4

work page internal anchor Pith review Pith/arXiv arXiv 2016
[31]

S. Mun, S. Park, D. K. Han, and H. Ko. Generative adversarial network based acoustic scene training set augmentation and selection using svm hyper-plane. In Detection and Classiﬁcation of Acoustic Scenes and Events Workshop, 2017. 1

work page 2017
[32]

C. Peng, T. Xiao, Z. Li, Y . Jiang, X. Zhang, K. Jia, G. Yu, and J. Sun. Megdet: A large mini-batch object detector. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018. 6, 7

work page 2018
[33]

The Effectiveness of Data Augmentation in Image Classification using Deep Learning

L. Perez and J. Wang. The effectiveness of data aug- mentation in image classiﬁcation using deep learning. arXiv preprint arXiv:1712.04621, 2017. 1

work page internal anchor Pith review Pith/arXiv arXiv 2017
[34]

P. O. Pinheiro, T.-Y . Lin, R. Collobert, and P. Doll ´ar. Learning to reﬁne object segments. InEuropean Con- ference on Computer Vision , pages 75–91. Springer,

work page
[35]

C. R. Qi, H. Su, K. Mo, and L. J. Guibas. Pointnet: Deep learning on point sets for 3d classiﬁcation and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages 652–660, 2017. 8

work page 2017
[36]

C. R. Qi, L. Yi, H. Su, and L. J. Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Pro- cessing Systems, pages 5099–5108, 2017. 8

work page 2017
[37]

A. J. Ratner, H. Ehrenberg, Z. Hussain, J. Dunnmon, and C. R ´e. Learning to compose domain-speciﬁc transformations for data augmentation. InAdvances in Neural Information Processing Systems, pages 3239– 3249, 2017. 1, 2

work page 2017
[38]

E. Real, A. Aggarwal, Y . Huang, and Q. V . Le. Reg- ularized evolution for image classiﬁer architecture search. In Thirty-Third AAAI Conference on Artiﬁcial Intelligence, 2019. 4, 5, 7

work page 2019
[39]

S. Ren, K. He, R. Girshick, and J. Sun. Faster r-cnn: Towards real-time object detection with region pro- posal networks. In Advances in neural information processing systems, pages 91–99, 2015. 6, 7

work page 2015
[40]

I. Sato, H. Nishimura, and K. Yokoi. Apac: Aug- mented pattern classiﬁcation with neural networks. arXiv preprint arXiv:1505.03229, 2015. 2

work page internal anchor Pith review Pith/arXiv arXiv 2015
[41]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017. 4

work page internal anchor Pith review Pith/arXiv arXiv 2017
[42]

P. Y . Simard, D. Steinkraus, J. C. Platt, et al. Best practices for convolutional neural networks applied to visual document analysis. In Proceedings of Interna- tional Conference on Document Analysis and Recog- nition, 2003. 1, 2

work page 2003
[43]

L. Sixt, B. Wild, and T. Landgraf. Rendergan: Generating realistic labeled data. arXiv preprint arXiv:1611.01331, 2016. 1

work page internal anchor Pith review Pith/arXiv arXiv 2016
[44]

Szegedy, W

C. Szegedy, W. Liu, Y . Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V . Vanhoucke, A. Rabinovich, et al. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pat- tern Recognition (CVPR), 2015. 1

work page 2015
[45]

T. Tran, T. Pham, G. Carneiro, L. Palmer, and I. Reid. A bayesian data augmentation approach for learning deep models. In Advances in Neural Information Pro- cessing Systems, pages 2794–2803, 2017. 1, 2

work page 2017
[46]

Manifold Mixup: Better Representations by Interpolating Hidden States

V . Verma, A. Lamb, C. Beckham, A. Courville, I. Mitliagkis, and Y . Bengio. Manifold mixup: En- couraging meaningful on-manifold interpolation as a regularizer. arXiv preprint arXiv:1806.05236, 2018. 7

work page internal anchor Pith review Pith/arXiv arXiv 2018
[47]

L. Wan, M. Zeiler, S. Zhang, Y . Le Cun, and R. Fer- gus. Regularization of neural networks using dropcon- nect. In International Conference on Machine Learn- ing, pages 1058–1066, 2013. 2

work page 2013
[48]

X. Wang, A. Shrivastava, and A. Gupta. A-fast- rcnn: Hard positive generation via adversary for ob- ject detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages 2606–2615, 2017. 2

work page 2017
[49]

T. Yang, X. Zhang, Z. Li, W. Zhang, and J. Sun. Metaanchor: Learning to detect objects with cus- tomized anchors. In Advances in Neural Information Processing Systems, pages 318–328, 2018. 6

work page 2018
[50]

D. Yin, R. G. Lopes, J. Shlens, E. D. Cubuk, and J. Gilmer. A fourier perspective on model robustness in computer vision. arXiv preprint arXiv:1906.08988,

work page arXiv 1906
[51]

Zagoruyko and N

S. Zagoruyko and N. Komodakis. Wide residual net- works. In British Machine Vision Conference, 2016. 2

work page 2016
[52]

mixup: Beyond Empirical Risk Minimization

H. Zhang, M. Cisse, Y . N. Dauphin, and D. Lopez-Paz. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412, 2017. 7

work page internal anchor Pith review Pith/arXiv arXiv 2017
[53]

Random Erasing Data Augmentation

Z. Zhong, L. Zheng, G. Kang, S. Li, and Y . Yang. Random erasing data augmentation. arXiv preprint arXiv:1708.04896, 2017. 2, 13

work page internal anchor Pith review Pith/arXiv arXiv 2017
[54]

X. Zhu, Y . Liu, Z. Qin, and J. Li. Data augmentation in emotion classiﬁcation using generative adversarial networks. arXiv preprint arXiv:1711.00648, 2017. 1

work page internal anchor Pith review Pith/arXiv arXiv 2017
[55]

Zoph and Q

B. Zoph and Q. V . Le. Neural architecture search with reinforcement learning. In International Conference on Learning Representations, 2017. 4

work page 2017
[56]

B. Zoph, V . Vasudevan, J. Shlens, and Q. V . Le. Learning transferable architectures for scalable image recognition. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2017. 1, 4 A. Appendix Operation Name Description Range of magnitudes ShearX(Y) Shear the image and the corners of the bounding boxes along the horizontal (verti...

work page 2017

[1] [1]

Abadi, P

M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Is- ard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V . Vasudevan, P. Warden, M. Wicke, Y . Yu, and X. Zheng. Tensor- ﬂow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Conference on Oper- ati...

work page 2016

[2] [2]

Data Augmentation Generative Adversarial Networks

A. Antoniou, A. Storkey, and H. Edwards. Data augmentation generative adversarial networks. arXiv preprint arXiv:1711.04340, 2017. 1

work page internal anchor Pith review Pith/arXiv arXiv 2017

[3] [3]

H. S. Baird. Document image defect models. In Structured Document Image Analysis, pages 546–556. Springer, 1992. 1

work page 1992

[4] [4]

Ciregan, U

D. Ciregan, U. Meier, and J. Schmidhuber. Multi- column deep neural networks for image classiﬁca- tion. In Proceedings of IEEE Conference on Com- puter Vision and Pattern Recognition , pages 3642–

work page

[5] [5]

E. D. Cubuk, B. Zoph, D. Mane, V . Vasudevan, and Q. V . Le. Autoaugment: Learning augmentation poli- cies from data. arXiv preprint arXiv:1805.09501 ,

work page internal anchor Pith review Pith/arXiv arXiv

[6] [6]

E. D. Cubuk, B. Zoph, S. S. Schoenholz, and Q. V . Le. Intriguing properties of adversarial examples. arXiv preprint arXiv:1711.02846, 2017. 4

work page internal anchor Pith review Pith/arXiv arXiv 2017

[7] [7]

J. Dai, K. He, and J. Sun. Instance-aware semantic segmentation via multi-task network cascades. InPro- ceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3150–3158, 2016. 8

work page 2016

[8] [8]

Dataset Augmentation in Feature Space

T. DeVries and G. W. Taylor. Dataset augmentation in feature space. arXiv preprint arXiv:1702.05538, 2017. 1, 2

work page internal anchor Pith review Pith/arXiv arXiv 2017

[9] [9]

Improved Regularization of Convolutional Neural Networks with Cutout

T. DeVries and G. W. Taylor. Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552, 2017. 2, 13

work page internal anchor Pith review Pith/arXiv arXiv 2017

[10] [10]

Dwibedi, I

D. Dwibedi, I. Misra, and M. Hebert. Cut, paste and learn: Surprisingly easy synthesis for instance detec- tion. In Proceedings of the IEEE International Con- ference on Computer Vision, pages 1301–1310, 2017. 2

work page 2017

[11] [11]

Everingham, L

M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The pascal visual object classes (voc) challenge. International journal of computer vi- sion, 88(2):303–338, 2010. 6, 7

work page 2010

[12] [12]

N. Ford, J. Gilmer, N. Carlini, and D. Cubuk. Adver- sarial examples are a natural consequence of test error in noise. arXiv preprint arXiv:1901.10513, 2019. 2

work page internal anchor Pith review Pith/arXiv arXiv 1901

[13] [13]

Ghiasi, T.-Y

G. Ghiasi, T.-Y . Lin, and Q. V . Le. DropBlock: A regularization method for convolutional networks. In Advances in Neural Information Processing Systems , pages 10750–10760, 2018. 2, 5, 7

work page 2018

[14] [14]

Ghiasi, T.-Y

G. Ghiasi, T.-Y . Lin, R. Pang, and Q. V . Le. NAS-FPN: Learning scalable feature pyramid architecture for ob- ject detection. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019. 5, 7

work page 2019

[15] [15]

Girshick, I

R. Girshick, I. Radosavovic, G. Gkioxari, P. Doll ´ar, and K. He. Detectron, 2018. 2

work page 2018

[16] [16]

K. He, X. Zhang, S. Ren, and J. Sun. Deep resid- ual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016. 1, 4

work page 2016

[17] [17]

D. Ho, E. Liang, I. Stoica, P. Abbeel, and X. Chen. Population based augmentation: Efﬁcient learning of augmentation policy schedules. arXiv preprint arXiv:1905.05393, 2019. 2, 4

work page internal anchor Pith review Pith/arXiv arXiv 1905

[18] [18]

J. Hu, L. Shen, and G. Sun. Squeeze-and-excitation networks. arXiv preprint arXiv:1709.01507, 2017. 1

work page internal anchor Pith review Pith/arXiv arXiv 2017

[19] [19]

Huang, V

J. Huang, V . Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y . Song, S. Guadar- rama, et al. Speed/accuracy trade-offs for modern convolutional object detectors. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7310–7311, 2017. 6

work page 2017

[20] [20]

N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, et al. In-datacenter performance analysis of a tensor processing unit. In 2017 ACM/IEEE 44th Annual International Symposium on Computer Archi- tecture (ISCA), pages 1–12. IEEE, 2017. 4, 5

work page 2017

[21] [21]

Krizhevsky, I

A. Krizhevsky, I. Sutskever, and G. E. Hinton. Im- agenet classiﬁcation with deep convolutional neural networks. In Advances in Neural Information Process- ing Systems, 2012. 1, 2

work page 2012

[22] [22]

Lemley, S

J. Lemley, S. Bazrafkan, and P. Corcoran. Smart augmentation learning an optimal data augmentation strategy. IEEE Access, 5:5858–5869, 2017. 1, 2

work page 2017

[23] [23]

S. Lim, I. Kim, T. Kim, C. Kim, and S. Kim. Fast autoaugment. arXiv preprint arXiv:1905.00397, 2019. 2, 4

work page arXiv 1905

[24] [24]

T.-Y . Lin, P. Goyal, R. Girshick, K. He, and P. Doll´ar. Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vi- sion, pages 2980–2988, 2017. 1, 4, 5

work page 2017

[25] [25]

T.-Y . Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Doll ´ar, and C. L. Zitnick. Microsoft coco: Common objects in context. In European con- ference on computer vision, pages 740–755. Springer,

work page

[26] [26]

C. Liu, B. Zoph, J. Shlens, W. Hua, L.-J. Li, L. Fei- Fei, A. Yuille, J. Huang, and K. Murphy. Pro- gressive neural architecture search. arXiv preprint arXiv:1712.00559, 2017. 4

work page internal anchor Pith review Pith/arXiv arXiv 2017

[27] [27]

W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y . Fu, and A. C. Berg. Ssd: Single shot multibox detector. In European conference on computer vision, pages 21–37. Springer, 2016. 2

work page 2016

[28] [28]

J. Long, E. Shelhamer, and T. Darrell. Fully convo- lutional networks for semantic segmentation. In Pro- ceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440, 2015. 8

work page 2015

[29] [29]

R. G. Lopes, D. Yin, B. Poole, J. Gilmer, and E. D. Cubuk. Improving robustness without sacriﬁcing accuracy with patch gaussian augmentation. arXiv preprint arXiv:1906.02611, 2019. 2

work page internal anchor Pith review Pith/arXiv arXiv 1906

[30] [30]

SGDR: Stochastic Gradient Descent with Warm Restarts

I. Loshchilov and F. Hutter. SGDR: Stochastic gra- dient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016. 4

work page internal anchor Pith review Pith/arXiv arXiv 2016

[31] [31]

S. Mun, S. Park, D. K. Han, and H. Ko. Generative adversarial network based acoustic scene training set augmentation and selection using svm hyper-plane. In Detection and Classiﬁcation of Acoustic Scenes and Events Workshop, 2017. 1

work page 2017

[32] [32]

C. Peng, T. Xiao, Z. Li, Y . Jiang, X. Zhang, K. Jia, G. Yu, and J. Sun. Megdet: A large mini-batch object detector. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018. 6, 7

work page 2018

[33] [33]

The Effectiveness of Data Augmentation in Image Classification using Deep Learning

L. Perez and J. Wang. The effectiveness of data aug- mentation in image classiﬁcation using deep learning. arXiv preprint arXiv:1712.04621, 2017. 1

work page internal anchor Pith review Pith/arXiv arXiv 2017

[34] [34]

P. O. Pinheiro, T.-Y . Lin, R. Collobert, and P. Doll ´ar. Learning to reﬁne object segments. InEuropean Con- ference on Computer Vision , pages 75–91. Springer,

work page

[35] [35]

C. R. Qi, H. Su, K. Mo, and L. J. Guibas. Pointnet: Deep learning on point sets for 3d classiﬁcation and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages 652–660, 2017. 8

work page 2017

[36] [36]

C. R. Qi, L. Yi, H. Su, and L. J. Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Pro- cessing Systems, pages 5099–5108, 2017. 8

work page 2017

[37] [37]

A. J. Ratner, H. Ehrenberg, Z. Hussain, J. Dunnmon, and C. R ´e. Learning to compose domain-speciﬁc transformations for data augmentation. InAdvances in Neural Information Processing Systems, pages 3239– 3249, 2017. 1, 2

work page 2017

[38] [38]

E. Real, A. Aggarwal, Y . Huang, and Q. V . Le. Reg- ularized evolution for image classiﬁer architecture search. In Thirty-Third AAAI Conference on Artiﬁcial Intelligence, 2019. 4, 5, 7

work page 2019

[39] [39]

S. Ren, K. He, R. Girshick, and J. Sun. Faster r-cnn: Towards real-time object detection with region pro- posal networks. In Advances in neural information processing systems, pages 91–99, 2015. 6, 7

work page 2015

[40] [40]

I. Sato, H. Nishimura, and K. Yokoi. Apac: Aug- mented pattern classiﬁcation with neural networks. arXiv preprint arXiv:1505.03229, 2015. 2

work page internal anchor Pith review Pith/arXiv arXiv 2015

[41] [41]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017. 4

work page internal anchor Pith review Pith/arXiv arXiv 2017

[42] [42]

P. Y . Simard, D. Steinkraus, J. C. Platt, et al. Best practices for convolutional neural networks applied to visual document analysis. In Proceedings of Interna- tional Conference on Document Analysis and Recog- nition, 2003. 1, 2

work page 2003

[43] [43]

L. Sixt, B. Wild, and T. Landgraf. Rendergan: Generating realistic labeled data. arXiv preprint arXiv:1611.01331, 2016. 1

work page internal anchor Pith review Pith/arXiv arXiv 2016

[44] [44]

Szegedy, W

C. Szegedy, W. Liu, Y . Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V . Vanhoucke, A. Rabinovich, et al. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pat- tern Recognition (CVPR), 2015. 1

work page 2015

[45] [45]

T. Tran, T. Pham, G. Carneiro, L. Palmer, and I. Reid. A bayesian data augmentation approach for learning deep models. In Advances in Neural Information Pro- cessing Systems, pages 2794–2803, 2017. 1, 2

work page 2017

[46] [46]

Manifold Mixup: Better Representations by Interpolating Hidden States

V . Verma, A. Lamb, C. Beckham, A. Courville, I. Mitliagkis, and Y . Bengio. Manifold mixup: En- couraging meaningful on-manifold interpolation as a regularizer. arXiv preprint arXiv:1806.05236, 2018. 7

work page internal anchor Pith review Pith/arXiv arXiv 2018

[47] [47]

L. Wan, M. Zeiler, S. Zhang, Y . Le Cun, and R. Fer- gus. Regularization of neural networks using dropcon- nect. In International Conference on Machine Learn- ing, pages 1058–1066, 2013. 2

work page 2013

[48] [48]

X. Wang, A. Shrivastava, and A. Gupta. A-fast- rcnn: Hard positive generation via adversary for ob- ject detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages 2606–2615, 2017. 2

work page 2017

[49] [49]

T. Yang, X. Zhang, Z. Li, W. Zhang, and J. Sun. Metaanchor: Learning to detect objects with cus- tomized anchors. In Advances in Neural Information Processing Systems, pages 318–328, 2018. 6

work page 2018

[50] [50]

D. Yin, R. G. Lopes, J. Shlens, E. D. Cubuk, and J. Gilmer. A fourier perspective on model robustness in computer vision. arXiv preprint arXiv:1906.08988,

work page arXiv 1906

[51] [51]

Zagoruyko and N

S. Zagoruyko and N. Komodakis. Wide residual net- works. In British Machine Vision Conference, 2016. 2

work page 2016

[52] [52]

mixup: Beyond Empirical Risk Minimization

H. Zhang, M. Cisse, Y . N. Dauphin, and D. Lopez-Paz. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412, 2017. 7

work page internal anchor Pith review Pith/arXiv arXiv 2017

[53] [53]

Random Erasing Data Augmentation

Z. Zhong, L. Zheng, G. Kang, S. Li, and Y . Yang. Random erasing data augmentation. arXiv preprint arXiv:1708.04896, 2017. 2, 13

work page internal anchor Pith review Pith/arXiv arXiv 2017

[54] [54]

X. Zhu, Y . Liu, Z. Qin, and J. Li. Data augmentation in emotion classiﬁcation using generative adversarial networks. arXiv preprint arXiv:1711.00648, 2017. 1

work page internal anchor Pith review Pith/arXiv arXiv 2017

[55] [55]

Zoph and Q

B. Zoph and Q. V . Le. Neural architecture search with reinforcement learning. In International Conference on Learning Representations, 2017. 4

work page 2017

[56] [56]

B. Zoph, V . Vasudevan, J. Shlens, and Q. V . Le. Learning transferable architectures for scalable image recognition. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2017. 1, 4 A. Appendix Operation Name Description Range of magnitudes ShearX(Y) Shear the image and the corners of the bounding boxes along the horizontal (verti...

work page 2017