Towards Adversarially Robust Object Detection

Haichao Zhang; Jianyu Wang

arXiv: 1907.10310 · v1 · pith:WM32GHBDnew · submitted 2019-07-24 · 💻 cs.CV · cs.LG · eess.IV

Pith reviewed 2026-05-24 17:01 UTC · model grok-4.3

classification 💻 cs.CV · cs.LG · eess.IV

keywords adversarial robustness · object detection · adversarial training · multi-task learning · robustness · computer vision · PASCAL-VOC · MS-COCO

The pith

An adversarial training approach leveraging multiple attack sources improves object detector robustness by exploiting asymmetric task losses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Object detection models are vulnerable to adversarial attacks, and this paper takes an initial step to improve their robustness. It revisits detectors and attacks from a robustness perspective and frames object detection as a multi-task learning problem. This reveals an asymmetric role for the different task losses, which the authors use to design a new adversarial training method. The method incorporates attacks from multiple sources rather than a single one. Experiments on PASCAL-VOC and MS-COCO verify that the approach enhances robustness.

Core claim

By presenting a multi-task learning perspective of object detection and identifying the asymmetric role of task losses, the authors develop an adversarial training approach which can leverage the multiple sources of attacks for improving the robustness of detection models, with effectiveness verified through extensive experiments on PASCAL-VOC and MS-COCO.

What carries the argument

Multi-task learning perspective of object detection identifying the asymmetric role of task losses, which guides the design of a multi-source adversarial training method.
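One hedged way to write this idea down is a min-max objective in which the inner maximization ranges over adversarial examples produced from each task loss. The notation below borrows the task losses and the task domains S_cls and S_loc named in the figure captions; the symbols \mathcal{S}_x, \mathcal{L}, and \mathcal{D}, and the exact form of the objective, are assumptions introduced for illustration rather than equations quoted from the paper.

```latex
\min_{\theta}\;
\mathbb{E}_{(x,\{y_k,b_k\})\sim\mathcal{D}}
\Big[
  \max_{\bar{x}\,\in\,\mathcal{S}_{\mathrm{cls}}\cup\,\mathcal{S}_{\mathrm{loc}}}
  \mathcal{L}\big(f_{\theta}(\bar{x}),\{y_k,b_k\}\big)
\Big],
\qquad
\mathcal{S}_x=\{\,z : \lVert z-x\rVert_{\infty}\le\epsilon\,\},
\\[4pt]
\mathcal{S}_{\mathrm{cls}}=\arg\max_{\bar{x}\in\mathcal{S}_x}
  \mathrm{loss}_{\mathrm{cls}}\big(f_{\theta}(\bar{x}),\{y_k\}\big),
\qquad
\mathcal{S}_{\mathrm{loc}}=\arg\max_{\bar{x}\in\mathcal{S}_x}
  \mathrm{loss}_{\mathrm{loc}}\big(f_{\theta}(\bar{x}),\{b_k\}\big).
```

Under this reading, "multiple sources" means that each task loss contributes its own adversarial candidates, and training uses whichever candidate is hardest for the full detection loss.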

If this is right

The adversarial training method can improve robustness of detection models against attacks.
Multiple sources of attacks can be leveraged during training instead of relying on one.
The approach applies to existing detection models as demonstrated on PASCAL-VOC and MS-COCO.
Robustness becomes a more achievable performance factor for practical vision systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The loss asymmetry insight could apply to designing defenses in other multi-task vision models.
Attack generation strategies might need to account for how different losses respond asymmetrically.
Combining this training method with architectural changes could further enhance robustness.

Load-bearing premise

The identified asymmetric role of task losses in the multi-task learning perspective provides a valid basis for designing an effective adversarial training method.

What would settle it

If the proposed adversarial training using multiple attack sources shows no improvement in robustness metrics over standard single-source adversarial training on PASCAL-VOC or MS-COCO, the central claim would be falsified.
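To make the difference between multi-source and single-source example generation concrete, here is a minimal Python sketch on a toy model. Everything in it is an assumption made for illustration: the linear heads, the logistic and squared-error stand-ins for the classification and localization losses, the single sign step, and the rule of keeping the candidate with the larger overall loss. It is not the authors' implementation.

```python
import numpy as np

def task_losses(w_cls, w_loc, x, y, b):
    """Toy stand-ins for detector losses on linear heads:
    logistic loss (classification) and squared error (localization)."""
    loss_cls = np.log1p(np.exp(-y * (w_cls @ x)))
    loss_loc = 0.5 * (w_loc @ x - b) ** 2
    return loss_cls, loss_loc

def task_grads(w_cls, w_loc, x, y, b):
    """Analytic gradient of each task loss with respect to the input x."""
    g_cls = -y * w_cls / (1.0 + np.exp(y * (w_cls @ x)))
    g_loc = (w_loc @ x - b) * w_loc
    return g_cls, g_loc

def multi_source_example(w_cls, w_loc, x, y, b, eps):
    """Build one candidate per task loss (one sign step of size eps)
    and keep the candidate with the larger overall (summed) loss."""
    g_cls, g_loc = task_grads(w_cls, w_loc, x, y, b)
    candidates = {
        "cls": x + eps * np.sign(g_cls),  # attack driven by the classification loss
        "loc": x + eps * np.sign(g_loc),  # attack driven by the localization loss
    }
    totals = {
        name: float(sum(task_losses(w_cls, w_loc, x_adv, y, b)))
        for name, x_adv in candidates.items()
    }
    source = max(totals, key=totals.get)
    return candidates[source], source, totals

if __name__ == "__main__":
    # Hypothetical toy values, chosen only to exercise the functions.
    w_cls = np.array([1.0, -2.0, 0.5])
    w_loc = np.array([0.3, 0.8, -1.0])
    x = np.array([0.2, -0.1, 0.4])
    x_adv, source, totals = multi_source_example(w_cls, w_loc, x, y=1.0, b=0.5, eps=0.1)
    print(source, totals)
```

A single-source baseline in this toy setting would always return `candidates["cls"]` or always `candidates["loc"]`; comparing models trained on each choice is the kind of control the test above calls for.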

Figures

Figures reproduced from arXiv: 1907.10310 by Haichao Zhang, Jianyu Wang.

**Figure 1.** Standard vs. robust detectors on clean and adversarial images. The adversarial image is produced using PGD-based detector attacks [23, 33] with perturbation budget 8 (out of 256). The standard model [29] fails completely on the adversarial image, while the robust model can produce reasonable detection results.

**Figure 2.** One-stage detector architecture. A base-net (with parameters θb) is shared by the classification (with parameters θc) and localization (with parameters θl) tasks. θ = [θb, θc, θl] denotes the full parameters of the detector. For training, the NMS module is removed and task losses are appended for classification and localization respectively.

**Figure 3.** Mutual impacts of task losses and gradient visualization. (a) Model performance on classification and localization under different attacks: clean image, loss_cls-based attack and loss_loc-based attack. The model is a standard detector trained on clean images. The performance metric is detailed in text. (b) Scatter plot of task gradients for classification g_c and localization g_l.

**Figure 4.** Visualization of task domains S_cls and S_loc using t-SNE. Given a single clean image x, each dot in the picture represents one adversarial example generated by solving Eqn. (5) starting from a random point within the ε-ball around x. Different colors encode the task losses used for generating adversarial examples (red: loss_cls, blue: loss_loc). Therefore, the samples form empirical images of the correspondin…

**Figure 5.** Model performance under different numbers of steps for (a) loss_cls-based and (b) loss_loc-based PGD attacks with ε = 8. STD is the standard model. CLS and LOC are our robust models.

**Figure 7.** Visualization of attacks on the STD model using a loss_cls-based 20-step PGD attack (zoom electronically for better view). All the attack methods incorporate the sgn(·) operator into the PGD steps for normalization and efficiency, following [16].

**Figure 8.** Visual comparison between the standard model and ours under DAG [57] and RAP [23] attacks with attack budget 8.

| architecture | DAG [57], STD | DAG [57], ours | RAP [23], STD | RAP [23], ours |
|---|---|---|---|---|
| SSD + VGG16 | 0.3 | 28.5 | 6.6 | 44.9 |
| RFB + ResNet50 | 0.4 | 27.4 | 8.7 | 48.7 |
| FSSD + DarkNet53 | 0.3 | 29.4 | 7.6 | 46.8 |
| YOLO + DarkNet53 | 0.1 | 27.6 | 8.1 | 44.3 |

**Figure 9.** Visualization of failure cases. Example challenging cases include images with small objects and visually confusing classes.

| transferred attack | DAG [57] | RAP [23] | average |
|---|---|---|---|
| SSD + ResNet50 | 49.3 | 49.4 | 49.4 |
| SSD + DarkNet53 | 49.2 | 49.4 | 49.3 |
| RFB + ResNet50 | 49.1 | 49.3 | 49.2 |
| FSSD + DarkNet53 | 49.3 | 49.2 | 49.3 |
| YOLO + DarkNet53 | 49.5 | 49.5 | 49.5 |
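The captions above refer to PGD attacks with a perturbation budget of 8 (out of 256), a random start inside the ε-ball, 20 steps, and a sgn(·) operator in each step. The following NumPy sketch shows that generic recipe on a toy loss. The function name `pgd_attack`, the step size of 2, and the quadratic toy loss are assumptions for illustration; a real detector attack would take `grad_fn` from the gradient of a task loss with respect to the image.

```python
import numpy as np

def pgd_attack(grad_fn, x, eps=8.0, step=2.0, n_steps=20, rng=None):
    """Projected gradient ascent with sign steps under an L-infinity budget.

    grad_fn(x_adv) returns the gradient of the loss to be maximized with
    respect to the image; pixel values are assumed to lie in [0, 255].
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Random start inside the eps-ball around the clean image.
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 255.0)
    for _ in range(n_steps):
        x_adv = x_adv + step * np.sign(grad_fn(x_adv))  # sign-normalized ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)        # project back into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 255.0)              # keep a valid image
    return x_adv

if __name__ == "__main__":
    # Toy loss 0.5 * ||x - target||^2, whose gradient is (x - target).
    target = np.full((4, 4), 100.0)
    loss = lambda z: 0.5 * np.sum((z - target) ** 2)
    x = np.full((4, 4), 128.0)
    x_adv = pgd_attack(lambda z: z - target, x)
    print(loss(x), loss(x_adv), np.max(np.abs(x_adv - x)))
```

The sign step makes the update size independent of gradient magnitude, which is the normalization role the Figure 7 caption attributes to sgn(·).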

Original abstract

Object detection is an important vision task and has emerged as an indispensable component in many vision systems, rendering its robustness as an increasingly important performance factor for practical applications. While object detection models have been demonstrated to be vulnerable to adversarial attacks by many recent works, very few efforts have been devoted to improving their robustness. In this work, we make an initial attempt in this direction. We first revisit and systematically analyze object detectors and many recently developed attacks from the perspective of model robustness. We then present a multi-task learning perspective of object detection and identify an asymmetric role of task losses. We further develop an adversarial training approach which can leverage the multiple sources of attacks for improving the robustness of detection models. Extensive experiments on PASCAL-VOC and MS-COCO verified the effectiveness of the proposed approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper frames detection as multi-task learning to guide multi-attack adversarial training, but the abstract leaves the asymmetry's actual contribution unclear.

The core idea is to view object detection through its classification and localization losses, note that they behave asymmetrically under attack, and then use that observation to build an adversarial training procedure that draws on multiple attack sources at once. That multi-task angle and the resulting training scheme are the main new pieces relative to earlier attack papers on detectors. The systematic revisit of detectors and attacks from a robustness standpoint is also a solid step that gives the work some grounding before the method section. Experiments on the usual PASCAL VOC and MS-COCO sets at least put the claim on standard data rather than toy problems.

The soft spot is exactly the one the stress-test flags: the abstract asserts that the asymmetric role shaped the training approach, yet supplies no loss equations, weighting scheme, or ablation that isolates whether the asymmetry changes outcomes versus simply training against several attacks in a standard way. Without those controls it is hard to judge whether the perspective is load-bearing or mostly motivational. The reported effectiveness is stated but not quantified here, so the practical size of any gain stays unknown.

This is early work aimed at people who need detectors that hold up in safety-critical settings. It is coherent enough on its own terms to merit referee time, though the authors would need to add the missing mechanistic evidence and baseline comparisons for the paper to land cleanly.

Referee Report

2 major / 1 minor

Summary. The manuscript analyzes object detectors and adversarial attacks from a robustness perspective, introduces a multi-task learning view of object detection that identifies an asymmetric role of task losses (classification vs. localization), develops an adversarial training method that leverages multiple attack sources based on this asymmetry, and reports that experiments on PASCAL-VOC and MS-COCO verify the effectiveness of the approach.

Significance. If the experiments demonstrate robustness gains specifically attributable to exploiting the identified asymmetry (rather than generic multi-attack training), the work would address an important gap in making object detectors robust for practical applications. The use of standard benchmark datasets supports reproducibility.

major comments (2)

[§3] The multi-task perspective is used to identify the asymmetric role of task losses, but no equation or explicit formulation is given showing how this asymmetry determines attack selection, weighting, or generation in the adversarial training objective; without this, it is unclear whether the perspective is load-bearing for the method.
[Experiments section] The claim that the approach improves robustness by leveraging the asymmetry is not supported by an ablation that compares the proposed method against standard multi-attack adversarial training (without the asymmetry); this omission prevents verification that the identified perspective drives the reported gains.

minor comments (1)

[Abstract] The statement that 'extensive experiments verified the effectiveness' would benefit from a brief mention of key quantitative improvements or baselines to give readers an immediate sense of the results.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comments point by point below, indicating where revisions will be made to strengthen the presentation and empirical support.

Referee: [§3] The multi-task perspective is used to identify the asymmetric role of task losses, but no equation or explicit formulation is given showing how this asymmetry determines attack selection, weighting, or generation in the adversarial training objective; without this, it is unclear whether the perspective is load-bearing for the method.

Authors: We agree that an explicit formulation would better demonstrate how the identified asymmetry guides the method. Section 3 qualitatively describes the multi-task view of object detection and the asymmetric importance of classification versus localization losses, which motivates selecting and combining attacks from multiple sources. To address this, we will add a mathematical formulation in the revised Section 3 that explicitly connects the asymmetry to attack generation, selection, and weighting within the adversarial training objective.

revision: yes

Referee: [Experiments section] The claim that the approach improves robustness by leveraging the asymmetry is not supported by an ablation that compares the proposed method against standard multi-attack adversarial training (without the asymmetry); this omission prevents verification that the identified perspective drives the reported gains.

Authors: This comment is valid. The current experiments on PASCAL-VOC and MS-COCO demonstrate the overall effectiveness of the approach, but do not include a direct ablation isolating the asymmetry from generic multi-attack training. We will add this ablation study in the revised manuscript to provide evidence that the reported robustness gains are attributable to the multi-task asymmetry perspective.

revision: yes

Circularity Check

0 steps flagged

No circularity; derivation is self-contained with independent experimental support

The abstract describes a sequence of analysis of detectors and attacks, followed by a multi-task perspective that identifies an asymmetric role of task losses, leading to an adversarial training method whose effectiveness is verified on PASCAL-VOC and MS-COCO. No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the provided text. The central claim rests on the perspective informing the method and on external experimental validation rather than reducing to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no free parameters, axioms, or invented entities are explicitly identifiable or required for the central claim.

pith-pipeline@v0.9.0 · 5657 in / 1064 out tokens · 29370 ms · 2026-05-24T17:01:32.707013+00:00 · methodology

Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages · 10 internal anchors

[1] A. Athalye, N. Carlini, and D. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning, 2018.
[2] B. Biggio and F. Roli. Wild patterns: Ten years after the rise of adversarial machine learning. In ACM Conference on Computer and Communications Security, 2018.
[3] Z. Cai and N. Vasconcelos. Cascade R-CNN: Delving into high quality object detection. In IEEE Conference on Computer Vision and Pattern Recognition, 2018.
[4] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, 2017.
[5] R. Caruana. Multitask learning. Machine Learning, 28(1):41–75, 1997.
[6] S. Chen, C. Cornelius, J. Martin, and D. H. Chau. ShapeShifter: Robust physical adversarial attack on Faster R-CNN object detector. CoRR, abs/1804.05810, 2018.
[7] L. Cui. MDSSD: Multi-scale deconvolutional single shot detector for small objects. CoRR, abs/1805.07009, 2018.
[8] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In IEEE Conference on Computer Vision and Pattern Recognition, 2005.
[9] D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov. Scalable object detection using deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[10] M. Everingham, S. M. Eslami, L. Gool, C. K. Williams, J. Winn, and A. Zisserman. The pascal visual object classes challenge: A retrospective. Int. J. Comput. Vision, 111(1):98–136, 2015.
[11] K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, F. Tramèr, A. Prakash, T. Kohno, and D. Song. Physical adversarial examples for object detectors. CoRR, abs/1807.07769, 2018.
[12] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell., 32(9):1627–1645, 2010.
[13] C.-Y. Fu, W. Liu, A. Ranga, A. Tyagi, and A. C. Berg. DSSD: Deconvolutional single shot detector. CoRR, abs/1701.06659, 2017.
[14] R. Girshick. Fast R-CNN. In IEEE International Conference on Computer Vision, 2015.
[15] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[16] I. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.
[17] C. Guo, M. Rana, M. Cissé, and L. van der Maaten. Countering adversarial images using input transformations. In International Conference on Learning Representations, 2018.
[18] K. He, R. B. Girshick, and P. Dollár. Rethinking ImageNet pre-training. CoRR, abs/1811.08883, 2018.
[19] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[20] A. Kendall, Y. Gal, and R. Cipolla. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In IEEE Conference on Computer Vision and Pattern Recognition, 2018.
[21] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial machine learning at scale. In International Conference on Learning Representations, 2017.
[22] Y. Li, X. Bian, and S. Lyu. Attacking object detectors via imperceptible patches on background. CoRR, abs/1809.05966, 2018.
[23] Y. Li, D. Tian, M. Chang, X. Bian, and S. Lyu. Robust adversarial perturbation on deep proposal-based models. In British Machine Vision Conference, 2018.
[24] Z. Li and F. Zhou. FSSD: feature fusion single shot multibox detector. CoRR, abs/1712.00960, 2017.
[25] F. Liao, M. Liang, Y. Dong, and T. Pang. Defense against adversarial attacks using high-level representation guided denoiser. In IEEE Conference on Computer Vision and Pattern Recognition, 2018.
[26] T.-Y. Lin, P. Goyal, R. B. Girshick, K. He, and P. Dollár. Focal loss for dense object detection. In International Conference on Computer Vision, 2017.
[27] T.-Y. Lin, M. Maire, S. J. Belongie, L. D. Bourdev, R. B. Girshick, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, 2014.
[28] S. Liu, D. Huang, and Y. Wang. Receptive field block net for accurate and fast object detection. In European Conference on Computer Vision, 2018.
[29] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. SSD: Single shot multibox detector. In European Conference on Computer Vision, 2016.
[30] X. Liu, M. Cheng, H. Zhang, and C.-J. Hsieh. Towards robust neural networks via random self-ensemble. In European Conference on Computer Vision, 2018.
[31] X. Liu, H. Yang, L. Song, H. Li, and Y. Chen. DPatch: Attacking object detectors with adversarial patches. CoRR, abs/1806.02299, 2018.
[32] J. Lu, H. Sibai, and E. Fabry. Adversarial examples that fool detectors. CoRR, abs/1712.02494, 2017.
[33] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
[34] D. Meng and H. Chen. MagNet: a two-pronged defense against adversarial examples. In ACM SIGSAC Conference on Computer and Communications Security, 2017.
[35] J. H. Metzen, T. Genewein, V. Fischer, and B. Bischoff. On detecting adversarial perturbations. In International Conference on Learning Representations, 2017.
[36] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[37] A. Nguyen, J. Yosinski, and J. Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[38] A. Prakash, N. Moran, S. Garber, A. DiLillo, and J. Storer. Deflecting adversarial attacks with pixel deflection. In IEEE Conference on Computer Vision and Pattern Recognition, 2018.
[39] J. Redmon. Darknet: Open source neural networks in C. http://pjreddie.com/darknet/, 2013–2016.
[40] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. In IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[41] J. Redmon and A. Farhadi. YOLOv3: An incremental improvement. CoRR, abs/1804.02767, 2018.
[42] S. Ren, K. He, R. B. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, 2015.
[43] A. Rosenfeld and M. Thurston. Edge and curve detection for visual scene analysis. IEEE Trans. Comput., 20(5):562–569, 1971.
[44] P. Samangouei, M. Kabkab, and R. Chellappa. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. In International Conference on Learning Representations, 2018.
[45] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015.
[46] Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kushman. PixelDefend: Leveraging generative models to understand and defend against adversarial examples. In International Conference on Learning Representations, 2018.
[47] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[48] C. Szegedy, S. E. Reed, D. Erhan, and D. Anguelov. Scalable, high-quality object detection. CoRR, abs/1412.1441, 2014.
[49] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014.
[50] F. Tramèr, A. Kurakin, N. Papernot, D. Boneh, and P. McDaniel. Ensemble adversarial training: Attacks and defenses. In International Conference on Learning Representations, 2018.
[51] D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and A. Madry. Robustness may be at odds with accuracy. In International Conference on Learning Representations, 2019.
[52] G. Tsoumakas and I. Katakis. Multi label classification: An overview. 3(3):1–13, 2007.
[53] J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders. Selective search for object recognition. International Journal of Computer Vision, 104(2):154–171, 2013.
[54] P. Viola and M. J. Jones. Robust real-time face detection. Int. J. Comput. Vision, 57(2):137–154, 2004.
[55] X. Wei, S. Liang, X. Cao, and J. Zhu. Transferable adversarial attacks for image and video object detection. CoRR, abs/1811.12641, 2018.
[56] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille. Mitigating adversarial effects through randomization. In International Conference on Learning Representations, 2018.
[57] C. Xie, J. Wang, Z. Zhang, Y. Zhou, L. Xie, and A. Yuille. Adversarial examples for semantic segmentation and object detection. In International Conference on Computer Vision, 2017.
[58] C. Xie, Y. Wu, L. van der Maaten, A. Yuille, and K. He. Feature denoising for improving adversarial robustness. In IEEE Conference on Computer Vision and Pattern Recognition, 2019.
[59] X. Zhao, H. Li, X. Shen, X. Liang, and Y. Wu. A modulation module for multi-task learning with applications in image retrieval. In European Conference on Computer Vision, 2018.

[1] [1]

Athalye, N

A. Athalye, N. Carlini, and D. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to ad- versarial examples. In International Conference on Machine learning, 2018

work page 2018

[2] [2]

Biggio and F

B. Biggio and F. Roli. Wild patterns: Ten years after the rise of adversarial machine learning. In ACM Conference on Computer and Communications Security, 2018

work page 2018

[3] [3]

Cai and N

Z. Cai and N. Vasconcelos. Cascade R-CNN: Delving into high quality object detection. In IEEE Conference on Com- puter Vision and Pattern Recognition, 2018

work page 2018

[4] [4]

Carlini and D

N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, 2017

work page 2017

[5] [5]

R. Caruana. Multitask learning. Machine Learning , 28(1):41–75, 1997

work page 1997

[6] [6]

S. Chen, C. Cornelius, J. Martin, and D. H. Chau. ShapeShifter: Robust physical adversarial attack on Faster R-CNN object detector. CoRR, abs/1804.05810, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[7] [7]

L. Cui. MDSSD: Multi-scale deconvolutional single shot de- tector for small objects. CoRR, abs/1805.07009, 2018

work page arXiv 2018

[8] [8]

Dalal and B

N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In IEEE Conference on Computer Vision and Pattern Recognition, 2005

work page 2005

[9] [9]

Erhan, C

D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov. Scalable object detection using deep neural networks. In IEEE Con- ference on Computer Vision and Pattern Recognition, 2014

work page 2014

[10] [10]

Everingham, S

M. Everingham, S. M. Eslami, L. Gool, C. K. Williams, J. Winn, and A. Zisserman. The pascal visual object classes challenge: A retrospective. Int. J. Comput. Vision , 111(1):98–136, 2015

work page 2015

[11] [11]

Physical Adversarial Examples for Object Detectors

K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, F. Tram `er, A. Prakash, T. Kohno, and D. Song. Phys- ical adversarial examples for object detectors. CoRR, abs/1807.07769, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[12] [12]

P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ra- manan. Object detection with discriminatively trained part- based models. IEEE Trans. Pattern Anal. Mach. Intell. , 32(9):1627–1645, 2010

work page 2010

[13] [13]

C.-Y . Fu, W. Liu, A. Ranga, A. Tyagi, and A. C. Berg. DSSD: Deconvolutional single shot detector. CoRR, abs/1701.06659, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[14] [14]

Girshick

R. Girshick. Fast R-CNN. In IEEE International Conference on Computer Vision, 2015

work page 2015

[15] [15]

Girshick, J

R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich fea- ture hierarchies for accurate object detection and semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition, 2014

work page 2014

[16] [16]

Goodfellow, J

I. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In International Confer- ence on Learning Representations, 2015

work page 2015

[17] [17]

C. Guo, M. Rana, M. Ciss ´e, and L. van der Maaten. Coun- tering adversarial images using input transformations. In In- ternational Conference on Learning Representations, 2018

work page 2018

[18] [18]

K. He, R. B. Girshick, and P. Doll ´ar. Rethinking ImageNet pre-training. CoRR, abs/1811.08883, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[19] [19]

K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, 2016

work page 2016

[20] [20]

Kendall, Y

A. Kendall, Y . Gal, and R. Cipolla. Multi-task learning using uncertainty to weigh losses for scene geometry and seman- tics. In IEEE Conference on Computer Vision and Pattern Recognition, 2018

work page 2018

[21] [21]

Kurakin, I

A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial machine learning at scale. In International Conference on Learning Representations, 2017

work page 2017

[22] [22]

Y . Li, X. Bian, and S. Lyu. Attacking object detectors via im- perceptible patches on background. CoRR, abs/1809.05966, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[23] [23]

Y . Li, D. Tian, M. Chang, X. Bian, and S. Lyu. Robust adver- sarial perturbation on deep proposal-based models. InBritish Machine Vision Conference, 2018

work page 2018

[24] [24]

Li and F

Z. Li and F. Zhou. FSSD: feature fusion single shot multibox detector. CoRR, abs/1712.00960, 2017

work page arXiv 2017

[25] [25]

F. Liao, M. Liang, Y . Dong, and T. Pang. Defense against ad- versarial attacks using high-level representation guided de- noiser. In IEEE Conference on Computer Vision and Pattern Recognition, 2018

work page 2018

[26] [26]

T.-Y . Lin, P. Goyal, R. B. Girshick, K. He, and P. Doll ´ar. Focal loss for dense object detection. In International Con- ference on Computer Vision, 2017

work page 2017

[27] [27]

T.-Y . Lin, M. Maire, S. J. Belongie, L. D. Bourdev, R. B. Gir- shick, J. Hays, P. Perona, D. Ramanan, P. Doll ´ar, and C. L. Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, 2014

work page 2014

[28] [28]

S. Liu, D. Huang, and a. Wang. Receptive ﬁeld block net for accurate and fast object detection. In European Conference on Computer Vision, 2018

work page 2018

[29] [29]

W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y . Fu, and A. C. Berg. SSD: Single shot multibox detector. In European Conference on Computer Vision, 2016

work page 2016

[30] [30]

X. Liu, M. Cheng, H. Zhang, and C.-J. Hsieh. Towards ro- bust neural networks via random self-ensemble. InEuropean Conference on Computer Vision, 2018

[31]

X. Liu, H. Yang, L. Song, H. Li, and Y. Chen. DPatch: Attacking object detectors with adversarial patches. CoRR, abs/1806.02299, 2018

[32]

J. Lu, H. Sibai, and E. Fabry. Adversarial examples that fool detectors. CoRR, abs/1712.02494, 2017

[33]

A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018

[34]

D. Meng and H. Chen. MagNet: a two-pronged defense against adversarial examples. In ACM SIGSAC Conference on Computer and Communications Security, 2017

[35]

J. H. Metzen, T. Genewein, V. Fischer, and B. Bischoff. On detecting adversarial perturbations. In International Conference on Learning Representations, 2017

[36]

S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition, 2016

[37]

A. Nguyen, J. Yosinski, and J. Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In IEEE Conference on Computer Vision and Pattern Recognition, 2015

[38]

A. Prakash, N. Moran, S. Garber, A. DiLillo, and J. Storer. Deflecting adversarial attacks with pixel deflection. In IEEE Conference on Computer Vision and Pattern Recognition, 2018

[39]

J. Redmon. Darknet: Open source neural networks in C. http://pjreddie.com/darknet/, 2013–2016

[40]

J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. In IEEE Conference on Computer Vision and Pattern Recognition, 2016

[41]

J. Redmon and A. Farhadi. YOLOv3: An incremental improvement. CoRR, abs/1804.02767, 2018

[42]

S. Ren, K. He, R. B. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, 2015

[43]

A. Rosenfeld and M. Thurston. Edge and curve detection for visual scene analysis. IEEE Trans. Comput., 20(5):562–569, 1971

[44]

P. Samangouei, M. Kabkab, and R. Chellappa. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. In International Conference on Learning Representations, 2018

[45]

K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015

[46]

Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kushman. Pixeldefend: Leveraging generative models to understand and defend against adversarial examples. In International Conference on Learning Representations, 2018

[47]

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition, 2015

[48]

C. Szegedy, S. E. Reed, D. Erhan, and D. Anguelov. Scalable, high-quality object detection. CoRR, abs/1412.1441, 2014

[49]

C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014

[50]

F. Tramèr, A. Kurakin, N. Papernot, D. Boneh, and P. McDaniel. Ensemble adversarial training: Attacks and defenses. In International Conference on Learning Representations, 2018

[51]

D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and A. Madry. Robustness may be at odds with accuracy. In International Conference on Learning Representations, 2019

[52]

G. Tsoumakas and I. Katakis. Multi label classification: An overview. 3(3):1–13, 2007

[53]

J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders. Selective search for object recognition. International Journal of Computer Vision, 104(2):154–171, 2013

[54]

P. Viola and M. J. Jones. Robust real-time face detection. Int. J. Comput. Vision, 57(2):137–154, 2004

[55]

X. Wei, S. Liang, X. Cao, and J. Zhu. Transferable adversarial attacks for image and video object detection. CoRR, abs/1811.12641, 2018

[56]

C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille. Mitigating adversarial effects through randomization. In International Conference on Learning Representations, 2018

[57]

C. Xie, J. Wang, Z. Zhang, Y. Zhou, L. Xie, and A. Yuille. Adversarial examples for semantic segmentation and object detection. In International Conference on Computer Vision, 2017

[58]

C. Xie, Y. Wu, L. van der Maaten, A. Yuille, and K. He. Feature denoising for improving adversarial robustness. In IEEE Conference on Computer Vision and Pattern Recognition, 2019

[59]

X. Zhao, H. Li, X. Shen, X. Liang, and Y. Wu. A modulation module for multi-task learning with applications in image retrieval. In European Conference on Computer Vision, 2018
