pith. sign in

arxiv: 2002.09053 · v3 · pith:TY3DTH5Znew · submitted 2020-02-20 · 💻 cs.CV

Adapted Center and Scale Prediction: More Stable and More Accurate

Pith reviewed 2026-05-24 14:12 UTC · model grok-4.3

classification 💻 cs.CV
keywords pedestrian detectionanchor-free detectorone-stage detectorCityPersons benchmarkcenter and scale predictioncompressing widthswitchable normalization
0
0 comments X

The pith

Adapting Center and Scale Prediction with robustness fixes and compressing width yields second-best results on CityPersons pedestrian benchmark.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper adapts the Center and Scale Prediction detector to improve its robustness during training and introduce a compressing width method for predicting object widths. These changes produce an anchor-free one-stage model that reaches 9.3 percent log-average miss rate on the reasonable set of CityPersons. The result also shows 8.7 percent on the partial set and 5.6 percent on the bare set. This performance indicates that such simple detectors can still reach high accuracy levels previously associated with more complex two-stage approaches. The work additionally examines further properties of Switchable Normalization.

Core claim

By improving the robustness of CSP and introducing compressing width prediction, the adapted detector attains 9.3% MR on reasonable, 8.7% on partial, and 5.6% on bare sets of CityPersons, showing anchor-free and one-stage detectors can still have high accuracy.

What carries the argument

Adapted Center and Scale Prediction detector that adds robustness improvements and a compressing width prediction method.

If this is right

  • An anchor-free one-stage detector can achieve second-best performance on the CityPersons benchmark.
  • The model reaches 9.3% log-average miss rate on the reasonable set, 8.7% on partial, and 5.6% on bare.
  • Switchable Normalization has capabilities not mentioned in its original paper.
  • Pedestrian detection can use simpler detector designs while maintaining competitive accuracy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The robustness and width adaptations might transfer to other one-stage detection tasks to reduce training instability.
  • Further study of normalization methods could identify additional ways to boost accuracy in similar models.
  • Lowering dependence on anchor boxes may simplify deployment in resource-limited settings.

Load-bearing premise

The reported performance gains on CityPersons are caused by the proposed adaptations rather than differences in training procedure, data handling, or other implementation details.

What would settle it

Retraining the original CSP detector with the same training procedure and data handling as the adapted version and checking whether it matches or exceeds the reported miss rates on CityPersons.

Figures

Figures reproduced from arXiv: 2002.09053 by Jusheng Zhang, Wenhao Wang.

Figure 1
Figure 1. Figure 1: We use CityPersons test set to illustrate our ACSP detection ability. It is worthy to mention that, [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: It is the architecture of original CSP [ [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: The proportion of the weight of each normalization method in different parts is shown in the [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Comparisons of different batch size. It is [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
read the original abstract

Pedestrian detection benefits from deep learning technology and gains rapid development in recent years. Most of detectors follow general object detection frame, i.e. default boxes and two-stage process. Recently, anchor-free and one-stage detectors have been introduced into this area. However, their accuracies are unsatisfactory. Therefore, in order to enjoy the simplicity of anchor-free detectors and the accuracy of two-stage ones simultaneously, we propose some adaptations based on a detector, Center and Scale Prediction(CSP). The main contributions of our paper are: (1) We improve the robustness of CSP and make it easier to train. (2) We propose a novel method to predict width, namely compressing width. (3) We achieve the second best performance on CityPersons benchmark, i.e. 9.3% log-average miss rate(MR) on reasonable set, 8.7% MR on partial set and 5.6% MR on bare set, which shows an anchor-free and one-stage detector can still have high accuracy. (4) We explore some capabilities of Switchable Normalization which are not mentioned in its original paper. The code is publicly available at https://github.com/WangWenhao0716/Adapted-Center-and-Scale-Prediction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes two adaptations to the Center and Scale Prediction (CSP) anchor-free one-stage pedestrian detector: (1) robustness improvements intended to make training more stable and easier, and (2) a 'compressing width' technique for width prediction. It reports second-best results on the CityPersons benchmark (9.3% log-average miss rate on the reasonable set, 8.7% on partial, 5.6% on bare) and explores additional properties of Switchable Normalization. The code is released publicly.

Significance. If the reported gains are shown to result from the two listed adaptations rather than training-protocol differences, the work would demonstrate that anchor-free one-stage detectors can reach competitive accuracy on pedestrian detection benchmarks. The public code release is a clear strength for reproducibility.

major comments (2)
  1. [Experiments] Experiments section (Tables 1-3 and associated text): no ablation is presented that re-trains the original CSP under the exact same optimizer, augmentation, epoch count, and data protocol as the adapted version. Without this controlled comparison, the 9.3% MR on the reasonable set cannot be attributed to the robustness fixes or compressing-width change rather than unstated implementation differences.
  2. [§3.2] §3.2 (compressing width): the method is described only at a high level with no equation, loss term, or pseudocode that distinguishes it from standard width regression; therefore its claimed novelty and its contribution to the final numbers cannot be evaluated.
minor comments (2)
  1. [Abstract and §4] The abstract states that Switchable Normalization capabilities 'not mentioned in its original paper' are explored, yet the main text does not list the specific new observations or provide a dedicated subsection or table for them.
  2. [Figures] Figure captions and axis labels in the CityPersons result plots use inconsistent font sizes and omit error bars or run-to-run variance, reducing clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The two major comments identify important gaps in experimental controls and methodological clarity. We address each below and will revise the manuscript to strengthen the claims.

read point-by-point responses
  1. Referee: [Experiments] Experiments section (Tables 1-3 and associated text): no ablation is presented that re-trains the original CSP under the exact same optimizer, augmentation, epoch count, and data protocol as the adapted version. Without this controlled comparison, the 9.3% MR on the reasonable set cannot be attributed to the robustness fixes or compressing-width change rather than unstated implementation differences.

    Authors: We agree that the current experiments lack a controlled re-implementation of the original CSP under identical training settings. This prevents definitive attribution of the 9.3% MR improvement. In the revised manuscript we will add an ablation that re-trains the baseline CSP with the exact optimizer, augmentation, epoch schedule, and data protocol used for the adapted model, allowing direct isolation of the contributions from the robustness changes and compressing-width technique. revision: yes

  2. Referee: [§3.2] §3.2 (compressing width): the method is described only at a high level with no equation, loss term, or pseudocode that distinguishes it from standard width regression; therefore its claimed novelty and its contribution to the final numbers cannot be evaluated.

    Authors: We acknowledge that §3.2 currently provides only a high-level description. In the revision we will insert the precise mathematical formulation of the compressing-width prediction, the modified loss term, and pseudocode that explicitly contrasts it with standard width regression. These additions will clarify the novelty and enable readers to assess its specific contribution to the reported results. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical benchmark results are independent of any self-defined derivations

full rationale

The paper describes adaptations (robustness improvements and compressing width) to the CSP detector and reports standard CityPersons benchmark numbers (9.3% MR reasonable etc.). No equations, fitted parameters, or predictions are presented that reduce by construction to inputs. No self-citation load-bearing uniqueness theorems, ansatzes smuggled via citation, or renaming of known results appear in the provided text. The central claim is an empirical performance comparison whose validity rests on implementation details and ablations (not supplied), not on any derivation chain that collapses to self-reference. This matches the default case of a non-circular empirical ML paper.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Abstract-only review; insufficient detail to enumerate specific free parameters, axioms, or invented entities beyond standard deep-learning training assumptions.

free parameters (1)
  • training hyperparameters
    Deep learning detectors rely on many fitted hyperparameters such as learning rate and loss weights not detailed in the abstract.
axioms (1)
  • domain assumption CityPersons benchmark provides a fair and representative measure of pedestrian detection performance.
    Performance claims rest on the validity of this standard evaluation protocol.

pith-pipeline@v0.9.0 · 5747 in / 1229 out tokens · 36277 ms · 2026-05-24T14:12:06.579543+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · 11 internal anchors

  1. [1]

    Layer Normalization

    Jimmy Lei Ba, Jamie Ryan Kiros, and Geof- frey E Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016

  2. [2]

    Eurocity persons: A novel benchmark for person detection in traffic scenes

    Markus Braun, Sebastian Krebs, Fabian Flohr, and Dariu M Gavrila. Eurocity persons: A novel benchmark for person detection in traffic scenes. IEEE transactions on pattern analysis and ma- chine intelligence , 41(8):1844–1861, 2019. 10

  3. [3]

    The cityscapes dataset for seman- tic urban scene understanding

    Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Ro- drigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The cityscapes dataset for seman- tic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3213–3223, 2016

  4. [4]

    Imagenet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE con- ference on computer vision and pattern recogni- tion, pages 248–255. Ieee, 2009

  5. [5]

    Fast feature pyramids for object detection

    Piotr Doll´ ar, Ron Appel, Serge Belongie, and Pietro Perona. Fast feature pyramids for object detection. IEEE transactions on pattern anal- ysis and machine intelligence , 36(8):1532–1545, 2014

  6. [6]

    Pedestrian detection: A benchmark

    Piotr Doll´ ar, Christian Wojek, Bernt Schiele, and Pietro Perona. Pedestrian detection: A benchmark. In 2009 IEEE Conference on Com- puter Vision and Pattern Recognition , pages 304–311. IEEE, 2009

  7. [7]

    Are we ready for autonomous driving? the kitti vision benchmark suite

    Andreas Geiger, Philip Lenz, and Raquel Urta- sun. Are we ready for autonomous driving? the kitti vision benchmark suite. In 2012 IEEE Con- ference on Computer Vision and Pattern Recog- nition, pages 3354–3361. IEEE, 2012

  8. [8]

    Fast r-cnn

    Ross Girshick. Fast r-cnn. In Proceedings of the IEEE international conference on computer vision, pages 1440–1448, 2015

  9. [9]

    Rich feature hierarchies for ac- curate object detection and semantic segmenta- tion

    Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for ac- curate object detection and semantic segmenta- tion. In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 580–587, 2014

  10. [10]

    Deep residual learning for image recog- nition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recog- nition. In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 770–778, 2016

  11. [11]

    MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

    Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. Mobilenets: Efficient convolutional neural net- works for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017

  12. [12]

    DenseBox: Unifying Landmark Localization with End to End Object Detection

    Lichao Huang, Yi Yang, Yafeng Deng, and Yi- nan Yu. Densebox: Unifying landmark local- ization with end to end object detection. arXiv preprint arXiv:1509.04874, 2015

  13. [13]

    Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

    Sergey Ioffe and Christian Szegedy. Batch nor- malization: Accelerating deep network train- ing by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015

  14. [14]

    Adam: A Method for Stochastic Optimization

    Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

  15. [15]

    Foveabox: Beyond anchor-based object detector

    Tao Kong, Fuchun Sun, Huaping Liu, Yun- ing Jiang, and Jianbo Shi. Foveabox: Beyond anchor-based object detector. arXiv preprint arXiv:1904.03797, 2019

  16. [16]

    Cornernet: Detecting objects as paired keypoints

    Hei Law and Jia Deng. Cornernet: Detecting objects as paired keypoints. In Proceedings of the European Conference on Computer Vision (ECCV), pages 734–750, 2018

  17. [17]

    Csrnet: Dilated convolutional neural networks for understanding the highly congested scenes

    Yuhong Li, Xiaofan Zhang, and Deming Chen. Csrnet: Dilated convolutional neural networks for understanding the highly congested scenes. In Proceedings of the IEEE conference on com- puter vision and pattern recognition, pages 1091– 1100, 2018

  18. [18]

    Detnet: Design backbone for object detection

    Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, and Jian Sun. Detnet: Design backbone for object detection. In Pro- ceedings of the European Conference on Com- puter Vision (ECCV) , pages 334–350, 2018

  19. [19]

    An ex- tended set of haar-like features for rapid object detection

    Rainer Lienhart and Jochen Maydt. An ex- tended set of haar-like features for rapid object detection. In Proceedings. international confer- ence on image processing , volume 1, pages I–I. IEEE, 2002. 11

  20. [20]

    Microsoft coco: Common objects in context

    Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Pi- otr Doll´ ar, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European conference on computer vision , pages 740–755. Springer, 2014

  21. [21]

    Adaptive nms: Refining pedestrian detection in a crowd

    Songtao Liu, Di Huang, and Yunhong Wang. Adaptive nms: Refining pedestrian detection in a crowd. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages 6459–6468, 2019

  22. [22]

    Ssd: Single shot multi- box detector

    Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multi- box detector. In European conference on com- puter vision , pages 21–37. Springer, 2016

  23. [23]

    Learning efficient single- stage pedestrian detectors by asymptotic local- ization fitting

    Wei Liu, Shengcai Liao, Weidong Hu, Xuezhi Liang, and Xiao Chen. Learning efficient single- stage pedestrian detectors by asymptotic local- ization fitting. In Proceedings of the European Conference on Computer Vision (ECCV) , pages 618–634, 2018

  24. [24]

    High-level semantic feature detection: A new perspective for pedestrian de- tection

    Wei Liu, Shengcai Liao, Weiqiang Ren, Weidong Hu, and Yinan Yu. High-level semantic feature detection: A new perspective for pedestrian de- tection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages 5187–5196, 2019

  25. [25]

    Semantic head en- hanced pedestrian detection in a crowd

    Ruiqi Lu and Huimin Ma. Semantic head en- hanced pedestrian detection in a crowd. arXiv preprint arXiv:1911.11985, 2019

  26. [26]

    Differentiable Learning-to-Normalize via Switchable Normalization

    Ping Luo, Jiamin Ren, Zhanglin Peng, Ruimao Zhang, and Jingyu Li. Differentiable learning-to- normalize via switchable normalization. arXiv preprint arXiv:1806.10779, 2018

  27. [27]

    Local Decorrelation For Improved Detection

    Woonhyun Nam, Piotr Doll´ ar, and Joon Hee Han. Local decorrelation for improved detection. arXiv preprint arXiv:1406.1134 , 2014

  28. [28]

    Mask-guided attention network for occluded pedestrian detection

    Yanwei Pang, Jin Xie, Muhammad Haris Khan, Rao Muhammad Anwer, Fahad Shahbaz Khan, and Ling Shao. Mask-guided attention network for occluded pedestrian detection. In Proceed- ings of the IEEE International Conference on Computer Vision , pages 4967–4975, 2019

  29. [29]

    Automatic differen- tiation in pytorch

    Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary De- Vito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differen- tiation in pytorch. 2017

  30. [30]

    You only look once: Uni- fied, real-time object detection

    Joseph Redmon, Santosh Divvala, Ross Gir- shick, and Ali Farhadi. You only look once: Uni- fied, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779–788, 2016

  31. [31]

    Yolo9000: bet- ter, faster, stronger

    Joseph Redmon and Ali Farhadi. Yolo9000: bet- ter, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recog- nition, pages 7263–7271, 2017

  32. [32]

    Faster r-cnn: Towards real-time ob- ject detection with region proposal networks

    Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time ob- ject detection with region proposal networks. In Advances in neural information processing sys- tems, pages 91–99, 2015

  33. [33]

    Weight nor- malization: A simple reparameterization to ac- celerate training of deep neural networks

    Tim Salimans and Durk P Kingma. Weight nor- malization: A simple reparameterization to ac- celerate training of deep neural networks. In Ad- vances in neural information processing systems , pages 901–909, 2016

  34. [34]

    How does batch normalization help optimization? In Ad- vances in Neural Information Processing Sys- tems, pages 2483–2493, 2018

    Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, and Aleksander Madry. How does batch normalization help optimization? In Ad- vances in Neural Information Processing Sys- tems, pages 2483–2493, 2018

  35. [35]

    CrowdHuman: A Benchmark for Detecting Human in a Crowd

    Shuai Shao, Zijian Zhao, Boxun Li, Tete Xiao, Gang Yu, Xiangyu Zhang, and Jian Sun. Crowd- human: A benchmark for detecting human in a crowd. arXiv preprint arXiv:1805.00123 , 2018

  36. [36]

    Very Deep Convolutional Networks for Large-Scale Image Recognition

    Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large- scale image recognition. arXiv preprint arXiv:1409.1556, 2014. 12

  37. [37]

    Small-scale pedestrian detec- tion based on topological line localization and temporal feature aggregation

    Tao Song, Leiyu Sun, Di Xie, Haiming Sun, and Shiliang Pu. Small-scale pedestrian detec- tion based on topological line localization and temporal feature aggregation. In Proceedings of the European Conference on Computer Vision (ECCV), pages 536–551, 2018

  38. [38]

    Compositional human pose regres- sion

    Xiao Sun, Jiaxiang Shang, Shuang Liang, and Yichen Wei. Compositional human pose regres- sion. In Proceedings of the IEEE International Conference on Computer Vision , pages 2602– 2611, 2017

  39. [39]

    Integral human pose regression

    Xiao Sun, Bin Xiao, Fangyin Wei, Shuang Liang, and Yichen Wei. Integral human pose regression. In Proceedings of the European Conference on Computer Vision (ECCV) , pages 529–545, 2018

  40. [40]

    Mean teach- ers are better role models: Weight-averaged con- sistency targets improve semi-supervised deep learning results

    Antti Tarvainen and Harri Valpola. Mean teach- ers are better role models: Weight-averaged con- sistency targets improve semi-supervised deep learning results. In Advances in neural informa- tion processing systems , pages 1195–1204, 2017

  41. [41]

    Instance Normalization: The Missing Ingredient for Fast Stylization

    Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. Instance normalization: The miss- ing ingredient for fast stylization. arXiv preprint arXiv:1607.08022, 2016

  42. [42]

    Robust real- time object detection

    Paul Viola, Michael Jones, et al. Robust real- time object detection. International journal of computer vision, 4(34-47):4, 2001

  43. [43]

    Repulsion loss: Detecting pedestrians in a crowd

    Xinlong Wang, Tete Xiao, Yuning Jiang, Shuai Shao, Jian Sun, and Chunhua Shen. Repulsion loss: Detecting pedestrians in a crowd. In Pro- ceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages 7774– 7783, 2018

  44. [44]

    Group normaliza- tion

    Yuxin Wu and Kaiming He. Group normaliza- tion. In Proceedings of the European Conference on Computer Vision (ECCV) , pages 3–19, 2018

  45. [45]

    Psc-net: Learning part spatial co-occurence for occluded pedestrian detection

    Jin Xie, Yanwei Pang, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, and Ling Shao. Psc-net: Learning part spatial co-occurence for occluded pedestrian detection. arXiv preprint arXiv:2001.09252 , 2020

  46. [46]

    Deep layer aggregation

    Fisher Yu, Dequan Wang, Evan Shelhamer, and Trevor Darrell. Deep layer aggregation. In Pro- ceedings of the IEEE conference on computer vi- sion and pattern recognition , pages 2403–2412, 2018

  47. [47]

    Jialiang Zhang, Lixiang Lin, Yun-chen Chen, Yao Hu, Steven C. H. Hoi, and Jianke Zhu. CSID: center, scale, identity and density-aware pedestrian detection in a crowd. CoRR, abs/1910.09188, 2019

  48. [48]

    Towards reaching human performance in pedestrian de- tection

    Shanshan Zhang, Rodrigo Benenson, Mohamed Omran, Jan Hosang, and Bernt Schiele. Towards reaching human performance in pedestrian de- tection. IEEE transactions on pattern analysis and machine intelligence , 40(4):973–986, 2017

  49. [49]

    Citypersons: A diverse dataset for pedestrian detection

    Shanshan Zhang, Rodrigo Benenson, and Bernt Schiele. Citypersons: A diverse dataset for pedestrian detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3213–3221, 2017

  50. [50]

    Filtered channel features for pedes- trian detection

    Shanshan Zhang, Rodrigo Benenson, Bernt Schiele, et al. Filtered channel features for pedes- trian detection. In CVPR, volume 1, page 4, 2015

  51. [51]

    Occluded pedestrian detection through guided attention in cnns

    Shanshan Zhang, Jian Yang, and Bernt Schiele. Occluded pedestrian detection through guided attention in cnns. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6995–7003, 2018

  52. [52]

    Occlusion-aware r-cnn: de- tecting pedestrians in a crowd

    Shifeng Zhang, Longyin Wen, Xiao Bian, Zhen Lei, and Stan Z Li. Occlusion-aware r-cnn: de- tecting pedestrians in a crowd. In Proceedings of the European Conference on Computer Vision (ECCV), pages 637–653, 2018

  53. [53]

    Discriminative feature transformation for oc- cluded pedestrian detection

    Chunluan Zhou, Ming Yang, and Junsong Yuan. Discriminative feature transformation for oc- cluded pedestrian detection. In Proceedings of the IEEE International Conference on Com- puter Vision , pages 9557–9566, 2019. 13

  54. [54]

    Bi-box regression for pedestrian detection and occlu- sion estimation

    Chunluan Zhou and Junsong Yuan. Bi-box regression for pedestrian detection and occlu- sion estimation. In Proceedings of the European Conference on Computer Vision (ECCV) , pages 135–151, 2018

  55. [55]

    Objects as Points

    Xingyi Zhou, Dequan Wang, and Philipp Kr¨ ahenb¨ uhl. Objects as points.arXiv preprint arXiv:1904.07850, 2019

  56. [56]

    Bottom-up object detection by group- ing extreme and center points

    Xingyi Zhou, Jiacheng Zhuo, and Philipp Kra- henbuhl. Bottom-up object detection by group- ing extreme and center points. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 850–859, 2019

  57. [57]

    Feature selective anchor-free module for single- shot object detection

    Chenchen Zhu, Yihui He, and Marios Savvides. Feature selective anchor-free module for single- shot object detection. In Proceedings of the IEEE Conference on Computer Vision and Pat- tern Recognition, pages 840–849, 2019. 14