Bag of Freebies for Training Object Detection Neural Networks

Hang Zhang; Junyuan Xie; Mu Li; Tong He; Zhi Zhang; Zhongyue Zhang

arxiv: 1902.04103 · v3 · pith:5JGNZFZKnew · submitted 2019-02-11 · 💻 cs.CV

Zhi Zhang, Tong He, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li

classification 💻 cs.CV

keywords training · models · detection · freebies · however · improve · model · neural

Training heuristics greatly improve the accuracy of various image classification models \cite{he2018bag}. Object detection models, however, have more complex neural network structures and optimization targets, and their training strategies and pipelines vary dramatically from one model to another. In this work, we explore training tweaks that apply to various models, including Faster R-CNN and YOLOv3. These tweaks do not change the model architectures, so the inference costs remain the same. Our empirical results demonstrate that these freebies can nevertheless improve absolute precision by up to 5% compared to state-of-the-art baselines.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

YOLOX: Exceeding YOLO Series in 2021
cs.CV 2021-07 accept novelty 6.0

YOLOX exceeds prior YOLO models by adopting anchor-free detection, decoupled heads, and SimOTA assignment to reach 50.0% AP on COCO for the large variant.

Centralized Copy-Paste: Enhanced Data Augmentation Strategy for Wildland Fire Semantic Segmentation
cs.CV 2025-07 unverdicted novelty 5.0

CCPDA augments training data for wildland fire semantic segmentation by centralizing and pasting fire clusters, outperforming standard augmentations on fire-class metrics via multi-objective optimization.

A unified neural network for object detection, multiple object tracking and vehicle re-identification
cs.CV 2019-07 unverdicted novelty 3.0

Faster R-CNN is extended with a track branch and trained end-to-end on concatenated video frames to unify detection and re-identification, reaching 57.79% mAP on the AIC19 vehicle dataset.