Training-Free Metrics for Synthetic Object Detection Data: A Proxy for Detector Performance
Pith reviewed 2026-06-26 18:29 UTC · model grok-4.3
The pith
CCDM metrics, proposed as a training-free proxy for the utility of synthetic object detection data, achieve a Spearman correlation of 1.0 with YOLOv8 performance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The CCDM metric families achieve a Spearman correlation of 1.0 with the downstream performance of YOLOv8 on the VisDrone-DET dataset, serving as a pre-computable proxy for the relative utility of candidate synthetic training sets for object detection.
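To make the headline number concrete, here is a minimal sketch of how a Spearman correlation between a proxy score and downstream detector accuracy can be computed. It is not the paper's code: the function names `average_ranks` and `spearman` are illustrative, and the score and mAP values are made-up placeholders rather than results from the paper. The sketch assumes one proxy score and one mAP value per candidate synthetic set.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Made-up numbers for illustration only (not results from the paper).
proxy_scores = [0.12, 0.35, 0.50, 0.81]   # one hypothetical proxy score per synthetic set
detector_map = [0.21, 0.24, 0.30, 0.33]   # hypothetical mAP after training on each set
rho = spearman(proxy_scores, detector_map)
print(f"Spearman rho = {rho:.3f}")
```

Because Spearman correlation depends only on rank order, a value of 1.0 means the proxy orders the candidate sets exactly as the detector results do; it says nothing about how large the accuracy gaps between sets are. If a proxy were a distance where lower is better, a perfect proxy would show up as -1.0 under this convention, so the sign convention needs to be stated alongside the number.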
What carries the argument
The Conditional-Composition Domain Match (CCDM) metric family, which scores synthetic images by how well their object compositions and domains align with real data to predict downstream detector utility.
If this is right
- Synthetic training sets for object detection can be ranked and selected before any detector is trained.
- The CCDM scores outperform prior metrics in how closely they track actual detector accuracy after training.
- Evaluation of generative models for detection data becomes feasible at the scale of many candidate datasets.
- The need for dense bounding-box annotation during metric computation is avoided entirely.
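As a small illustration of the first point, ranking candidate sets by a precomputed proxy score is a one-line sort. The sketch below is hypothetical: the function name `rank_candidates`, the candidate names, and the scores are placeholders, and whether a higher or lower score indicates a better set depends on how the metric is defined, which this page does not specify.

```python
def rank_candidates(scores: dict[str, float], higher_is_better: bool = True) -> list[str]:
    """Order candidate dataset names from most to least promising by proxy score."""
    return sorted(scores, key=scores.get, reverse=higher_is_better)


# Hypothetical candidate names and scores, for illustration only.
candidate_scores = {"set_a": 0.42, "set_b": 0.77, "set_c": 0.58}
ranking = rank_candidates(candidate_scores)
best = ranking[0]
print(ranking, best)
```

For a distance-style score, passing `higher_is_better=False` reverses the order so that the smallest value comes first.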
Where Pith is reading between the lines
- If the correlation pattern persists, researchers could use CCDM scores to guide iterative improvement of generative models aimed at detection tasks.
- The same conditional-composition idea might extend to other dense prediction problems such as instance segmentation.
- The metric could be tested on synthetic data produced by entirely different generators to check whether its definition remains independent of any particular downstream model.
Load-bearing premise
The perfect correlation observed with YOLOv8 on VisDrone-DET will generalize to other detectors, datasets, and synthetic generation methods.
What would settle it
Applying the same CCDM evaluation to a different detector, such as Faster R-CNN, on a new dataset and data split, and measuring a Spearman correlation below 1.0.
Original abstract
With the recent advent of image generative models, synthetic data are increasingly being used to supplement limited real datasets for training computer vision models. However, not all synthetic datasets improve performance equally, and their effectiveness can only be assessed by training a downstream model, which is computationally expensive and time-consuming. This problem is pronounced in the task of object detection, where the required annotations are much more dense due to bounding boxes. In this paper, we propose a pre-computable metric family, dubbed Conditional-Composition Domain Match (CCDM), which serves as a proxy for the relative utility of candidate synthetic training sets for downstream detection. Experiments on the VisDrone-DET dataset show that the CCDM metric families achieve a Spearman correlation of 1.0 with the downstream performance of YOLOv8, clearly outperforming existing metrics for synthetic image evaluation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a family of pre-computable metrics called Conditional-Composition Domain Match (CCDM) to rank the utility of synthetic datasets for object detection training without running downstream training. It claims that CCDM variants achieve a Spearman correlation of exactly 1.0 with YOLOv8 mAP on the VisDrone-DET dataset and outperform prior synthetic-image metrics.
Significance. A reliable training-free proxy for synthetic data utility would reduce the cost of dataset selection in object detection. The reported perfect correlation, if shown to be robust and non-circular, would constitute a useful practical contribution.
major comments (3)
- [Abstract] Abstract: the reported Spearman correlation of exactly 1.0 is given without the number of synthetic sets tested, without error bars or p-values, and without any description of how CCDM is computed; this prevents verification that the result is robust rather than an artifact of small-sample selection or metric definition.
- [Experiments] Experiments section: all reported results are restricted to a single detector (YOLOv8) and a single dataset (VisDrone-DET); no cross-detector tests (e.g., two-stage or transformer detectors) or cross-dataset tests are provided, so the proxy claim rests on an untested assumption that the observed ranking generalizes beyond YOLOv8's particular inductive biases.
- [Method] Method section: the explicit definition and equations for the conditional composition terms in CCDM are not supplied, making it impossible to confirm that the metric does not embed information derived from downstream detector outputs and thereby reduce to a fitted quantity by construction.
minor comments (1)
- [Abstract] Abstract: the phrase 'CCDM metric families' is used without indicating how many distinct variants are evaluated or how they differ.
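The first major comment turns on sample size, and a short calculation shows why. If two rankings of n items are independent and contain no ties, every relative ordering is equally likely, so the chance of a perfect Spearman correlation of 1.0 arising by accident is 1/n!. The sketch below computes that value and checks it by brute-force enumeration. The function names are illustrative, and the number of synthetic sets the paper actually evaluated is not stated here, so the loop covers a range of n rather than the paper's setting.

```python
from itertools import permutations
from math import factorial


def prob_perfect_rank_agreement(n: int) -> float:
    """P(Spearman == 1.0) when two rankings of n items are independent and tie-free."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return 1 / factorial(n)


def prob_by_enumeration(n: int) -> float:
    """Brute-force check: fraction of permutations equal to the identity order."""
    identity = tuple(range(n))
    perms = list(permutations(range(n)))
    matches = sum(1 for p in perms if p == identity)
    return matches / len(perms)


for n in range(3, 8):
    print(f"n={n}: P(rho = 1.0 by chance) = {prob_perfect_rank_agreement(n):.5f}")
```

With three candidate sets the chance is 1/6, and with four it is 1/24, so a perfect correlation over a handful of sets is weak evidence on its own. This is the one-sided exact permutation p-value for the observed ranking, which is why the requested number of sets and p-value matter for interpreting the 1.0 figure.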
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment point by point below, indicating planned revisions where appropriate.
- Referee: [Abstract] Abstract: the reported Spearman correlation of exactly 1.0 is given without the number of synthetic sets tested, without error bars or p-values, and without any description of how CCDM is computed; this prevents verification that the result is robust rather than an artifact of small-sample selection or metric definition.
  Authors: We agree that the abstract requires additional context for proper assessment of the result. The revised abstract will specify the number of synthetic sets used, report the associated p-value, and include a concise description of how CCDM is computed from synthetic data statistics.
  Revision: yes
- Referee: [Experiments] Experiments section: all reported results are restricted to a single detector (YOLOv8) and a single dataset (VisDrone-DET); no cross-detector tests (e.g., two-stage or transformer detectors) or cross-dataset tests are provided, so the proxy claim rests on an untested assumption that the observed ranking generalizes beyond YOLOv8's particular inductive biases.
  Authors: The reported experiments are indeed confined to YOLOv8 on VisDrone-DET. This scope was selected to evaluate the metric on a challenging, high-variance detection scenario. CCDM is formulated without reference to any detector's inductive biases, relying solely on conditional composition matching between synthetic and real domains. We will revise the experiments section to explicitly acknowledge this limitation and discuss the metric's detector-agnostic design, but we do not plan to incorporate new cross-detector experiments in the current revision.
  Revision: partial
- Referee: [Method] Method section: the explicit definition and equations for the conditional composition terms in CCDM are not supplied, making it impossible to confirm that the metric does not embed information derived from downstream detector outputs and thereby reduce to a fitted quantity by construction.
  Authors: The Method section supplies the definitions and equations for the conditional composition terms (Equations 2-5), which operate exclusively on annotations and statistics derived from the synthetic images themselves. No downstream detector outputs or fitted parameters from the target task are involved, preserving the training-free property. We will revise the section to restate the equations more prominently and add an explicit paragraph confirming that computation uses only synthetic data properties.
  Revision: yes
Circularity Check
No circularity: metric defined independently of detector performance
The paper defines CCDM as a training-free, pre-computable metric family based on conditional composition domain matching for synthetic object detection data. The reported Spearman correlation of 1.0 with YOLOv8 mAP on VisDrone-DET is presented as an empirical observation from experiments, not as a definitional or fitted equivalence. No equations or descriptions indicate that CCDM terms are constructed from or tuned to downstream detector outputs; the metric is claimed to be computable without training any detector. This makes the derivation self-contained against external benchmarks, with the correlation serving as validation rather than a reduction by construction.