Visual Prototype Conditioned Focal Region Generation for UAV-Based Object Detection
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-13 20:53 UTC · model grok-4.3
The pith
UAVGen generates higher-fidelity synthetic images for UAV object detection by conditioning diffusion models on visual class prototypes and emphasizing focal regions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
UAVGen designs a Visual Prototype Conditioned Diffusion Model (VPC-DM) that constructs representative instances for each class and integrates them into latent embeddings for high-fidelity object generation. It pairs this with a Focal Region Enhanced Data Pipeline (FRE-DP) that emphasizes object-concentrated foreground regions in synthesis, combined with a label refinement step to correct missing, extra and misaligned generations.
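The paper describes label refinement only at this high level. One plausible mechanism — matching boxes detected in the synthesized image against the conditioning layout by IoU, dropping layout labels with no matching rendering ("missing"), snapping matched labels to the detection ("misaligned"), and surfacing unmatched detections ("extra") — can be sketched as follows; the function names, threshold, and greedy matching rule are assumptions, not the paper's actual algorithm:

```python
import numpy as np

def iou(a, b):
    """IoU between two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda t: (t[2] - t[0]) * (t[3] - t[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def refine_labels(layout_boxes, detected_boxes, thr=0.5):
    """Hypothetical refinement: keep a layout label only if a detection
    overlaps it (drops 'missing' objects), snap the kept label to that
    detection (fixes 'misaligned'), and return unmatched detections as
    'extra' objects that still need labels."""
    kept, extra = [], list(range(len(detected_boxes)))
    for lb in layout_boxes:
        ious = [iou(lb, db) for db in detected_boxes]
        j = int(np.argmax(ious)) if ious else -1
        if j >= 0 and ious[j] >= thr:
            kept.append(detected_boxes[j])  # snap label to detection
            if j in extra:
                extra.remove(j)
    return kept, [detected_boxes[j] for j in extra]

layout = [[0, 0, 10, 10], [50, 50, 60, 60]]  # second object never rendered
dets = [[1, 1, 10, 10], [80, 80, 90, 90]]    # one match, one extra object
kept, extra = refine_labels(layout, dets)
print(len(kept), len(extra))  # 1 1
```

The greedy highest-IoU match is the simplest choice; a real pipeline might use Hungarian matching and class-aware scores instead.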
What carries the argument
Visual Prototype Conditioned Diffusion Model (VPC-DM) that embeds class-representative object instances into latent space for generation, together with Focal Region Enhanced Data Pipeline (FRE-DP) for foreground focus and label correction.
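The review does not spell out how prototypes are built or injected. A minimal reading — average instance feature vectors per class into one "visual prototype", then pair each layout box with its class prototype to form a conditioning token for the diffusion model — might look like this (the feature dimension, token format, and function names are assumptions for illustration only):

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Average per-instance feature vectors within each class to get one
    representative 'visual prototype' embedding per class."""
    protos = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(axis=0)
    return protos

def conditioning_tokens(layout_boxes, box_classes, protos):
    """Pair each layout box (normalized x, y, w, h) with its class
    prototype to form one conditioning token per object."""
    return np.concatenate([layout_boxes, protos[box_classes]], axis=1)

# Toy example: 5 instance crops from 2 classes, 16-dim features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 16))
labels = np.array([0, 0, 1, 1, 1])
protos = class_prototypes(feats, labels, num_classes=2)

boxes = np.array([[0.1, 0.2, 0.05, 0.08], [0.6, 0.5, 0.10, 0.10]])
tokens = conditioning_tokens(boxes, np.array([0, 1]), protos)
print(tokens.shape)  # (2, 20): 4 layout dims + 16 prototype dims
```

In an actual VPC-DM the tokens would presumably enter the denoiser through cross-attention; the averaging-and-concatenation above is only the simplest version of "integrating prototypes into latent embeddings".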
Load-bearing premise
The synthetic images produced by prototype conditioning and focal-region refinement have a distribution close enough to real UAV photos that they do not introduce biases or artifacts harmful to downstream detector training.
What would settle it
Measure whether detectors trained on real UAV data plus UAVGen images achieve the reported accuracy gains over real data alone when evaluated on a large held-out set of genuine UAV images.
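The settling experiment reduces to a paired comparison on held-out real images: one detector trained on real data only, one on real plus UAVGen images, both scored on the same evaluation set. A minimal sketch of such a check, assuming per-image AP scores are available for both detectors:

```python
import numpy as np

def paired_bootstrap_gain(ap_real, ap_augmented, n_boot=10_000, seed=0):
    """Paired bootstrap over held-out images: estimate the mean AP gain
    of the real+synthetic detector over the real-only baseline, and how
    often that gain is positive across resamples. Both inputs are
    per-image AP scores on the SAME evaluation images."""
    ap_real = np.asarray(ap_real, dtype=float)
    ap_aug = np.asarray(ap_augmented, dtype=float)
    assert ap_real.shape == ap_aug.shape
    rng = np.random.default_rng(seed)
    n = len(ap_real)
    idx = rng.integers(0, n, size=(n_boot, n))  # resample image indices
    gains = ap_aug[idx].mean(axis=1) - ap_real[idx].mean(axis=1)
    return gains.mean(), (gains > 0).mean()  # mean gain, P(gain > 0)

# Toy scores: the augmented detector is exactly 1 AP point better per image.
rng = np.random.default_rng(1)
base = rng.uniform(0.2, 0.6, size=200)
mean_gain, p_win = paired_bootstrap_gain(base, base + 0.01)
print(round(mean_gain, 3), p_win)
```

The paired design matters: resampling images jointly for both detectors removes image-difficulty variance that would otherwise swamp a ~1-point mAP gain.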
Original abstract
Unmanned aerial vehicle (UAV) based object detection is a critical but challenging task, when applied in dynamically changing scenarios with limited annotated training data. Layout-to-image generation approaches have proved effective in promoting detection accuracy by synthesizing labeled images based on diffusion models. However, they suffer from frequently producing artifacts, especially near layout boundaries of tiny objects, thus substantially limiting their performance. To address these issues, we propose UAVGen, a novel layout-to-image generation framework tailored for UAV-based object detection. Specifically, UAVGen designs a Visual Prototype Conditioned Diffusion Model (VPC-DM) that constructs representative instances for each class and integrates them into latent embeddings for high-fidelity object generation. Moreover, a Focal Region Enhanced Data Pipeline (FRE-DP) is introduced to emphasize object-concentrated foreground regions in synthesis, combined with a label refinement to correct missing, extra and misaligned generations. Extensive experimental results demonstrate that our method significantly outperforms state-of-the-art approaches, and consistently promotes accuracy when integrated with distinct detectors. The source code is available at https://github.com/Sirius-Li/UAVGen.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes UAVGen, a layout-to-image generation framework for UAV-based object detection under limited annotated data. It introduces the Visual Prototype Conditioned Diffusion Model (VPC-DM) that constructs class-representative instances and integrates them into latent embeddings for high-fidelity synthesis, along with the Focal Region Enhanced Data Pipeline (FRE-DP) that emphasizes foreground regions and applies label refinement to correct missing, extra, or misaligned objects. The central claim is that extensive experiments show significant outperformance over state-of-the-art methods and consistent accuracy gains when the generated data is used to train distinct detectors.
Significance. If the reported gains hold under rigorous validation, the work could meaningfully advance synthetic data augmentation for UAV object detection by reducing artifacts near tiny objects and improving distribution match to real aerial imagery. The open availability of code is a clear strength that supports reproducibility and enables direct testing of the pipeline's effect on downstream detectors.
minor comments (2)
- Abstract: The claim of 'significantly outperforms state-of-the-art approaches' would be more informative if accompanied by at least one concrete metric (e.g., mAP improvement on a named UAV dataset) rather than remaining purely qualitative.
- Method description: The integration of visual prototypes into latent embeddings (VPC-DM) and the precise mechanism of focal-region emphasis plus label refinement (FRE-DP) would benefit from an explicit statement of how these steps are combined in the overall training objective or inference schedule.
Simulated Author's Rebuttal
We thank the referee for the positive summary, significance assessment, and recommendation for minor revision. We are pleased that the contributions of UAVGen, including VPC-DM and FRE-DP, are viewed as potentially advancing synthetic data augmentation for UAV object detection. No specific major comments were raised in the report, so we interpret the minor revision request as an opportunity to polish presentation and add any clarifying details where helpful.
Circularity Check
No significant circularity detected
full rationale
The paper proposes a new layout-to-image generation framework (UAVGen) consisting of VPC-DM for visual prototype conditioning in diffusion models and FRE-DP for focal region emphasis with label refinement. The central performance claims rest on empirical experiments showing outperformance over SOTA and gains when integrated with detectors. No derivation chain, equations, or fitted parameters are presented that reduce the outputs to inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked in the provided text to justify core components. The method is self-contained as a technical proposal with falsifiable code, yielding no circular steps.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: diffusion models can produce high-fidelity object instances when conditioned on representative visual prototypes.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
  Relation between the paper passage and the cited Recognition theorem is unclear. Passage: "UAVGen designs a Visual Prototype Conditioned Diffusion Model (VPC-DM) that constructs representative instances for each class and integrates them into latent embeddings for high-fidelity object generation. Moreover, a Focal Region Enhanced Data Pipeline (FRE-DP) is introduced to emphasize object-concentrated foreground regions in synthesis, combined with a label refinement to correct missing, extra and misaligned generations."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear
  Relation between the paper passage and the cited Recognition theorem is unclear. Passage: "Extensive experimental results demonstrate that our method significantly outperforms state-of-the-art approaches, and consistently promotes accuracy when integrated with distinct detectors."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Omer Bar-Tal, Lior Yariv, Yaron Lipman, and Tali Dekel. MultiDiffusion: Fusing diffusion paths for controlled image generation. In Proceedings of the International Conference on Machine Learning, pages 1737–1752, 2023.
- [2] Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale GAN training for high fidelity natural image synthesis. In Proceedings of the International Conference on Learning Representations, pages 9256–9291, 2018.
- [3] Kai Chen, Enze Xie, Zhe Chen, Yibo Wang, Lanqing Hong, Zhenguo Li, and Dit-Yan Yeung. GeoDiffusion: Text-prompted geometric control for object detection data generation. In Proceedings of the International Conference on Learning Representations, pages 846–868, 2024.
- [4] Jiaxin Cheng, Xiao Liang, Xingjian Shi, Tong He, Tianjun Xiao, and Mu Li. LayoutDiffuse: Adapting foundational diffusion models for layout-to-image generation. arXiv preprint arXiv:2302.08908, 2023.
- [5] Prafulla Dhariwal and Alexander Nichol. Diffusion models beat GANs on image synthesis. In Advances in Neural Information Processing Systems, 2021.
- [6] Tamara Regina Dieter, Andreas Weinmann, Stefan Jäger, and Eva Brucherseifer. Quantifying the simulation–reality gap for deep learning-based drone detection. Electronics, 12(10):2197, 2023.
- [7] Bowei Du, Yecheng Huang, Jiaxin Chen, and Di Huang. Adaptive sparse convolutional networks with global context enhancement for faster object detection on drone images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13435–13444, 2023.
- [8] Dawei Du, Yuankai Qi, Hongyang Yu, Yifan Yang, Kaiwen Duan, Guorong Li, Weigang Zhang, Qingming Huang, and Qi Tian. The unmanned aerial vehicle benchmark: Object detection and tracking. In Proceedings of the European Conference on Computer Vision, pages 370–386, 2018.
- [9] Dawei Du, Pengfei Zhu, Longyin Wen, Xiao Bian, Haibin Lin, Qinghua Hu, Tao Peng, Jiayu Zheng, Xinyao Wang, Yue Zhang, et al. VisDrone-DET2019: The vision meets drone object detection in image challenge results. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, pages 213–226, 2019.
- [10] Nikita Dvornik, Julien Mairal, and Cordelia Schmid. Modeling visual context is key to augmenting object detection datasets. In Proceedings of the European Conference on Computer Vision, pages 364–380, 2018.
- [11] Debidatta Dwibedi, Ishan Misra, and Martial Hebert. Cut, paste and learn: Surprisingly easy synthesis for instance detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1301–1310, 2017.
- [12] Milan Erdelj, Enrico Natalizio, Kaushik R Chowdhury, and Ian F Akyildiz. Help from the sky: Leveraging UAVs for disaster management. IEEE Pervasive Computing, 16(1):24–32, 2017.
- [13] Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The PASCAL Visual Object Classes (VOC) challenge. International Journal of Computer Vision, 88(2):303–338, 2010.
- [14] Ruiyuan Gao, Kai Chen, Enze Xie, Lanqing Hong, Zhenguo Li, Dit-Yan Yeung, and Qiang Xu. MagicDrive: Street view generation with diverse 3D geometry control. In Proceedings of the International Conference on Learning Representations, pages 904–923, 2024.
- [15] Golnaz Ghiasi, Yin Cui, Aravind Srinivas, Rui Qian, Tsung-Yi Lin, Ekin D Cubuk, Quoc V Le, and Barret Zoph. Simple copy-paste is a strong data augmentation method for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2918–2928, 2021.
- [16] Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
- [17] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, pages 6629–6640, 2017.
- [18] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, pages 6840–6851, 2020.
- [19] Eija Honkavaara, Heikki Saari, Jere Kaivosoja, Ilkka Pölönen, Teemu Hakala, Paula Litkey, Jussi Mäkynen, and Liisa Pesonen. Processing and assessment of spectrometric, stereoscopic imagery collected using a lightweight UAV spectral camera for precision agriculture. Remote Sensing, 5(10):5006–5039, 2013.
- [20] Hailong Huang, Andrey V Savkin, and Chao Huang. Decentralized autonomous navigation of a UAV network for road traffic monitoring. IEEE Transactions on Aerospace and Electronic Systems, 57(4):2558–2564, 2021.
- [21] Yecheng Huang, Jiaxin Chen, and Di Huang. UFPMP-Det: Toward accurate and efficient object detection on drone imagery. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 1026–1033, 2022.
- [22] Manuel Jahn, Robin Rombach, and Björn Ommer. High-resolution complex scene synthesis with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, pages 7054–7065, 2021.
- [23] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4401–4410, 2019.
- [24] Rahima Khanam and Muhammad Hussain. YOLOv11: An overview of the key architectural enhancements. arXiv preprint arXiv:2410.17725, 2024.
- [25] Sahil Khose, Anisha Pal, Aayushi Agarwal, Deepanshi, Judy Hoffman, and Prithvijit Chattopadhyay. SkyScenes: A synthetic dataset for aerial scene understanding. In Proceedings of the European Conference on Computer Vision, pages 19–35, 2024.
- [26] Diederik Kingma, Tim Salimans, Ben Poole, and Jonathan Ho. Variational diffusion models. In Advances in Neural Information Processing Systems, pages 21696–21707, 2021.
- [27] Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
- [28] Chen Li, Rui Zhao, Zeyu Wang, Huiying Xu, and Xinzhong Zhu. RemDet: Rethinking efficient model design for UAV object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 4643–4651, 2025.
- [29] Pengxiang Li, Kai Chen, Zhili Liu, Ruiyuan Gao, Lanqing Hong, Dit-Yan Yeung, Huchuan Lu, and Xu Jia. TrackDiffusion: Tracklet-conditioned video generation via diffusion models. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 3539–3548.
- [30] Xiang Li, Wenhai Wang, Lijun Wu, Shuo Chen, Xiaolin Hu, Jun Li, Jinhui Tang, and Jian Yang. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. In Advances in Neural Information Processing Systems, pages 21002–21012, 2020.
- [31] Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jianwei Yang, Jianfeng Gao, Chunyuan Li, and Yong Jae Lee. GLIGEN: Open-set grounded text-to-image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22511–22521, 2023.
- [32] Zejian Li, Jingyu Wu, Immanuel Koh, Yongchuan Tang, and Lingyun Sun. Image synthesis from layout with locality-aware mask adaption. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 13819–13828, 2021.
- [33] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision, pages 740–755, 2014.
- [34] Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In Proceedings of the International Conference on Machine Learning, pages 8162–8171, 2021.
- [35] Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- [36] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 779–788, 2016.
- [37] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6):1137–1149, 2016.
- [38] Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approximate inference in deep generative models. In Proceedings of the International Conference on Machine Learning, pages 1278–1286, 2014.
- [39] Giulia Rizzoli, Francesco Barbato, Matteo Caligiuri, and Pietro Zanuttigh. SynDrone: Multi-modal UAV dataset for urban scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2210–2220, 2023.
- [40] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
- [41] Francisco Rovira-Más, Qin Zhang, and John F Reid. Stereo vision three-dimensional terrain maps for precision agriculture. Computers and Electronics in Agriculture, 60(2):133–143, 2008.
- [42] Yi-Ting Shen, Hyungtae Lee, Heesung Kwon, and Shuvra S Bhattacharyya. Progressive transformation learning for leveraging virtual images in training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 835–844, 2023.
- [43] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the International Conference on Machine Learning, pages 2256–2265, 2015.
- [44] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In Proceedings of the International Conference on Learning Representations, pages 14205–14224, 2021.
- [45] Yang Song and Stefano Ermon. Improved techniques for training score-based generative models. In Advances in Neural Information Processing Systems, pages 12438–12448, 2020.
- [46] Datao Tang, Xiangyong Cao, Xuan Wu, Jialin Li, Jing Yao, Xueru Bai, Dongsheng Jiang, Yin Li, and Deyu Meng. AeroGen: Enhancing remote sensing object detection with diffusion-driven data generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3614–3624, 2025.
- [47] Yunjie Tian, Qixiang Ye, and David Doermann. YOLOv12: Attention-centric real-time object detectors. arXiv preprint arXiv:2502.12524, 2025.
- [48] Aysim Toker, Marvin Eisenberger, Daniel Cremers, and Laura Leal-Taixé. SatSynth: Augmenting image-mask pairs through diffusion models for aerial semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 27695–27705, 2024.
- [49] Arash Vahdat and Jan Kautz. NVAE: A deep hierarchical variational autoencoder. In Advances in Neural Information Processing Systems, pages 19667–19679, 2020.
- [50] Xudong Wang, Trevor Darrell, Sai Saketh Rambhatla, Rohit Girdhar, and Ishan Misra. InstanceDiffusion: Instance-level control for image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6232–6242, 2024.
- [51] Yibo Wang, Ruiyuan Gao, Kai Chen, Kaiqiang Zhou, Yingjie Cai, Lanqing Hong, Zhenguo Li, Lihui Jiang, Dit-Yan Yeung, Qiang Xu, et al. DetDiffusion: Synergizing generative and perceptive models for enhanced data generation and perception. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7246–7255, 2024.
- [52] Ke Wu, Jiaxin Chen, and Miao Wang. Domain adaptive object detection for UAV-based images by robust representation learning and multiple pseudo-label aggregation. In Proceedings of the ACM MM Workshops on Efficient Multimedia Computing under Limited, pages 59–67, 2024.
- [53] Weijia Wu, Yuzhong Zhao, Hao Chen, Yuchao Gu, Rui Zhao, Yefei He, Hong Zhou, Mike Zheng Shou, and Chunhua Shen. DatasetDM: Synthesizing data with perception annotations using diffusion models. In Advances in Neural Information Processing Systems, pages 54683–54695, 2023.
- [54] Jinheng Xie, Yuexiang Li, Yawen Huang, Haozhe Liu, Wentian Zhang, Yefeng Zheng, and Mike Zheng Shou. BoxDiff: Text-to-image synthesis with training-free box-constrained diffusion. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7452–7461, 2023.
- [55] Zhengyuan Yang, Jianfeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng, et al. ReCo: Region-controlled text-to-image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14246–14255, 2023.
- [56] Jinsub Yim, Hyungtae Lee, Sungmin Eum, Yi-Ting Shen, Yan Zhang, Heesung Kwon, and Shuvra S Bhattacharyya. SynPlay: Importing real-world diversity for a synthetic human dataset. arXiv e-prints, 2024.
- [57] Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3836–3847, 2023.
- [58] Yuxuan Zhang, Huan Ling, Jun Gao, Kangxue Yin, Jean-Francois Lafleche, Adela Barriuso, Antonio Torralba, and Sanja Fidler. DatasetGAN: Efficient labeled data factory with minimal human effort. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10145–10155, 2021.
- [59] Hanqing Zhao, Dianmo Sheng, Jianmin Bao, Dongdong Chen, Dong Chen, Fang Wen, Lu Yuan, Ce Liu, Wenbo Zhou, Qi Chu, et al. X-Paste: Revisiting scalable copy-paste for instance segmentation using CLIP and StableDiffusion. In Proceedings of the International Conference on Machine Learning, pages 42098–42109, 2023.
- [60] Guangcong Zheng, Xianpan Zhou, Xuewei Li, Zhongang Qi, Ying Shan, and Xi Li. LayoutDiffusion: Controllable diffusion model for layout-to-image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22490–22499, 2023.
- [61] Dewei Zhou, You Li, Fan Ma, Xiaoting Zhang, and Yi Yang. MIGC: Multi-instance generation controller for text-to-image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6818–6828, 2024.
- [62] Jingyuan Zhu, Shiyu Li, Yuxuan Andy Liu, Jian Yuan, Ping Huang, Jiulong Shan, and Huimin Ma. ODGEN: Domain-specific object detection data generation with diffusion models. In Advances in Neural Information Processing Systems, pages 63599–63633, 2024.
supplementary note (experimental setup)
The extracted supplementary fragment names UAVDT [8] among the datasets. All experiments were conducted on 8 NVIDIA RTX 3080Ti GPUs. For the UAV-based object detection model, following the default experimental settings of RemDet [28], the detector was trained on the VisDrone dataset for 300 epochs with a learning rate of 0.01, using data augmentation techniques such as mixup and Mosaic.
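The reported settings can be gathered into a single configuration sketch; the key names below are illustrative, not the authors' actual configuration schema:

```python
# Training settings reported in the supplementary material, collected
# into one dict. Key names are illustrative only.
train_config = {
    "hardware": "8x NVIDIA RTX 3080Ti",
    "dataset": "VisDrone",
    "epochs": 300,
    "learning_rate": 0.01,
    "augmentations": ["mixup", "mosaic"],
    "detector_baseline": "RemDet",  # default settings per reference [28]
}
print(train_config["epochs"], train_config["learning_rate"])  # 300 0.01
```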
discussion (0)