arxiv: 2604.19999 · v1 · submitted 2026-04-21 · 💻 cs.CV

Recognition: unknown

Optimizing Data Augmentation for Real-Time Small UAV Detection: A Lightweight Context-Aware Approach

Amir Zamani (Comprehensive University of the Islamic Revolution) , Zeinab Abedini (Sharif University of Technology)

Authors on Pith no claims yet

Pith reviewed 2026-05-10 02:24 UTC · model grok-4.3

classification 💻 cs.CV

keywords data augmentationUAV detectionsmall object detectionreal-time detectioncontext-aware augmentationfoggy conditionsYOLO modelsedge computing

0 comments

The pith

A lightweight context-aware pipeline of Mosaic and HSV adaptations improves mean Average Precision for small UAV detection on edge devices while avoiding artifacts and maintaining stability in fog.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes combining Mosaic strategies with HSV color-space adaptation as a data augmentation pipeline to help lightweight models such as YOLOv11 Nano detect small UAVs more effectively in surveillance settings. This method is tested against heavier alternatives like Copy-Paste on four standard datasets and shown to raise mAP without creating synthetic artifacts or causing overfitting. The same pipeline also delivers a better precision-stability trade-off under foggy conditions than methods such as MixUp, making it practical for real-time deployment on devices with limited capacity.

Core claim

The context-aware data augmentation pipeline that integrates Mosaic strategies and HSV color-space adaptation enhances the performance of lightweight models for small UAV detection, delivering higher mAP across four datasets while preventing synthetic artifacts and overfitting; it further provides the best balance of precision and stability under foggy conditions compared with instance-level methods like Copy-Paste or MixUp.

What carries the argument

The context-aware data augmentation pipeline that combines Mosaic strategies and HSV color-space adaptation

If this is right

Higher mean Average Precision is achieved on four standard UAV detection datasets without introducing synthetic artifacts.
Overfitting is reduced relative to instance-level methods such as Copy-Paste.
Precision and stability remain balanced under foggy conditions where alternative augmentations lose effectiveness.
The pipeline fits real-time constraints on edge hardware with limited learning capacity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same lightweight augmentation pattern could be tested on other small-object classes such as birds or insects to check transferability.
Extending the pipeline to additional environmental degradations like rain or low light would clarify its robustness boundaries.
Combining the approach with other lightweight detectors beyond YOLOv11 Nano might produce comparable gains in surveillance tasks.

Load-bearing premise

The measured mAP gains and foggy-condition stability are caused by the specific Mosaic-plus-HSV pipeline rather than by dataset properties, model hyperparameters, or other unstated training choices.

What would settle it

Running the same experiments on a new fifth dataset or under a different weather condition and finding that mAP or stability no longer exceeds the Copy-Paste and MixUp baselines would falsify the claim.

read the original abstract

Visual detection of Unmanned Aerial Vehicles (UAVs) is a critical task in surveillance systems due to their small physical size and environmental challenges. Although deep learning models have achieved significant progress, deploying them on edge devices necessitates the use of lightweight models, such as YOLOv11 Nano, which possess limited learning capacity. In this research, an efficient and context-aware data augmentation pipeline, combining Mosaic strategies and HSV color-space adaptation, is proposed to enhance the performance of these models. Experimental results on four standard datasets demonstrate that the proposed approach, compared to heavy and instance-level methods like Copy-Paste, not only prevents the generation of synthetic artifacts and overfitting but also significantly improves mean Average Precision (mAP) across all scenarios. Furthermore, the evaluation of generalization capability under foggy conditions revealed that the proposed method offers the optimal balance between Precision and stability for real-time systems, whereas alternative methods, such as MixUp, are effective only in specific applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Applies Mosaic plus HSV augmentations to small UAV detection on YOLOv11 Nano and claims mAP lifts plus fog robustness over Copy-Paste and MixUp, but the causal link stays unproven without ablations or matched baselines.

read the letter

The paper combines Mosaic augmentation with HSV color adjustments in a context-aware pipeline aimed at lightweight YOLOv11 Nano models for small UAV detection. It tests the method on four datasets, reports mAP gains over Copy-Paste and MixUp, and adds foggy-condition trials that show a better precision-stability trade-off than MixUp for real-time use. The authors note that heavier instance-level methods can create artifacts and overfitting, which their lighter approach avoids on capacity-limited models. That focus on edge constraints and outdoor generalization is the practical part worth noting. The work stays within established augmentation techniques rather than introducing new mechanisms or theory. The experiments cover multiple datasets and a relevant degradation case, which gives the results some engineering relevance for surveillance tasks. The soft spots are clear. The abstract supplies no mAP numbers, no error bars, no dataset sizes, and no ablation tables that isolate Mosaic alone, HSV alone, or the full combination against identical training setups. Without those controls it is hard to attribute the gains specifically to the proposed pipeline instead of optimizer choices, epoch counts, or baseline re-implementations. The stress-test concern about non-matched protocols holds up on the available description. If the full paper contains those breakdowns and quantitative details the claims become testable; otherwise the central attribution remains weak. This is the sort of incremental application paper that might interest engineers building drone-monitoring systems who need simple, low-compute tweaks rather than new architectures. A reader already working with YOLO on edge hardware could pick up the foggy-condition evaluation as a useful check. I would send it to peer review because the topic has direct application value and the experiments, once properly documented, could be checked and used even if the novelty is limited.

Referee Report

2 major / 2 minor

Summary. The paper proposes a context-aware data augmentation pipeline that combines Mosaic strategies with HSV color-space adaptation to improve small UAV detection performance using the lightweight YOLOv11 Nano model. It claims this approach outperforms instance-level methods such as Copy-Paste and MixUp on four standard datasets by increasing mAP, avoiding synthetic artifacts and overfitting, and providing superior precision-stability balance under foggy conditions for real-time edge deployment.

Significance. If the empirical gains can be rigorously attributed to the proposed pipeline rather than implementation differences, the work would offer a practical, lightweight augmentation strategy for resource-constrained UAV surveillance systems facing data scarcity and environmental degradation.

major comments (2)

[Experimental results / evaluation on four datasets] The central claim that the context-aware Mosaic+HSV pipeline specifically drives mAP gains and foggy-condition stability (as opposed to dataset tuning or unstated training differences) requires component ablations and matched-protocol baselines. The experimental section reports improvements over Copy-Paste and MixUp but does not isolate Mosaic alone versus Mosaic+HSV, nor provide error bars, exact mAP values, or confirmation that all methods used identical epoch counts, optimizers, and data splits.
[Abstract and experimental claims] The assertion that the method 'prevents the generation of synthetic artifacts and overfitting' is presented as a qualitative advantage but lacks quantitative support such as training/validation loss curves, overfitting metrics, or visual artifact counts compared to the baselines.

minor comments (2)

[Abstract] The abstract states 'significantly improves mean Average Precision (mAP)' without any numerical values; moving concrete mAP deltas and dataset identifiers into the abstract would improve readability.
[Experimental setup] Dataset details (image counts, UAV sizes, annotation protocols) and exact YOLOv11 Nano training hyperparameters are referenced but not tabulated; a summary table would aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which have helped us strengthen the experimental rigor of the manuscript. We have revised the paper to include the requested component ablations, matched-protocol details, error bars, exact values, and quantitative metrics supporting the claims on artifacts and overfitting.

read point-by-point responses

Referee: [Experimental results / evaluation on four datasets] The central claim that the context-aware Mosaic+HSV pipeline specifically drives mAP gains and foggy-condition stability (as opposed to dataset tuning or unstated training differences) requires component ablations and matched-protocol baselines. The experimental section reports improvements over Copy-Paste and MixUp but does not isolate Mosaic alone versus Mosaic+HSV, nor provide error bars, exact mAP values, or confirmation that all methods used identical epoch counts, optimizers, and data splits.

Authors: We agree that isolating the individual contributions of the Mosaic and HSV components is necessary to attribute the gains specifically to the combined pipeline. All experiments in the original work used identical training protocols across the proposed method and baselines: 300 epochs, the Adam optimizer with the same learning rate schedule and hyperparameters, and identical train/validation/test splits on each of the four datasets. To address the concern directly, the revised manuscript now includes a new ablation subsection with results for Mosaic alone, HSV adaptation alone, and the full Mosaic+HSV combination. We report mean mAP values with standard deviations computed over three independent runs (providing error bars) and tabulate the exact mAP figures for all comparisons. These additions confirm the performance improvements are driven by the context-aware pipeline rather than protocol variations. revision: yes
Referee: [Abstract and experimental claims] The assertion that the method 'prevents the generation of synthetic artifacts and overfitting' is presented as a qualitative advantage but lacks quantitative support such as training/validation loss curves, overfitting metrics, or visual artifact counts compared to the baselines.

Authors: The original statement was based on observed training dynamics and visual examination of augmented samples, where instance-level methods introduced visible inconsistencies around small UAVs. We acknowledge that this lacked quantitative backing. In the revised manuscript we have added training and validation loss curves for the proposed approach versus Copy-Paste and MixUp, which show reduced divergence between training and validation loss and thus lower overfitting. We further introduce a quantitative artifact metric defined as the fraction of augmented images containing bounding-box annotation inconsistencies (measured via overlap and size variance checks), and report lower values for the context-aware pipeline. These elements provide the requested quantitative support while preserving the original qualitative observations. revision: yes

Circularity Check

0 steps flagged

No derivation chain present; empirical comparisons are externally falsifiable

full rationale

The manuscript is an applied computer-vision paper whose central claims rest on experimental mAP measurements across four datasets and foggy-condition tests. No equations, first-principles derivations, fitted parameters renamed as predictions, or uniqueness theorems appear in the provided text or abstract. The augmentation pipeline (Mosaic + HSV) is described procedurally and evaluated by direct comparison to Copy-Paste and MixUp baselines; these results are statistically independent of the method itself and can be reproduced or refuted by external runs. No self-citations, ansatzes smuggled via prior work, or self-definitional loops are load-bearing. The paper therefore contains no circular step of any enumerated kind.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard computer-vision assumptions that data augmentation improves generalization for small objects and that mAP is a reliable proxy for real-world detection performance; no new entities or ad-hoc axioms are introduced beyond typical ML practice.

axioms (1)

domain assumption Data augmentation via Mosaic and HSV shifts improves model robustness without introducing artifacts that degrade performance
Invoked in the abstract when claiming superiority over Copy-Paste and MixUp

pith-pipeline@v0.9.0 · 5479 in / 1334 out tokens · 96576 ms · 2026-05-10T02:24:50.966779+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

19 extracted references · 1 canonical work pages · 1 internal anchor

[1]

Unmanned aerial vehicles (UAVs): practical aspects, applications, open challenges, security issues, and future trends,

S. A. H. Mohsan, N. Q. H. Othman, Y. Li, M. H. Alsharif, and M. A. Khan, "Unmanned aerial vehicles (UAVs): practical aspects, applications, open challenges, security issues, and future trends," Intelligent Service Robotics, vol. 16, no. 1, pp. 109-137, 2023

2023
[2]

Unmanned aerial vehicle for internet of everything: Opportunities and challenges,

Y. Liu, H.-N. Dai, Q. Wang, M. K. Shukla, and M. Imran, "Unmanned aerial vehicle for internet of everything: Opportunities and challenges," Computer Communications, vol. 155, pp. 66–83, 2020

2020
[3]

A comprehensive review of unmanned aerial vehicle attacks and neutralization techniques,

V. Chamola, P. Kotesh, A. Agarwal, N. Gupta, and M. Guizani, "A comprehensive review of unmanned aerial vehicle attacks and neutralization techniques," Ad Hoc Networks , vol. 111, p. 102324, 2021

2021
[4]

Air -to-Air Visual Detection of Micro -UAVs: An Experimental Evaluation of Deep Learning,

Y. Zheng et al., "Air -to-Air Visual Detection of Micro -UAVs: An Experimental Evaluation of Deep Learning," IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 1020-1027, 2021

2021
[5]

The Drone -vs-Bird Detection Grand Challenge at ICASSP 2023: A Review of Methods and Results,

A. Coluccia et al., "The Drone -vs-Bird Detection Grand Challenge at ICASSP 2023: A Review of Methods and Results," IEEE Open Journal of Signal Processing, vol. 5, pp. 766-779, 2024

2023
[6]

You only look once: Unified, real-time object detection,

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , 2016, pp. 779-788

2016
[7]

Faster R-CNN: Towards real- time object detection with region proposal networks,

S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real- time object detection with region proposal networks," in Advances in Neural Information Processing Systems (NeurIPS), 2015, vol. 28

2015
[8]

Vision-based Anti-UAV Detection and Tracking,

J. Zhao et al., "Vision-based Anti-UAV Detection and Tracking," IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 12, pp. 25323-25334, 2022

2022
[9]

Edge Computing -Driven Real -Time Drone Detection Using YOLOv9 and NVIDIA Jetson Nano,

M. Al -Rakhami et al., "Edge Computing -Driven Real -Time Drone Detection Using YOLOv9 and NVIDIA Jetson Nano," Drones, vol. 8, no. 11, p. 680, 2024

2024
[10]

Small Object Detection for UAVs Using Deep Learning Models on Edge Computing: A Comparative Analysis,

M. S. S. Vasishta et al., "Small Object Detection for UAVs Using Deep Learning Models on Edge Computing: A Comparative Analysis," in 5th International Conference on Circuits, Control, Communication and Computing (I4C), 2024, pp. 106-112

2024
[11]

YOLOv11 Documentation,

Ultralytics, "YOLOv11 Documentation," 2024. [Online]. Available: https://docs.ultralytics.com/models/yolov11

2024
[12]

Improving Small Drone Detection Through Multi -Scale Processing and Data Augmentation,

R. Laroca, M. dos Santos, and D. Menotti, "Improving Small Drone Detection Through Multi -Scale Processing and Data Augmentation," in International Joint Conference on Neural Networks (IJCNN), 2025, pp. 1-8

2025
[13]

Foggy Drone Teacher: Domain Adaptive Drone Detection Under Foggy Conditions,

G. Zheng, B. Tan, J. Wu, X. Qin, Y. Li, and S. Ding, "Foggy Drone Teacher: Domain Adaptive Drone Detection Under Foggy Conditions," Drones, vol. 9, no. 2, p. 146, 2025

2025
[14]

YOLOv4: Optimal Speed and Accuracy of Object Detection

A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection," arXiv preprint arXiv:2004.10934, 2020

work page internal anchor Pith review arXiv 2004
[15]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778

2016
[16]

YOLOv5 Documentation,

Ultralytics, "YOLOv5 Documentation," 2020. [Online]. Available: https://docs.ultralytics.com/models/yolov5

2020
[17]

mixup: Beyond Empirical Risk Minimization,

H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez -Paz, "mixup: Beyond Empirical Risk Minimization," in International Conference on Learning Representations (ICLR), 2018

2018
[18]

YOLOv8 Documentation,

Ultralytics, "YOLOv8 Documentation," 2023. [Online]. Available: https://docs.ultralytics.com/models/yolov8

2023
[19]

Microsoft COCO: Common Objects in Context,

T.-Y. Lin et al., "Microsoft COCO: Common Objects in Context," in European Conference on Computer Vision (ECCV) , 2014, pp. 740 - 755

2014