LER-YOLO: Reliability-Aware Expert Routing for Misaligned RGB-Infrared UAV Detection
Pith reviewed 2026-05-21 05:46 UTC · model grok-4.3
The pith
A spatial reliability map from target alignment lets sparse MoE fusion suppress unreliable RGB-infrared matches for UAV detection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim has two parts. First, an Uncertainty-Aware Target Alignment module resamples visible features toward the infrared reference and produces a spatial reliability map. Second, a Reliability-Guided Sparse MoE Fusion module uses this map to route each location to k experts drawn from RGB-dominant, infrared-dominant, and interactive fusion experts. Together they are said to suppress unreliable fusion while preserving useful information, yielding 89.7 percent average AP50 on the MBU benchmark.
What carries the argument
The Uncertainty-Aware Target Alignment module that generates the spatial reliability map, combined with the Reliability-Guided Sparse MoE Fusion module that uses the map to select and weight experts.
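To make the mechanism concrete, here is a minimal sketch of reliability-guided top-k routing at a single spatial location. It is an illustration under stated assumptions, not the paper's formulation: the function `route`, the `bias` parameter, and the rule of penalising only the interactive expert in proportion to (1 - reliability) are hypothetical choices.

```python
import math

EXPERTS = ("rgb_dominant", "ir_dominant", "interactive")

def softmax(scores):
    """Numerically stable softmax over a short list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(gate_logits, reliability, k=2, bias=2.0):
    """Select k experts for one spatial location.

    gate_logits: raw gate scores, ordered as in EXPERTS.
    reliability: value in [0, 1] read from the reliability map here.
    bias: hypothetical strength of the reliability prior (assumption).
    """
    adjusted = list(gate_logits)
    # Assumed rule: the less reliable the local correspondence, the more
    # the cross-modal (interactive) expert is penalised.
    adjusted[2] -= bias * (1.0 - reliability)
    # Keep the k highest-scoring experts and renormalise their weights.
    order = sorted(range(len(adjusted)), key=lambda i: adjusted[i], reverse=True)
    chosen = order[:k]
    weights = softmax([adjusted[i] for i in chosen])
    return {EXPERTS[i]: w for i, w in zip(chosen, weights)}
```

With gate logits `[0.0, 0.5, 1.0]` and `k=1`, a reliability of 1.0 keeps the interactive expert, while a reliability of 0.0 hands the location to the infrared-dominant expert. That is the kind of suppression the claim describes.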
If this is right
- Detection reaches 89.7 percent AP50 with 0.2 percent standard deviation across three independent seeds and a best run of 89.9 percent.
- Gains arise from the reliability-guided routing mechanism rather than from added model capacity.
- Unreliable cross-modal interactions are suppressed while useful information from either modality is retained.
- Performance remains stable under synthetic spatial shifts that simulate varying degrees of misalignment.
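A synthetic spatial shift of the kind mentioned in the last point can be sketched as an integer translation of one modality with constant padding. This is only an illustration: the paper's actual shift protocol (magnitudes, sub-pixel handling, which modality is moved) is not given here, and `shift_image` is a hypothetical helper.

```python
def shift_image(img, dx, dy, fill=0):
    """Translate a 2D grid by (dx, dy) pixels, padding with `fill`.

    Positive dx moves content to the right, positive dy moves it down.
    Integer shifts only; this is an assumption made for simplicity.
    """
    height, width = len(img), len(img[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_y, src_x = y - dy, x - dx
            # Copy only pixels whose source lies inside the original image.
            if 0 <= src_y < height and 0 <= src_x < width:
                out[y][x] = img[src_y][src_x]
    return out
```

Applying the same function with increasing `dx` and `dy` to the RGB frame, while leaving the infrared frame and its labels untouched, gives a controlled sweep of misalignment levels.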
Where Pith is reading between the lines
- The same reliability-guided routing could be tested on other misaligned multi-modal tasks such as visible-thermal pedestrian detection or satellite-ground fusion.
- If the reliability map correlates with actual geometric error, the method might reduce the need for hardware-level sensor calibration in field deployments.
- Applying the routing layer to backbones other than YOLOv5s would test whether the benefit is tied to the particular detection architecture.
Load-bearing premise
The spatial reliability map produced by the Uncertainty-Aware Target Alignment module accurately reflects the trustworthiness of local cross-sensor correspondence and can be used to safely suppress unreliable fusion without discarding useful information.
What would settle it
Replace the learned spatial reliability map with a uniform or random map of equal average value and check whether the AP50 gain over a parameter-matched baseline disappears on the MBU benchmark.
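The control maps for this test could be built as follows. It is a sketch, assuming the reliability map is available as a 2D grid of floats; `uniform_control` and `shuffled_control` are hypothetical names. Shuffling is used here as one way to obtain a "random" map, since it keeps the mean and the value distribution exactly while destroying the spatial structure.

```python
import random

def map_mean(rel_map):
    """Average value of a 2D reliability map."""
    count = len(rel_map) * len(rel_map[0])
    return sum(sum(row) for row in rel_map) / count

def uniform_control(rel_map):
    """Constant map with the same average value as the learned map."""
    mu = map_mean(rel_map)
    return [[mu] * len(row) for row in rel_map]

def shuffled_control(rel_map, seed=0):
    """Spatially shuffled map: same values, random positions."""
    flat = [v for row in rel_map for v in row]
    random.Random(seed).shuffle(flat)
    width = len(rel_map[0])
    return [flat[i:i + width] for i in range(0, len(flat), width)]
```

If routing with either control map matches the AP50 obtained with the learned map, the spatial content of the learned map is not what drives the gain.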
Original abstract
Detecting small unmanned aerial vehicles from RGB-infrared remote-sensing pairs remains challenging due to tiny target scale, cluttered backgrounds, and spatial misalignment between heterogeneous sensors. Existing bimodal detectors often align or fuse features without assessing the reliability of local cross-sensor correspondence, allowing mismatch artifacts to propagate into the detection head. To address this issue, we propose LER-YOLO, a reliability-aware sparse mixture-of-experts framework for misaligned RGB-infrared UAV detection. LER-YOLO first introduces an Uncertainty-Aware Target Alignment module that resamples visible features toward the infrared reference and estimates a spatial reliability map. This reliability prior is then used by a Reliability-Guided Sparse MoE Fusion module to adaptively select k experts from RGB-dominant, infrared-dominant, and interactive fusion experts, enabling trustworthy cross-modal interaction while suppressing unreliable fusion. Experiments on the public MBU benchmark under a YOLOv5s-family protocol show that LER-YOLO achieves 89.7 ± 0.2% AP50 over three independent seeds, with a best result of 89.9%. Extensive ablations, parameter-matched comparisons, synthetic-shift evaluations, and complexity analysis demonstrate that the gains mainly come from reliability-guided expert routing rather than increased model capacity.
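The resampling step described in the abstract can be pictured as bilinear sampling of the visible feature map at offset positions. The sketch below works on a single-channel grid. The per-location offsets, the border clamping, and the names `bilinear` and `resample` are assumptions, since the abstract does not say how offsets are predicted or applied.

```python
import math

def bilinear(feat, y, x):
    """Bilinearly interpolate a 2D grid at a fractional (y, x) position."""
    height, width = len(feat), len(feat[0])
    # Clamp to the valid range (assumed border handling).
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    wy, wx = y - y0, x - x0
    top = feat[y0][x0] * (1 - wx) + feat[y0][x1] * wx
    bottom = feat[y1][x0] * (1 - wx) + feat[y1][x1] * wx
    return top * (1 - wy) + bottom * wy

def resample(feat, offsets):
    """Resample `feat` so that output (y, x) reads from (y + dy, x + dx).

    offsets[y][x] is a hypothetical (dy, dx) pair for each output location.
    """
    return [
        [bilinear(feat, y + offsets[y][x][0], x + offsets[y][x][1])
         for x in range(len(feat[0]))]
        for y in range(len(feat))
    ]
```

With all offsets set to zero the map is returned unchanged, which is the expected behaviour for an already aligned pair.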
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes LER-YOLO, a reliability-aware sparse mixture-of-experts framework for detecting small UAVs from spatially misaligned RGB-infrared remote-sensing pairs. It introduces an Uncertainty-Aware Target Alignment module that resamples RGB features to the IR reference while producing a spatial reliability map, which then guides a Reliability-Guided Sparse MoE Fusion module to select k experts (RGB-dominant, IR-dominant, and interactive) for trustworthy cross-modal interaction. On the public MBU benchmark under a YOLOv5s-family protocol, LER-YOLO reports 89.7 ± 0.2% AP50 (best run 89.9%) over three seeds; ablations, parameter-matched baselines, synthetic-shift tests, and complexity analysis are used to attribute gains primarily to the reliability-guided routing rather than added capacity.
Significance. If the reliability map is shown to be accurate, the approach provides a concrete mechanism for suppressing mismatch artifacts in bimodal UAV detection without discarding useful cross-modal information. The parameter-matched comparisons and synthetic-shift evaluations strengthen the case that the routing mechanism, rather than model size, drives the reported AP50 improvement. The use of multiple seeds and a public benchmark is a positive for reproducibility. The work could influence future multimodal remote-sensing detectors if the map's trustworthiness is directly validated.
major comments (1)
- [Abstract and §4 (Experiments)] The central claim that gains derive from reliability-guided expert routing (rather than capacity or alignment alone) depends on the spatial reliability map correctly identifying trustworthy local RGB-IR correspondence. No direct quantitative validation of the map is reported: there is no precision/recall against ground-truth alignment labels, no correlation analysis with synthetic-shift masks, and no ablation isolating map accuracy from the MoE architecture. This leaves open the possibility that end-to-end AP50 improvements arise from resampling or expert selection mechanics irrespective of map trustworthiness.
minor comments (2)
- [§3.2] The exact selection criterion for the k experts and the formulation of the reliability prior (e.g., how the map is thresholded or normalized before routing) should be stated with an equation or pseudocode for reproducibility.
- [Table 2 and Figure 4] Include standard deviations for all compared methods (not only LER-YOLO) and clarify whether the synthetic-shift tests use the same misalignment distribution as the MBU test set.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review. The major comment concerns the absence of direct quantitative validation for the spatial reliability map. We respond point-by-point below, clarifying the evidence already present in the manuscript while acknowledging where additional analysis can be provided.
- Referee: [Abstract and §4 (Experiments)] The central claim that gains derive from reliability-guided expert routing (rather than capacity or alignment alone) depends on the spatial reliability map correctly identifying trustworthy local RGB-IR correspondence. No direct quantitative validation of the map is reported: there is no precision/recall against ground-truth alignment labels, no correlation analysis with synthetic-shift masks, and no ablation isolating map accuracy from the MoE architecture. This leaves open the possibility that end-to-end AP50 improvements arise from resampling or expert selection mechanics irrespective of map trustworthiness.
  Authors: We agree that direct validation of the reliability map would strengthen the central claim. The MBU benchmark does not provide ground-truth local alignment labels, so precision/recall against such labels cannot be computed without new annotations. However, the synthetic-shift experiments introduce controlled, known misalignment patterns and show that performance gains appear specifically when the reliability map is used to guide expert routing; removing this guidance while retaining the MoE structure and alignment module leads to measurable drops. Parameter-matched baselines further isolate the routing mechanism from capacity increases. We will add a correlation analysis between the reliability maps and the synthetic-shift masks, plus an explicit ablation that disables only the reliability weighting inside the MoE, to the revised §4. This addresses the concern as far as the available data allow.
  Revision: partial
- Direct precision/recall evaluation of the reliability map against ground-truth alignment labels, because the MBU benchmark provides no such per-pixel or per-region alignment annotations.
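The correlation analysis promised in the rebuttal could take a form like the sketch below, which computes a Pearson correlation between a reliability map and a binary mask marking synthetically shifted regions. The mask convention (1 = misaligned), the choice of Pearson correlation, and the function names are assumptions; the rebuttal does not specify the statistic.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences of floats."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined when either input is constant
    return cov / math.sqrt(var_a * var_b)

def map_mask_correlation(rel_map, shift_mask):
    """Correlate reliability values with a misalignment mask (1 = shifted)."""
    flat_rel = [v for row in rel_map for v in row]
    flat_mask = [float(v) for row in shift_mask for v in row]
    return pearson(flat_rel, flat_mask)
```

A strongly negative value would indicate that the map assigns low reliability where the shift was injected. A value near zero would support the referee's concern that the map does not track misalignment.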
Circularity Check
No significant circularity; performance measured on external benchmark
The paper proposes an architectural change to YOLOv5s (Uncertainty-Aware Target Alignment plus Reliability-Guided Sparse MoE Fusion) and reports AP50 on the public MBU benchmark. The central claim that gains arise from reliability-guided routing rather than capacity is supported by parameter-matched ablations and synthetic-shift tests whose metrics are computed from standard detection evaluation protocols. No equation, module definition, or self-citation reduces the reported 89.7% AP50 or the routing decisions to a fitted parameter or prior result by construction; the reliability map is an internal estimate whose accuracy is not claimed to be proven by the final detection score itself.
Axiom & Free-Parameter Ledger
free parameters (1)
- k (experts selected per location)
axioms (1)
- Domain assumption: a spatial reliability map can be estimated from the alignment resampling process and meaningfully indicates cross-sensor trustworthiness.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: "Uncertainty-Aware Target Alignment module that resamples visible features toward the infrared reference and estimates a spatial reliability map... Reliability-Guided Sparse MoE Fusion module to adaptively select k experts"
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, theorem absolute_floor_iff_bare_distinguishability
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: "self-supervised U-TA reliability loss $L_{\mathrm{uta}} = \ldots\, R_{ij}\,\lVert F_{ir} - \tilde{F}_{rgb} \rVert_1 - \lambda \log(R_{ij} + \epsilon)$"
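The per-location term quoted in the passage above, R times an L1 residual minus lambda times log(R + eps), can be explored numerically. The sketch below covers only that visible term. The elided part of the formula (summation or normalisation) is not reconstructed, the value of `lam` is hypothetical, and reading the residual as the L1 distance between infrared and resampled RGB features is an assumption.

```python
import math

def uta_term(reliability, residual, lam=0.1, eps=1e-6):
    """One location's contribution: R * residual - lam * log(R + eps)."""
    return reliability * residual - lam * math.log(reliability + eps)

def best_reliability(residual, lam=0.1):
    """Minimiser of the term over R in (0, 1], ignoring eps.

    Setting the derivative residual - lam / R to zero gives R = lam / residual,
    clamped to 1 when the residual is small.
    """
    if residual <= 0:
        return 1.0
    return min(1.0, lam / residual)
```

Under these assumptions the log penalty stops the map from collapsing to zero, and a larger feature residual pushes the preferred reliability down. This is consistent with the map acting as a trust signal for local correspondence.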
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.