Recognition: unknown
WD-FQDet: Multispectral Detection Transformer via Wavelet Decomposition and Frequency-aware Query Learning
Pith reviewed 2026-05-14 19:11 UTC · model grok-4.3
The pith
Wavelet decomposition decouples modality-shared low-frequency and modality-specific high-frequency features from infrared and visible images to improve object detection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
WD-FQDet explicitly decouples modality-shared and modality-specific information from infrared and visible inputs by recasting them as low- and high-frequency domains via wavelet decomposition. A low-frequency homogeneity alignment module aligns shared features across modalities with cross-modal attention, a high-frequency specificity retention module preserves modality-specific features through a multi-scale gradient consistency loss, a hybrid feature enhancement module injects spatial cues, and a frequency-aware query selection module dynamically regulates the contributions of the two feature groups, yielding state-of-the-art performance on the FLIR, LLVIP, and M3FD datasets.
What carries the argument
Wavelet decomposition that splits inputs into low-frequency modality-shared and high-frequency modality-specific domains, paired with alignment, retention, and frequency-aware query selection modules.
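To make the frequency split concrete, the sketch below applies a single-level Haar decomposition to paired infrared and visible feature maps and routes the resulting subbands to separate branches. It is a minimal illustration under assumed tensor shapes and a Haar basis; the paper does not state its wavelet basis, decomposition depth, or where in the backbone the split occurs.

```python
import torch

def haar_dwt(x: torch.Tensor):
    """Single-level 2D Haar DWT of a (B, C, H, W) tensor with even H and W.

    Returns the low-frequency approximation (LL) and the three high-frequency
    detail subbands (LH, HL, HH), each at half the spatial resolution."""
    a = x[..., 0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0
    lh = (a + b - c - d) / 2.0  # top-bottom difference (horizontal edges)
    hl = (a - b + c - d) / 2.0  # left-right difference (vertical edges)
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, (lh, hl, hh)

# Hypothetical paired features from infrared and visible backbones.
feat_ir, feat_vis = torch.randn(2, 64, 80, 80), torch.randn(2, 64, 80, 80)
ll_ir, highs_ir = haar_dwt(feat_ir)
ll_vis, highs_vis = haar_dwt(feat_vis)
# ll_* would feed a shared-feature alignment branch and highs_* a
# modality-specific retention branch, mirroring the split the paper describes.
```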
If this is right
- Shared low-frequency features can be aligned across modalities to reduce bias without losing complementary details.
- Modality-specific high-frequency features are retained via gradient consistency, countering their loss in conventional fusion.
- Dynamic query selection adapts the weighting of homogeneous versus modality-specific features to different detection scenarios (a gating sketch follows this list).
- State-of-the-art results appear across multiple metrics on the FLIR, LLVIP, and M3FD benchmarks.
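The scenario-dependent weighting in the fourth point could take a form like the gate below: a small network scores how much low-frequency (shared) versus high-frequency (specific) evidence each candidate query carries and mixes the two accordingly. This is a hypothetical sketch, not the paper's frequency-aware query selection module; the class name, dimensions, and softmax gating are assumptions.

```python
import torch
import torch.nn as nn

class FrequencyAwareGate(nn.Module):
    """Hypothetical gate weighting low- versus high-frequency query features."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.ReLU(inplace=True),
            nn.Linear(dim, 2),
            nn.Softmax(dim=-1),
        )

    def forward(self, low_feat: torch.Tensor, high_feat: torch.Tensor):
        # low_feat, high_feat: (B, N, dim) candidate query embeddings from the
        # low-frequency (shared) and high-frequency (specific) branches.
        w = self.score(torch.cat([low_feat, high_feat], dim=-1))  # (B, N, 2)
        fused = w[..., :1] * low_feat + w[..., 1:] * high_feat
        return fused, w

gate = FrequencyAwareGate(dim=256)
fused, weights = gate(torch.randn(2, 300, 256), torch.randn(2, 300, 256))
print(fused.shape, weights.shape)  # (2, 300, 256) and (2, 300, 2)
```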
Where Pith is reading between the lines
- The frequency split could be tested on additional modality pairs such as RGB and depth to check whether low-frequency alignment generalizes beyond infrared-visible cases.
- If the separation works reliably, it may reduce reliance on separate backbone designs for each modality in future multispectral detectors.
- Efficiency measurements on embedded hardware would show whether the added wavelet and query modules support real-time use.
Load-bearing premise
Wavelet decomposition cleanly separates modality-shared low-frequency features from modality-specific high-frequency features without introducing artifacts or bias that the alignment and retention modules cannot correct.
What would settle it
Retraining and testing the model on the same datasets after removing the wavelet decomposition step; if accuracy holds near the full model's level instead of dropping toward standard fusion baselines, the frequency-decoupling premise does not hold.
Original abstract
Infrared-visible object detection improves detection performance by combining complementary features from multispectral images. Existing backbone-specific and backbone-shared approaches still suffer from the problems of severe bias of modality-shared features and the insufficiency of modality-specific features. To address these issues, we propose a novel detection framework WD-FQDet that explicitly decouples modality-shared and modality-specific information from infrared and visible modalities in the new view of low- and high-frequency domains, allowing fusion strategies tailored to their frequency characteristics. Specifically, a low-frequency homogeneity alignment module is proposed to align modality-shared features across modalities via a cross-modal attention mechanism, and a high-frequency specificity retention module is proposed to preserve modality-specific features through the multi-scale gradient consistency loss. To reinforce the feature representation in the frequency domain, we propose a hybrid feature enhancement module that incorporates spatial cues. Furthermore, considering that the contributions of homogeneous and modality-specific features to object detection vary across scenarios, we propose a frequency-aware query selection module to dynamically regulate their contributions. Experimental results on the FLIR, LLVIP, and M3FD datasets demonstrate that WD-FQDet achieves state-of-the-art performance across multiple evaluation metrics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents WD-FQDet, a multispectral detection transformer for infrared-visible object detection. It claims to explicitly decouple modality-shared and modality-specific information via wavelet decomposition into low- and high-frequency domains, enabling tailored fusion through a low-frequency homogeneity alignment module (cross-modal attention), a high-frequency specificity retention module (multi-scale gradient consistency loss), a hybrid feature enhancement module, and a frequency-aware query selection module. The work reports state-of-the-art performance across multiple metrics on the FLIR, LLVIP, and M3FD datasets.
Significance. If the frequency-domain decoupling proves effective and the modules mitigate bias and insufficiency without introducing artifacts, the framework could offer a principled advance over backbone-specific or shared fusion methods by exploiting complementary frequency characteristics in multispectral imagery. The dynamic query selection and gradient consistency loss represent potentially useful mechanisms for scenario-adaptive fusion.
major comments (2)
- [Method (wavelet decomposition and frequency modules)] The central claim in the abstract and method description rests on the assumption that wavelet decomposition (presumably DWT) maps modality-shared information predominantly to low-frequency subbands and modality-specific information to high-frequency subbands. No quantitative validation—such as cross-modal mutual information, cosine similarity, or correlation metrics computed per subband on FLIR/LLVIP/M3FD—is reported to confirm the separation is sufficiently clean; low-frequency components can encode modality-specific biases (thermal gradients vs. illumination) while high-frequency edges may be shared, undermining the subsequent alignment and retention modules.
- [Experiments] The experimental claims of SOTA performance lack supporting details: no full baseline tables, module-wise ablations, statistical significance tests, or error analysis (e.g., failure cases under varying illumination) are described, making it impossible to assess whether the reported gains are attributable to the frequency-aware components or to implementation specifics.
minor comments (1)
- [Method] Clarify the exact wavelet basis and decomposition levels used, and provide equations for the multi-scale gradient consistency loss and frequency-aware query selection to improve reproducibility.
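For orientation, one plausible reading of a multi-scale gradient consistency loss is sketched below: at several pooling scales, Sobel gradients of the fused features are pushed toward the element-wise maximum of the two source modalities' gradients, so the stronger edge from either modality survives fusion. The scales, the Sobel operator, and the max target are assumptions rather than the paper's formulation, which is precisely why the requested equations are needed.

```python
import torch
import torch.nn.functional as F

def sobel_gradients(x: torch.Tensor) -> torch.Tensor:
    """Per-channel Sobel gradient magnitudes of a (B, C, H, W) tensor."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=x.device).view(1, 1, 3, 3)
    ky = kx.transpose(-1, -2)
    c = x.shape[1]
    gx = F.conv2d(x, kx.repeat(c, 1, 1, 1), padding=1, groups=c)
    gy = F.conv2d(x, ky.repeat(c, 1, 1, 1), padding=1, groups=c)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def gradient_consistency_loss(fused, ir, vis, scales=(1, 2, 4)):
    """Hypothetical multi-scale loss: at each scale, fused-feature gradients
    should match the element-wise max of the source-modality gradients."""
    loss = 0.0
    for s in scales:
        f = F.avg_pool2d(fused, s) if s > 1 else fused
        i = F.avg_pool2d(ir, s) if s > 1 else ir
        v = F.avg_pool2d(vis, s) if s > 1 else vis
        target = torch.maximum(sobel_gradients(i), sobel_gradients(v))
        loss = loss + F.l1_loss(sobel_gradients(f), target)
    return loss / len(scales)

loss = gradient_consistency_loss(torch.randn(2, 64, 80, 80),
                                 torch.randn(2, 64, 80, 80),
                                 torch.randn(2, 64, 80, 80))
```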
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review of our manuscript on WD-FQDet. We have carefully considered each major comment and provide point-by-point responses below. We agree that additional validation and experimental details will strengthen the paper and will incorporate the suggested changes in the revised version.
Point-by-point responses
- Referee: [Method (wavelet decomposition and frequency modules)] The central claim in the abstract and method description rests on the assumption that wavelet decomposition (presumably DWT) maps modality-shared information predominantly to low-frequency subbands and modality-specific information to high-frequency subbands. No quantitative validation—such as cross-modal mutual information, cosine similarity, or correlation metrics computed per subband on FLIR/LLVIP/M3FD—is reported to confirm the separation is sufficiently clean; low-frequency components can encode modality-specific biases (thermal gradients vs. illumination) while high-frequency edges may be shared, undermining the subsequent alignment and retention modules.
Authors: We appreciate this observation regarding the need for explicit validation of the frequency-domain separation. The design of WD-FQDet is motivated by the established frequency separation properties of discrete wavelet transform (DWT), where low-frequency subbands typically encode shared structural information and high-frequency subbands capture modality-specific details. However, we acknowledge that direct quantitative metrics were not reported in the original submission. In the revised manuscript, we will add cross-modal mutual information, cosine similarity, and correlation analyses computed per subband on the FLIR, LLVIP, and M3FD datasets to empirically confirm the decoupling quality and address potential concerns about modality-specific biases in low-frequency components. revision: yes
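As an illustration of what such a check could look like, the sketch below computes per-subband cosine similarity between paired infrared and visible images under a single-level Haar split. The helper names, the Haar basis, and the random stand-in batch are assumptions; the mutual-information variant and the actual FLIR/LLVIP/M3FD loaders are omitted.

```python
import torch
import torch.nn.functional as F

def haar_subbands(x: torch.Tensor) -> dict:
    """Single-level Haar subbands (LL, LH, HL, HH) of a (B, C, H, W) tensor."""
    a, b = x[..., 0::2, 0::2], x[..., 0::2, 1::2]
    c, d = x[..., 1::2, 0::2], x[..., 1::2, 1::2]
    return {'LL': (a + b + c + d) / 2, 'LH': (a + b - c - d) / 2,
            'HL': (a - b + c - d) / 2, 'HH': (a - b - c + d) / 2}

def subband_cosine(ir: torch.Tensor, vis: torch.Tensor) -> dict:
    """Mean cosine similarity between paired IR/visible subbands.

    High LL similarity alongside low LH/HL/HH similarity would support the
    shared-low / specific-high premise; the reverse would undermine it."""
    sb_ir, sb_vis = haar_subbands(ir), haar_subbands(vis)
    return {k: F.cosine_similarity(sb_ir[k].flatten(1),
                                   sb_vis[k].flatten(1), dim=1).mean().item()
            for k in sb_ir}

# Hypothetical paired grayscale batches standing in for real dataset pairs.
print(subband_cosine(torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64)))
```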
- Referee: [Experiments] The experimental claims of SOTA performance lack supporting details: no full baseline tables, module-wise ablations, statistical significance tests, or error analysis (e.g., failure cases under varying illumination) are described, making it impossible to assess whether the reported gains are attributable to the frequency-aware components or to implementation specifics.
Authors: We agree that expanded experimental details are required for full transparency. While the original manuscript presents SOTA results and some ablation studies, we will revise the experimental section to include complete baseline comparison tables, comprehensive module-wise ablations, statistical significance testing (e.g., paired t-tests across multiple runs), and a dedicated error analysis subsection examining failure cases under varying illumination and other conditions. These additions will better isolate the contributions of the frequency-aware modules. revision: yes
Circularity Check
No significant circularity in WD-FQDet derivation chain
Full rationale
The paper introduces an architectural framework using wavelet decomposition to separate low- and high-frequency components, followed by explicitly defined modules (low-frequency homogeneity alignment via cross-modal attention, high-frequency specificity retention via gradient consistency loss, hybrid enhancement, and frequency-aware query selection). These are presented as design choices motivated by the frequency-domain view rather than derived from or reducing to the final outputs. No equations or claims reduce predictions to fitted parameters by construction, and no load-bearing self-citations or uniqueness theorems are invoked. Performance is assessed via standard empirical evaluation on external datasets (FLIR, LLVIP, M3FD), keeping the central claims independent of the framework's own construction.
Axiom & Free-Parameter Ledger