Enhancing Event-based Object Detection with Monocular Normal Maps
Pith reviewed 2026-05-22 12:21 UTC · model grok-4.3
The pith
RGB-derived surface normal maps supply geometric priors that improve event-based object detection under difficult lighting.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Surface normal maps extracted from monocular RGB images act as explicit geometric constraints that assist event-based object detection. The NRE-Net framework first aligns geometric and appearance cues with the Adaptive Dual-stream Fusion Module, then selectively integrates high-frequency event dynamics with the Event-modality Aware Fusion Module. This trimodal integration yields a 3.0% AP50 improvement over dual-modal baselines and outperforms SFNet and SODFormer on DSEC-Det-sub and PKU-DAVIS-SOD.
What carries the argument
NRE-Net, a trimodal network, carries the argument: the Adaptive Dual-stream Fusion Module aligns normal maps with RGB, the Event-modality Aware Fusion Module then incorporates event information, and the normal maps supply the structural priors.
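The two-stage ordering can be illustrated with a toy sketch. It is not the authors' implementation: the function names `cross_attend` and `event_gate`, the token sizes, the residual addition, and the sigmoid gate are all assumptions, chosen only to show the order of operations (normal and RGB features are fused first, event features are mixed in second).

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attend(rgb_tokens, normal_tokens):
    """Stage 1 (hypothetical stand-in for ADFM): every RGB token attends
    over all normal-map tokens; the attended context is added residually."""
    scale = math.sqrt(len(rgb_tokens[0]))
    fused = []
    for q in rgb_tokens:
        weights = softmax([dot(q, k) / scale for k in normal_tokens])
        context = [sum(w * k[d] for w, k in zip(weights, normal_tokens))
                   for d in range(len(q))]
        fused.append([qi + ci for qi, ci in zip(q, context)])
    return fused

def event_gate(fused_tokens, event_tokens):
    """Stage 2 (hypothetical stand-in for EAFM): a per-location sigmoid gate
    decides how much of the event feature is added to the fused feature."""
    out = []
    for f, e in zip(fused_tokens, event_tokens):
        gate = 1.0 / (1.0 + math.exp(-dot(f, e)))
        out.append([fi + gate * ei for fi, ei in zip(f, e)])
    return out

# Made-up 2-token, 2-channel features; not data from the paper.
rgb = [[1.0, 0.0], [0.0, 1.0]]
normals = [[0.5, 0.5], [0.0, 1.0]]
events = [[0.2, -0.1], [0.0, 0.0]]

stage1 = cross_attend(rgb, normals)   # geometry + appearance
stage2 = event_gate(stage1, events)   # + event dynamics
```

In this toy version a location with no event activity (the second token) passes through stage 2 unchanged, which mirrors the "selective integration" wording without claiming anything about the real module internals.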
If this is right
- Geometric priors from normals deliver an additional 3.0% AP50 over dual-modal event-plus-RGB baselines.
- The trimodal system outperforms SFNet by 2.7% and SODFormer by 7.1% on the evaluated autonomous-driving datasets.
- Normal maps help suppress misleading event signals triggered by reflections and contrast changes.
- The approach remains effective when RGB quality is reduced because the normals retain low-frequency structure.
Where Pith is reading between the lines
- The same low-frequency geometric priors could be tested in other event-based tasks such as segmentation or optical flow.
- Deriving normals from sources other than RGB might further increase robustness when RGB is unavailable.
- Real-time vehicle systems could use this fusion to maintain detection accuracy across wider ranges of lighting without requiring perfectly exposed RGB frames.
Load-bearing premise
RGB-derived surface normal maps preserve useful low-frequency structural information even when the source RGB image is degraded by illumination problems.
What would settle it
Measure AP50 on DSEC-Det-sub with and without the normal-map input branch; absence of a roughly 3% gain would contradict the central claim.
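A minimal sketch of that comparison, assuming a single object class and a simplified PASCAL-style matching rule (each detection is compared with its highest-IoU ground-truth box). The helper names `iou` and `ap50`, the data layout, and the toy boxes are illustrative assumptions; a real evaluation would use the benchmark's own protocol and tooling.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def ap50(detections, ground_truth):
    """detections: list of (image_id, score, box).
    ground_truth: dict mapping image_id to a list of boxes.
    Returns all-point interpolated AP at an IoU threshold of 0.5."""
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(b) for img, b in ground_truth.items()}
    hits = []
    for img, _score, box in sorted(detections, key=lambda d: -d[1]):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth.get(img, [])):
            v = iou(box, gt)
            if v > best_iou:
                best_iou, best_j = v, j
        if best_iou >= 0.5 and not matched[img][best_j]:
            matched[img][best_j] = True
            hits.append(1)          # true positive
        else:
            hits.append(0)          # false positive or duplicate
    tp, precisions, recalls = 0, [], []
    for k, h in enumerate(hits, start=1):
        tp += h
        precisions.append(tp / k)
        recalls.append(tp / n_gt)
    # Make precision monotonically non-increasing in recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

# Toy, made-up detections; these are not results from the paper.
gt = {"a": [(0, 0, 10, 10)], "b": [(0, 0, 10, 10)]}
with_normals = [("a", 0.9, (0, 0, 10, 10)), ("b", 0.8, (1, 1, 10, 10))]
without_normals = [("a", 0.9, (0, 0, 10, 10)), ("b", 0.8, (20, 20, 30, 30))]

ap_with = ap50(with_normals, gt)
ap_without = ap50(without_normals, gt)
gain = ap_with - ap_without
```

The quantity of interest is `gain`, computed on the same images for the model with and without the normal-map branch.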
Original abstract
Object detection in autonomous driving is frequently compromised by complex illumination. While event cameras offer a robust solution, they are susceptible to sudden contrast changes such as reflections which often trigger dense, misleading event signals. To overcome this, we leverage RGB-derived surface normal maps as explicit geometric constraints. Crucially, even when RGB degrades, they preserve low-frequency structural priors that effectively assist in event-based detection. Consequently, we present NRE-Net, a trimodal framework that integrates structural priors from surface Normal maps, appearance context from RGB images, and high-frequency dynamics from Events. The Adaptive Dual-stream Fusion Module (ADFM) first aligns geometric and appearance cues, followed by the Event-modality Aware Fusion Module (EAFM) which selectively integrates event dynamics. Extensive evaluations on DSEC-Det-sub and PKU-DAVIS-SOD demonstrate that incorporating geometric priors yields an additional 3.0% AP50 gain over dual-modal baselines, while our approach consistently outperforms fusion methods such as SFNet (+2.7%) and SODFormer (+7.1%).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes NRE-Net, a trimodal framework for object detection in challenging illumination that fuses RGB-derived monocular surface normal maps (as geometric priors), RGB appearance, and event data. It introduces an Adaptive Dual-stream Fusion Module (ADFM) to align geometric and appearance cues followed by an Event-modality Aware Fusion Module (EAFM) for selective event integration. Experiments on DSEC-Det-sub and PKU-DAVIS-SOD report a 3.0% AP50 gain over dual-modal baselines and consistent outperformance of SFNet (+2.7%) and SODFormer (+7.1%).
Significance. If the central assumption holds, the work offers a practical route to leverage geometric priors for robust event-based detection when RGB degrades. The reported empirical gains on named datasets are concrete and address a real autonomous-driving pain point; however, the absence of direct validation on normal-map fidelity under the same adverse conditions that produce dense misleading events limits how strongly the results can be interpreted as evidence for the geometric-prior mechanism.
major comments (2)
- [Abstract and §4 (Experiments)] The headline claim that 'even when RGB degrades, [normals] preserve low-frequency structural priors' is load-bearing for the entire contribution, yet the manuscript supplies no quantitative check (e.g., normal estimation error or cosine similarity) on the degraded illumination/reflection subsets of DSEC-Det-sub or PKU-DAVIS-SOD. Without this, it is impossible to attribute the 3.0% AP50 uplift specifically to the geometric signal rather than to other fusion effects.
- [§3.2 (ADFM and EAFM)] The modules are presented as selectively exploiting the normal priors, but no ablation or robustness analysis is given for the case in which the monocular estimator produces inaccurate normals under the same illumination changes that trigger dense events. This leaves open whether the reported gains would survive realistic normal-map noise.
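The kind of quantitative check the first comment asks for could look like the sketch below: a mean angular error between predicted normals and a reference set, derived from per-pixel cosine similarity. The function names and the flat list-of-tuples layout are assumptions for illustration, and where the reference normals would come from on these datasets is left open.

```python
import math

def unit(v):
    """Return v scaled to unit length, or None for a zero vector."""
    n = math.sqrt(sum(c * c for c in v))
    return None if n == 0 else tuple(c / n for c in v)

def mean_angular_error_deg(pred, ref):
    """pred, ref: equally long lists of (nx, ny, nz) normals, one per pixel.
    Pixels where either normal is a zero vector are skipped as invalid."""
    total, count = 0.0, 0
    for p, r in zip(pred, ref):
        p, r = unit(p), unit(r)
        if p is None or r is None:
            continue
        # Cosine similarity, clamped against floating-point overshoot.
        cos = max(-1.0, min(1.0, sum(a * b for a, b in zip(p, r))))
        total += math.degrees(math.acos(cos))
        count += 1
    return total / count if count else float("nan")

# Made-up example: one perfect pixel, one pixel that is 90 degrees off.
reference = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
predicted = [(0.0, 0.0, 2.0), (0.0, 1.0, 0.0)]
error = mean_angular_error_deg(predicted, reference)
```

Reporting this error separately for well-lit and degraded subsets would show whether normal quality actually holds up where the paper says it does.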
minor comments (2)
- [Abstract] The abstract states concrete percentage improvements but omits any mention of the normal-map estimator used or the training protocol; adding one sentence would improve reproducibility assessment.
- [Figure 2] Figure captions for the fusion-module diagrams would benefit from explicit notation of input/output tensor shapes to match the text description.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the practical relevance of leveraging geometric priors in event-based detection under challenging illumination. We address each major comment below and will incorporate revisions to strengthen the empirical support for our claims.
- Referee: [Abstract and §4 (Experiments)] The headline claim that 'even when RGB degrades, [normals] preserve low-frequency structural priors' is load-bearing for the entire contribution, yet the manuscript supplies no quantitative check (e.g., normal estimation error or cosine similarity) on the degraded illumination/reflection subsets of DSEC-Det-sub or PKU-DAVIS-SOD. Without this, it is impossible to attribute the 3.0% AP50 uplift specifically to the geometric signal rather than to other fusion effects.
  Authors: We agree that a direct quantitative assessment of normal-map fidelity on the adverse subsets would allow stronger attribution of the observed gains to the geometric priors. In the revised manuscript we will add a new analysis (new table and discussion in §4) that reports proxy measures of normal quality (such as consistency with depth-derived normals where available and visual inspection of low-frequency structure preservation) on the illumination-degraded and reflection-heavy subsets of both datasets. We will also correlate these observations with the per-scene detection improvements to better isolate the contribution of the normal stream.
  Revision: yes
- Referee: [§3.2 (ADFM and EAFM)] The modules are presented as selectively exploiting the normal priors, but no ablation or robustness analysis is given for the case in which the monocular estimator produces inaccurate normals under the same illumination changes that trigger dense events. This leaves open whether the reported gains would survive realistic normal-map noise.
  Authors: This is a fair point on the robustness of the proposed fusion modules. We will add a dedicated ablation study in the revised §4 that injects controlled noise (Gaussian perturbations at varying levels) into the input normal maps and re-evaluates ADFM and EAFM performance. The results will quantify how detection accuracy degrades as normal quality decreases and will demonstrate that the selective integration mechanisms in both modules retain benefit even under moderate normal-map inaccuracies.
  Revision: yes
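The noise-injection step of such an ablation might be sketched as follows. The function name `perturb_normals`, the renormalization to unit length after adding noise, and the sigma values in the sweep are assumptions; the rebuttal only specifies Gaussian perturbations at varying levels.

```python
import math
import random

def perturb_normals(normals, sigma, seed=0):
    """Add zero-mean Gaussian noise (std sigma) to each component of each
    normal, then rescale back to unit length so the result is still a
    valid normal. Renormalizing is an assumption, not stated in the source."""
    rng = random.Random(seed)
    out = []
    for n in normals:
        noisy = [c + rng.gauss(0.0, sigma) for c in n]
        length = math.sqrt(sum(c * c for c in noisy))
        out.append(tuple(c / length for c in noisy) if length > 0 else tuple(n))
    return out

# Made-up clean normals facing the camera; sigma levels are illustrative.
clean = [(0.0, 0.0, 1.0)] * 4
sweep = {sigma: perturb_normals(clean, sigma) for sigma in (0.0, 0.05, 0.1, 0.2)}
```

Each entry of `sweep` would replace the normal-map input for one evaluation run, giving a curve of detection accuracy against noise level.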
Circularity Check
No circularity: purely empirical framework with no derivations or self-referential predictions
The paper introduces NRE-Net as a trimodal fusion architecture (Normal + RGB + Events) with modules ADFM and EAFM, but presents no equations, first-principles derivations, or parameter-fitting steps that could reduce to their inputs by construction. Performance claims (e.g., +3.0% AP50) are framed exclusively as outcomes of experiments on DSEC-Det-sub and PKU-DAVIS-SOD. No load-bearing self-citations, uniqueness theorems, or smuggled ansatzes appear in the provided text; the geometric-prior assumption is stated as a hypothesis validated by results rather than defined into existence. This is a standard empirical CV paper whose central claims remain externally falsifiable via the reported benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: RGB-derived surface normal maps preserve low-frequency structural priors that effectively assist in event-based detection even when RGB degrades.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: Adaptive Dual-stream Fusion Module (ADFM) … cross-attention map … Event-modality Aware Fusion Module (EAFM) … spatial weighting and group normalization
- IndisputableMonolith/Foundation/DimensionForcing.lean: alexander_duality_circle_linking (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: monocularly predicted surface Normal maps … preserve low-frequency structural priors
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 3 Pith papers
- RE-VLM: Event-Augmented Vision-Language Model for Scene Understanding
  RE-VLM is the first dual-stream VLM combining RGB and event data with a graph-based pipeline to generate training captions and QA pairs, showing gains over RGB-only and event-only models on new datasets for challengin...
- Sparse Hypergraph-Enhanced Frame-Event Object Detection with Fine-Grained MoE
  Hyper-FEOD fuses RGB and event data via sparse hypergraph cross-modal fusion and region-specialized MoE experts to improve accuracy-efficiency in object detection.
- RE-VLM: Event-Augmented Vision-Language Model for Scene Understanding
  RE-VLM fuses RGB and event data in a dual-stream VLM with a graph-based pipeline for generating training captions and QA pairs, plus two new datasets, showing gains over RGB-only and event-only baselines especially in...