Enhancing Event-based Object Detection with Monocular Normal Maps

Chuang Zhu; Hanqing Liu; Luoping Cui; Mingjie Liu

arXiv: 2508.02127 · v3 · pith:HY5TVU3M · submitted 2025-08-04 · 💻 cs.CV

Pith reviewed 2026-05-22 12:21 UTC · model grok-4.3

classification 💻 cs.CV

keywords: event-based object detection · surface normal maps · multimodal fusion · autonomous driving · geometric priors · trimodal network · NRE-Net

The pith

RGB-derived surface normal maps supply geometric priors that improve event-based object detection under difficult lighting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Event cameras resist illumination changes but produce dense misleading signals from reflections and sudden contrast shifts. The authors derive surface normal maps from RGB images to supply stable low-frequency structural information that remains available even when RGB quality drops. They build NRE-Net, a trimodal network that fuses these normals with RGB appearance and event dynamics through two dedicated fusion modules. Experiments on driving datasets show the added priors deliver measurable gains over dual-modal and prior fusion baselines.

Core claim

Surface normal maps extracted from monocular RGB images act as explicit geometric constraints that assist event-based object detection. The NRE-Net framework first aligns geometric and appearance cues with the Adaptive Dual-stream Fusion Module, then selectively integrates high-frequency event dynamics with the Event-modality Aware Fusion Module. This trimodal integration yields a 3.0% AP50 improvement over dual-modal baselines and outperforms SFNet and SODFormer on DSEC-Det-sub and PKU-DAVIS-SOD.

What carries the argument

NRE-Net trimodal network that uses the Adaptive Dual-stream Fusion Module to align normal maps with RGB and the Event-modality Aware Fusion Module to incorporate event information, with normal maps providing the structural priors.
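The ordering of the two fusion stages can be sketched in a few lines of plain Python. This is a toy illustration of the data flow only (normals and RGB are combined first, events are added second), not the authors' implementation. The function names `adfm_toy`, `eafm_toy`, and `nre_toy`, and the sigmoid gating rules inside them, are assumptions made for this sketch; features are flat lists of floats rather than image tensors.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def adfm_toy(normal_feat, rgb_feat):
    """Stage 1 (toy stand-in for ADFM): blend normal and RGB features.

    The gate is a sigmoid of the element-wise product, so positions where
    both streams share a sign lean toward the normal feature. This gating
    rule is hypothetical and chosen only for illustration.
    """
    fused = []
    for n, r in zip(normal_feat, rgb_feat):
        g = sigmoid(n * r)
        fused.append(g * n + (1.0 - g) * r)  # convex blend of the two streams
    return fused


def eafm_toy(fused_feat, event_feat):
    """Stage 2 (toy stand-in for EAFM): add event features selectively.

    Events that conflict in sign with the fused structure get a weight
    below 0.5; this selectivity rule is also hypothetical.
    """
    out = []
    for f, e in zip(fused_feat, event_feat):
        w = sigmoid(f * e)
        out.append(f + w * e)
    return out


def nre_toy(normal_feat, rgb_feat, event_feat):
    """Trimodal flow: (normals + RGB) first, then events."""
    return eafm_toy(adfm_toy(normal_feat, rgb_feat), event_feat)
```

The point of the sketch is the composition in `nre_toy`: the event stream never interacts with the raw RGB or normal features directly, only with their fused result.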

If this is right

Geometric priors from normals deliver an additional 3.0% AP50 over dual-modal event-plus-RGB baselines.
The trimodal system outperforms SFNet by 2.7% and SODFormer by 7.1% on the evaluated autonomous-driving datasets.
Normal maps help suppress misleading event signals triggered by reflections and contrast changes.
The approach remains effective when RGB quality is reduced because the normals retain low-frequency structure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same low-frequency geometric priors could be tested in other event-based tasks such as segmentation or optical flow.
Deriving normals from sources other than RGB might further increase robustness when RGB is unavailable.
Real-time vehicle systems could use this fusion to maintain detection accuracy across wider ranges of lighting without requiring perfectly exposed RGB frames.

Load-bearing premise

RGB-derived surface normal maps preserve useful low-frequency structural information even when the source RGB image is degraded by illumination problems.

What would settle it

Measure AP50 on DSEC-Det-sub with and without the normal-map input branch; absence of a roughly 3% gain would contradict the central claim.
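For readers who want to see what such an ablation compares, here is a minimal sketch of AP50 for a single class in a single image. It uses a simplified convention (each detection is greedily matched to the best still-unmatched ground-truth box, and AP is the area under the monotone precision-recall envelope), which may differ in detail from the evaluation protocol the paper uses. The helper names `iou` and `ap50` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ap50(detections, ground_truth, iou_thr=0.5):
    """Average precision at IoU >= 0.5.

    detections: list of (score, box); ground_truth: list of boxes.
    """
    if not ground_truth:
        return 0.0
    matched = set()
    hits = []
    # Visit detections from most to least confident.
    for _score, box in sorted(detections, key=lambda d: -d[0]):
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue
            v = iou(box, gt)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_j >= 0 and best_iou >= iou_thr:
            matched.add(best_j)
            hits.append(1)  # true positive
        else:
            hits.append(0)  # false positive
    precisions, recalls, tp = [], [], 0
    for k, h in enumerate(hits, start=1):
        tp += h
        precisions.append(tp / k)
        recalls.append(tp / len(ground_truth))
    # Make precision non-increasing from right to left (PR envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

Running `ap50` once on the detections of the full model and once on the detections of the same model with the normal branch removed, on the same images, gives the two numbers whose difference the test above is about.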

Figures

Figures reproduced from arXiv: 2508.02127 by Chuang Zhu, Hanqing Liu, Luoping Cui, Mingjie Liu.

**Figure 2.** Pipeline of the proposed NRE-Net. (a) Three parallel branches extract complementary cues from RGB images, event streams, …

**Figure 3.** Qualitative comparison of detection results in challenging …

Original abstract

Object detection in autonomous driving is frequently compromised by complex illumination. While event cameras offer a robust solution, they are susceptible to sudden contrast changes such as reflections which often trigger dense, misleading event signals. To overcome this, we leverage RGB-derived surface normal maps as explicit geometric constraints. Crucially, even when RGB degrades, they preserve low-frequency structural priors that effectively assist in event-based detection. Consequently, we present NRE-Net, a trimodal framework that integrates structural priors from surface Normal maps, appearance context from RGB images, and high-frequency dynamics from Events. The Adaptive Dual-stream Fusion Module (ADFM) first aligns geometric and appearance cues, followed by the Event-modality Aware Fusion Module (EAFM) which selectively integrates event dynamics. Extensive evaluations on DSEC-Det-sub and PKU-DAVIS-SOD demonstrate that incorporating geometric priors yields an additional 3.0% AP50 gain over dual-modal baselines, while our approach consistently outperforms fusion methods such as SFNet (+2.7%) and SODFormer (+7.1%).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper adds monocular normal maps to event-RGB fusion for a few percent AP50 gain in tricky lighting, but leaves the reliability of those normals under degradation untested.

The main point is that this work fuses event data with RGB and monocular normal maps to stabilize object detection when illumination or reflections create noisy events. They call the model NRE-Net and introduce ADFM to align geometric and appearance features first, then EAFM to selectively add event dynamics. On DSEC-Det-sub and PKU-DAVIS-SOD the trimodal version beats dual-modal baselines by roughly 3% AP50 and edges out SFNet and SODFormer by 2.7% and 7.1% respectively. That is the concrete result they report.

The approach is a straightforward extension of existing multi-modal fusion rather than a new theoretical framing, but the choice to treat normals as persistent low-frequency structure is a reasonable practical move. The experiments are run on standard event datasets and the gains are stated plainly, which makes the empirical side easy to follow.

The soft spot is exactly the one the stress-test flags. The normals are estimated from RGB, yet the paper gives no separate numbers on how accurate those normal maps remain on the degraded portions of the data. If the estimator was trained mostly on clean images, its output could degrade at the same moments the events become misleading, which would undercut the geometric prior the fusion modules are meant to exploit. Without that check or an ablation that isolates normal quality, the source of the reported lift stays partly opaque.

This is the kind of targeted engineering paper that reading groups in event-based vision or autonomous-driving perception might discuss for the fusion modules and dataset results. It is not foundational, but the problem it targets is real and the numbers are on the table. I would send it to referees so they can ask for the missing normal-map diagnostics and any extra controls on the fusion stages.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes NRE-Net, a trimodal framework for object detection in challenging illumination that fuses RGB-derived monocular surface normal maps (as geometric priors), RGB appearance, and event data. It introduces an Adaptive Dual-stream Fusion Module (ADFM) to align geometric and appearance cues followed by an Event-modality Aware Fusion Module (EAFM) for selective event integration. Experiments on DSEC-Det-sub and PKU-DAVIS-SOD report a 3.0% AP50 gain over dual-modal baselines and consistent outperformance of SFNet (+2.7%) and SODFormer (+7.1%).

Significance. If the central assumption holds, the work offers a practical route to leverage geometric priors for robust event-based detection when RGB degrades. The reported empirical gains on named datasets are concrete and address a real autonomous-driving pain point; however, the absence of direct validation on normal-map fidelity under the same adverse conditions that produce dense misleading events limits how strongly the results can be interpreted as evidence for the geometric-prior mechanism.

major comments (2)

[Abstract and §4] Abstract and §4 (Experiments): the headline claim that 'even when RGB degrades, [normals] preserve low-frequency structural priors' is load-bearing for the entire contribution, yet the manuscript supplies no quantitative check (e.g., normal estimation error or cosine similarity) on the degraded illumination/reflection subsets of DSEC-Det-sub or PKU-DAVIS-SOD. Without this, it is impossible to attribute the 3.0% AP50 uplift specifically to the geometric signal rather than to other fusion effects.
[§3.2] §3.2 (ADFM and EAFM): the modules are presented as selectively exploiting the normal priors, but no ablation or robustness analysis is given for the case in which the monocular estimator produces inaccurate normals under the same illumination changes that trigger dense events. This leaves open whether the reported gains would survive realistic normal-map noise.
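The fidelity check requested in the first major comment is commonly expressed as a mean angular error between predicted and reference normals. Below is a minimal sketch assuming that a reference normal map exists for the frames in question; the report does not say whether the datasets provide one. The function names are hypothetical, and normal maps are represented as flat lists of `(x, y, z)` tuples for brevity.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("zero-length normal")
    return tuple(c / norm for c in v)


def angular_error_deg(n_pred, n_ref):
    """Angle in degrees between a predicted and a reference normal."""
    a, b = normalize(n_pred), normalize(n_ref)
    cos_sim = sum(x * y for x, y in zip(a, b))
    cos_sim = max(-1.0, min(1.0, cos_sim))  # guard against rounding drift
    return math.degrees(math.acos(cos_sim))


def mean_angular_error_deg(pred_map, ref_map):
    """Mean per-pixel angular error over two equally sized normal maps."""
    if len(pred_map) != len(ref_map) or not pred_map:
        raise ValueError("maps must be non-empty and the same size")
    errors = [angular_error_deg(p, r) for p, r in zip(pred_map, ref_map)]
    return sum(errors) / len(errors)
```

Reporting this value separately for well-lit frames and for the degraded subsets would show directly whether the estimator holds up where the events become misleading.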

minor comments (2)

[Abstract] The abstract states concrete percentage improvements but omits any mention of the normal-map estimator used or the training protocol; adding one sentence would improve reproducibility assessment.
[Figure 2] Figure captions for the fusion-module diagrams would benefit from explicit notation of input/output tensor shapes to match the text description.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the practical relevance of leveraging geometric priors in event-based detection under challenging illumination. We address each major comment below and will incorporate revisions to strengthen the empirical support for our claims.

Referee: [Abstract and §4] Abstract and §4 (Experiments): the headline claim that 'even when RGB degrades, [normals] preserve low-frequency structural priors' is load-bearing for the entire contribution, yet the manuscript supplies no quantitative check (e.g., normal estimation error or cosine similarity) on the degraded illumination/reflection subsets of DSEC-Det-sub or PKU-DAVIS-SOD. Without this, it is impossible to attribute the 3.0% AP50 uplift specifically to the geometric signal rather than to other fusion effects.

Authors: We agree that a direct quantitative assessment of normal-map fidelity on the adverse subsets would allow stronger attribution of the observed gains to the geometric priors. In the revised manuscript we will add a new analysis (new table and discussion in §4) that reports proxy measures of normal quality, such as consistency with depth-derived normals where available and visual inspection of low-frequency structure preservation, on the illumination-degraded and reflection-heavy subsets of both datasets. We will also correlate these observations with the per-scene detection improvements to better isolate the contribution of the normal stream.

Revision: yes

Referee: [§3.2] §3.2 (ADFM and EAFM): the modules are presented as selectively exploiting the normal priors, but no ablation or robustness analysis is given for the case in which the monocular estimator produces inaccurate normals under the same illumination changes that trigger dense events. This leaves open whether the reported gains would survive realistic normal-map noise.

Authors: This is a fair point on the robustness of the proposed fusion modules. We will add a dedicated ablation study in the revised §4 that injects controlled noise (Gaussian perturbations at varying levels) into the input normal maps and re-evaluates ADFM and EAFM performance. The results will quantify how detection accuracy degrades as normal quality decreases and will demonstrate that the selective integration mechanisms in both modules retain benefit even under moderate normal-map inaccuracies.

Revision: yes
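The noise-injection ablation described in this response could look roughly like the following. This is a minimal sketch under stated assumptions: the rebuttal says only "Gaussian perturbations at varying levels", so the choice to add independent noise to each component and then renormalize to unit length is an assumption, as are the function name `perturb_normals` and the `seed` parameter.

```python
import math
import random


def perturb_normals(normal_map, sigma, seed=0):
    """Add Gaussian noise to each normal and renormalize to unit length.

    normal_map: list of (x, y, z) unit vectors.
    sigma: standard deviation of the per-component noise (0 means no noise).
    seed: fixes the random stream so that runs are repeatable.
    """
    rng = random.Random(seed)
    noisy = []
    for nx, ny, nz in normal_map:
        px = nx + rng.gauss(0.0, sigma)
        py = ny + rng.gauss(0.0, sigma)
        pz = nz + rng.gauss(0.0, sigma)
        norm = math.sqrt(px * px + py * py + pz * pz)
        if norm == 0.0:
            # Degenerate draw: keep the original normal unchanged.
            noisy.append((nx, ny, nz))
        else:
            noisy.append((px / norm, py / norm, pz / norm))
    return noisy
```

Sweeping `sigma` over a few levels and re-running detection on each perturbed input would give the accuracy-versus-normal-quality curve the referee asks for.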

Circularity Check

0 steps flagged

No circularity: purely empirical framework with no derivations or self-referential predictions

full rationale

The paper introduces NRE-Net as a trimodal fusion architecture (Normal + RGB + Events) with modules ADFM and EAFM, but presents no equations, first-principles derivations, or parameter-fitting steps that could reduce to inputs by construction. Performance claims (e.g., +3.0% AP50) are framed exclusively as outcomes of experiments on DSEC-Det-sub and PKU-DAVIS-SOD. No load-bearing self-citation, uniqueness theorem, or smuggled ansatz appears in the provided text; the geometric-prior assumption is stated as a hypothesis validated by results rather than defined into existence. This is a standard empirical CV paper whose central claims remain externally falsifiable via the reported benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that monocular normal maps supply useful structural priors under RGB degradation; no explicit free parameters or invented physical entities are stated in the abstract.

axioms (1)

Domain assumption: RGB-derived surface normal maps preserve low-frequency structural priors that effectively assist in event-based detection even when RGB degrades.
Presented as the key reason normal maps help when event signals are misleading.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: Adaptive Dual-stream Fusion Module (ADFM) … cross-attention map … Event-modality Aware Fusion Module (EAFM) … spatial weighting and group normalization

IndisputableMonolith/Foundation/DimensionForcing.lean · alexander_duality_circle_linking · unclear
Paper passage: monocularly predicted surface Normal maps … preserve low-frequency structural priors

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

RE-VLM: Event-Augmented Vision-Language Model for Scene Understanding
cs.CV · 2026-05 · unverdicted · novelty 6.0

RE-VLM is the first dual-stream VLM combining RGB and event data with a graph-based pipeline to generate training captions and QA pairs, showing gains over RGB-only and event-only models on new datasets for challengin...

Sparse Hypergraph-Enhanced Frame-Event Object Detection with Fine-Grained MoE
cs.CV · 2026-04 · unverdicted · novelty 6.0

Hyper-FEOD fuses RGB and event data via sparse hypergraph cross-modal fusion and region-specialized MoE experts to improve accuracy-efficiency in object detection.

RE-VLM: Event-Augmented Vision-Language Model for Scene Understanding
cs.CV · 2026-05 · unverdicted · novelty 5.0

RE-VLM fuses RGB and event data in a dual-stream VLM with a graph-based pipeline for generating training captions and QA pairs, plus two new datasets, showing gains over RGB-only and event-only baselines especially in...

[1] Gwangbin Bae and Andrew J Davison. Rethinking inductive biases for surface normal estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9535–9545, 2024.

[2] Gwangbin Bae, Ignas Budvytis, and Roberto Cipolla. Irondepth: Iterative refinement of single-view depth using surface normal and its uncertainty. arXiv preprint arXiv:2210.03676.

[3] Siri Hegna Berge, JCF de Winter, Yan Feng, MP Hagenzieker, and Marjan Hagenzieker. Phantom braking in automated vehicles: A theoretical outline and cycling simulator demonstration. 2024.

[4] Jiahang Cao, Xu Zheng, Yuanhuiyi Lyu, Jiaxu Wang, Renjing Xu, and Lin Wang. Chasing day and night: Towards robust and efficient all-day object detection guided by an event camera. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 9026–9032. IEEE, 2024.

[5] Nicholas FY Chen. Pseudo-labels for supervised learning on dynamic vision sensor data, applied to object detection under ego-motion. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 644–653, 2018.

[6] Pierre De Tournemire, Davide Nitti, Etienne Perot, Davide Migliore, and Amos Sironi. A large scale event-based detection dataset for automotive. arXiv preprint arXiv:2001.08499.

[7] Di Feng, Christian Haase-Schütz, Lars Rosenbaum, Heinz Hertlein, Claudius Glaeser, Fabian Timm, Werner Wiesbeck, and Klaus Dietmayer. Deep multi-modal object detection and semantic segmentation for autonomous driving: Datasets, methods, and challenges. IEEE Transactions on Intelligent Transportation Systems, 22(3):1341–1360, 2020.

[8] Xiao Fu, Wei Yin, Mu Hu, Kaixuan Wang, Yuexin Ma, Ping Tan, Shaojie Shen, Dahua Lin, and Xiaoxiao Long. Geowizard: Unleashing the diffusion priors for 3d geometry estimation from a single image. In European Conference on Computer Vision, pages 241–258. Springer, 2024.

[9] Zheng Ge, Songtao Liu, Feng Wang, Zeming Li, and Jian Sun. Yolox: Exceeding yolo series in 2021. arXiv preprint arXiv:2107.08430, 2021.

[10] Mathias Gehrig and Davide Scaramuzza. Recurrent vision transformers for object detection with event cameras. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

[11] Mathias Gehrig and Davide Scaramuzza. Recurrent vision transformers for object detection with event cameras. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 13884–13893, 2023.

[12] Mathias Gehrig, Willem Aarents, Daniel Gehrig, and Davide Scaramuzza. Dsec: A stereo event camera dataset for driving scenarios. IEEE Robotics and Automation Letters, 2021.

[13] T. Haines and Richard C. Wilson. Combining shape-from-shading and stereo using gaussian-markov random fields. 2008 19th International Conference on Pattern Recognition, pages 1–4, 2008.

[14] Shuhei Hashimoto, D. Miyazaki, and S. Hiura. Uncalibrated photometric stereo constrained by intrinsic reflectance image and shape from silhoutte. 2019 16th International Conference on Machine Vision Applications (MVA), pages 1–6, 2019.

[15] Junjie Hu, Mete Ozay, Yan Zhang, and Takayuki Okatani. Revisiting single image depth estimation: Toward higher resolution maps with accurate object boundaries. In 2019 IEEE winter conference on applications of computer vision (WACV), pages 1043–1051. IEEE, 2019.

[16] Glenn Jocher, Alex Stoken, Jirka Borovec, Liu Changyu, Adam Hogan, Laurentiu Diaconu, Jake Poznanski, Lijun Yu, Prashant Rai, Russ Ferriday, et al. ultralytics/yolov5: v3.0. Zenodo, 2020.

[17] Micah K. Johnson and E. Adelson. Shape estimation in natural illumination. CVPR 2011, pages 2553–2560, 2011.

[18] Uday Kusupati, Shuo Cheng, Rui Chen, and Hao Su. Normal assisted stereo depth estimation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2189–2199, 2020.

[19] Dianze Li, Yonghong Tian, and Jianing Li. Sodformer: Streaming object detection with transformer using events and frames. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(11):14020–14037, 2023.

[20] Hebei Li, Jin Wang, Jiahui Yuan, Yue Li, Wenming Weng, Yansong Peng, Yueyi Zhang, Zhiwei Xiong, and Xiaoyan Sun. Event-assisted low-light video object segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3250–3259, 2024.

[21] Jianing Li, Siwei Dong, Zhaofei Yu, Yonghong Tian, and Tiejun Huang. Event-based vision enhanced: A joint detection framework in autonomous driving. In 2019 IEEE International Conference on Multimedia and Expo (ICME), pages 1396–1401. IEEE, 2019.

[22] Yanghao Li, Hanzi Mao, Ross Girshick, and Kaiming He. Exploring plain vision transformer backbones for object detection. In European conference on computer vision, pages 280–296. Springer, 2022.

[23] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision, pages 2980–2988, 2017.

[24] Bingde Liu, Chang Xu, Wen Yang, Huai Yu, and Lei Yu. Motion robust high-speed light-weighted object detection with event camera. IEEE Transactions on Instrumentation and Measurement, 72:1–13, 2023.

[25] Mengyun Liu, Na Qi, Yunhui Shi, and Baocai Yin. An attention fusion network for event-based vehicle object detection. In 2021 IEEE International Conference on Image Processing (ICIP), pages 3363–3367. IEEE, 2021.

[26] Zhanwen Liu, Nan Yang, Yang Wang, Yuke Li, Xiangmo Zhao, and Fei-Yue Wang. Enhancing traffic object detection in variable illumination with rgb-event fusion. IEEE Transactions on Intelligent Transportation Systems, 2024.

[27] Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin, Yuan Liu, Zhiyang Dou, Lingjie Liu, Yuexin Ma, Song-Hai Zhang, Marc Habermann, Christian Theobalt, et al. Wonder3d: Single image to 3d using cross-domain diffusion. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9970–9980, 2024.

[28] Nico Messikommer, Stamatios Georgoulis, Daniel Gehrig, Stepan Tulyakov, Julius Erbach, Alfredo Bochicchio, Yuanyou Li, and Davide Scaramuzza. Multi-bracket high dynamic range imaging with event cameras. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 547–557, 2022.

[29] Jishu Miao, Tsubasa Hirakawa, Takayoshi Yamashita, and Hironobu Fujiyoshi. 3d object detection with normal-map on point clouds. In VISIGRAPP (5: VISAPP), pages 569–576.

[30] Claudia Trinidad Moscoso Paredes, Trond Foss, and Gunnar Jenssen. Phantom braking in advanced driver assistance systems. driver experience and car manufacturer warnings in owner manuals. SINTEF rapport; 2021: 00482, 2021.

[31] Manasi Muglikar, Diederik Paul Moeys, and D. Scaramuzza. Event guided depth sensing. 2021 International Conference on 3D Vision (3DV), pages 385–393, 2021.

[32] Shishir Pagad, Divya Agarwal, Sathya Narayanan, Kasturi Rangan, Hyungjin Kim, and Ganesh Yalla. Robust method for removing dynamic objects from point clouds. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 10765–10771. IEEE, 2020.

[33] Etienne Perot, Pierre De Tournemire, Davide Nitti, Jonathan Masci, and Amos Sironi. Learning to detect objects with a 1 megapixel event camera. Advances in Neural Information Processing Systems, 33:16639–16652, 2020.

[34] J. So, Jun Hwangbo, Sang Hyun Kim, and I. Yun. Analysis on autonomous vehicle detection performance according to various road geometry settings. Journal of Intelligent Transportation Systems, 27:384–395, 2022.

[35] Lei Sun, Christos Sakaridis, Jingyun Liang, Qi Jiang, Kailun Yang, Peng Sun, Yaozu Ye, Kaiwei Wang, and Luc Van Gool. Event-based fusion for motion deblurring with cross-modal attention. In European conference on computer vision, pages 412–428. Springer, 2022.

[36] Abhishek Tomy, Anshul Paigwar, Khushdeep S Mann, Alessandro Renzaglia, and Christian Laugier. Fusing event-based and rgb camera for robust object detection in adverse conditions. In 2022 International conference on robotics and automation (ICRA), pages 933–939. IEEE, 2022.

[37] Antonio Torralba and Aude Oliva. Depth estimation from image structure. IEEE Transactions on pattern analysis and machine intelligence, 24(9):1226–1238, 2002.

[38] Stepan Tulyakov, Alfredo Bochicchio, Daniel Gehrig, Stamatios Georgoulis, Yuanyou Li, and Davide Scaramuzza. Time lens++: Event-based frame interpolation with parametric non-linear flow and multi-scale fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17755–17764, 2022.

[39] Jonas Uhrig, Nick Schneider, Lukas Schneider, Uwe Franke, Thomas Brox, and Andreas Geiger. Sparsity invariant cnns. In International Conference on 3D Vision (3DV), 2017.

[40] Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

[41] Dongsheng Wang, Xu Jia, Yang Zhang, Xinyu Zhang, Yaoyuan Wang, Ziyang Zhang, Dong Wang, and Huchuan Lu. Dual memory aggregation network for event-based object detection with learnable representation. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 2492–2500.

[42] Fei-Yue Wang. Drive like a machine: Remembering the origin and goal of autonomous driving and intelligent vehicles. IEEE Transactions on Intelligent Vehicles, 8(7):3763–3766, 2023.

[43] Zhaoxia Xiao and Wenming Huang. Kd-tree based nonuniform simplification of 3d point cloud. In 2009 Third International Conference on Genetic and Evolutionary Computing, pages 339–342. IEEE, 2009.

[44] Yuliang Xiu, Jinlong Yang, Xu Cao, Dimitrios Tzionas, and Michael J Black. Econ: Explicit clothed humans optimized via normal integration. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 512–523, 2023.

[45] Xiaoqing Ye, Mao Shu, Hanyu Li, Yifeng Shi, Yingying Li, Guangjie Wang, Xiao Tan, and Errui Ding. Rope3d: The roadside perception dataset for autonomous driving and monocular 3d object detection task. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21341–21350, 2022.

[46] Jingyang Zhang, Shiwei Li, Yuanxun Lu, Tian Fang, David McKinnon, Yanghai Tsin, Long Quan, and Yao Yao. Jointnet: Extending text-to-image diffusion for dense distribution modeling. arXiv preprint arXiv:2310.06347, 2023.

[47] Youmin Zhang, Xianda Guo, Matteo Poggi, Zheng Zhu, Guan Huang, and Stefano Mattoccia. Completionformer: Depth completion with convolutions and vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18527–18536, 2023.

[48] Zhenyuan Zhang, Xiaojie Wang, Darong Huang, Xin Fang, Mu Zhou, and Ying Zhang. Mrpt: Millimeter-wave radar-based pedestrian trajectory tracking for autonomous urban driving. IEEE Transactions on Instrumentation and Measurement, 71:1–17, 2021.

[49] Yian Zhao, Wenyu Lv, Shangliang Xu, Jinman Wei, Guanzhong Wang, Qingqing Dang, Yi Liu, and Jie Chen. Detrs beat yolos on real-time object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16965–16974, 2024.

[50] Wujie Zhou, Xinyang Lin, Jingsheng Lei, Lu Yu, and Jenq-Neng Hwang. Mffenet: Multiscale feature fusion and enhancement network for rgb–thermal urban road scene parsing. IEEE Transactions on Multimedia, 24:2526–2538, 2021.

[51] Zhuyun Zhou, Zongwei Wu, Rémi Boutteau, Fan Yang, Cédric Demonceaux, and Dominique Ginhac. Rgb-event fusion for moving object detection in autonomous driving. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 7808–7815. IEEE, 2023.

[52] Jiawen Zhu, Simiao Lai, Xin Chen, Dong Wang, and Huchuan Lu. Visual prompt multi-modal tracking. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9516–9526, 2023.

[53] Zihan Zhu, Songyou Peng, Viktor Larsson, Zhaopeng Cui, Martin R Oswald, Andreas Geiger, and Marc Pollefeys. Nicer-slam: Neural implicit scene encoding for rgb slam. In 2024 International Conference on 3D Vision (3DV), pages 42–52. IEEE, 2024.

[54] Zhengxia Zou, Keyan Chen, Zhenwei Shi, Yuhong Guo, and Jieping Ye. Object detection in 20 years: A survey. Proceedings of the IEEE, 111(3):257–276, 2023.