The Road Ahead in Autonomous Driving: The KITScenes Multimodal Dataset

Alexander Blumberg; Annika Bätz; Carlos Fernandez; Christoph Stiller; Dominik Strutz; Fabian Immel; Fabian Konstantinidis; Felix Hauser; Frank Bieder; Gleb Stepanov

arXiv: 2606.02956 · v1 · pith:VCTEDVH3new · submitted 2026-06-01 · cs.CV · cs.LG · cs.RO

Richard Schwarzkopf, Fabian Immel, Alexander Blumberg, Jonas Merkert, Nils Rack, Kaiwen Wang, Fabian Konstantinidis, Julian Truetsch, Carlos Fernandez, Annika Bätz, Kevin Rösch, Marlon Steiner, Willi Poh, Yinzhe Shen, Royden Wagner, Felix Hauser, Dominik Strutz, Jaime Villa, Gleb Stepanov, Holger Caesar, Ömer Şahin Taş, Frank Bieder, Jan-Hendrik Pauls, Christoph Stiller

Pith reviewed 2026-06-28 14:36 UTC · model grok-4.3

classification cs.CV · cs.LG · cs.RO

keywords autonomous driving dataset · HD maps · multimodal sensors · traffic element mapping · European urban driving · spatial learning benchmarks

The pith

A new multimodal dataset supplies the most complete HD maps of any public autonomous driving collection, with traffic elements placed in accurate 3D and fully connected.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents KITScenes Multimodal as a European dataset that pairs high-resolution cameras, long-range lidar, 4D radar, and precise localization with unusually detailed HD maps. It argues that earlier datasets fall short on map completeness and geographic variety, especially in cities with irregular layouts and mixed traffic. The authors state that their maps include every driving-relevant element such as traffic lights mapped in 3D at reprojection-accurate positions with full topological connectivity, and that this completeness has been checked through actual driving trials on open-source software. The work also defines four benchmarks that test spatial learning in embodied AI settings.

Core claim

Our HD maps are, to our knowledge, the most complete of any sensor dataset, validated through autonomous driving trials on open-source software. For the first time in a public dataset, all driving-relevant traffic elements, such as traffic lights, are mapped in 3D to a reprojection-accurate level with full topological connectivity.

What carries the argument

The HD maps that locate every driving-relevant traffic element in 3D at reprojection-accurate positions and maintain full topological connectivity among them.

If this is right

Algorithms for end-to-end driving can be evaluated against maps that contain every traffic element in connected 3D form.
Online HD map construction and long-range depth estimation benchmarks become available on data from irregular city layouts.
Novel view synthesis methods can be tested on synchronized multimodal recordings that include the new map layer.
Training sets gain coverage of mixed-traffic European streets that differ from grid-based collections.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the maps hold up under wider testing, planners could rely on them to simulate complex intersections that current datasets omit.
The sensor synchronization details might be reused to reduce timing errors in other multimodal collections.
Adding more cities with similar map density would allow direct measurement of how map completeness affects model generalization.

Load-bearing premise

The maps reach reprojection accuracy and complete connectivity only if the sensor synchronization and mapping steps contain no systematic errors. That assumption rests solely on the driving-trial validation, whose details are not shown.

What would settle it

A set of camera images in which the projected positions of mapped traffic lights deviate from their visible locations or a driving test that reveals missing road connections in the supplied topology.
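The first of these checks can be sketched in a few lines. The following is a minimal illustration, not the authors' validation procedure: it assumes an ideal pinhole camera without lens distortion, and the function names, camera parameters, and toy coordinates are all hypothetical. It projects mapped 3D traffic-light positions into an image and measures the pixel distance to where the lights were observed.

```python
import math


def project_point(point_world, rotation, translation, fx, fy, cx, cy):
    """Project a 3D world point into pixel coordinates with a pinhole model.

    rotation (3x3 nested lists) and translation (length 3) map world
    coordinates into the camera frame: p_cam = R * p_world + t.
    Returns None if the point lies behind the camera.
    """
    p_cam = [
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    x, y, z = p_cam
    if z <= 0.0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def reprojection_errors(map_points, observed_pixels, rotation, translation,
                        fx, fy, cx, cy):
    """Pixel distances between projected map elements and observed positions."""
    errors = []
    for point, pixel in zip(map_points, observed_pixels):
        projected = project_point(point, rotation, translation, fx, fy, cx, cy)
        if projected is None:
            continue  # element is behind the camera, so it is skipped
        errors.append(math.hypot(projected[0] - pixel[0],
                                 projected[1] - pixel[1]))
    return errors


# Toy example (hypothetical values): identity pose, two mapped traffic lights.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
lights_3d = [(1.0, -2.0, 20.0), (-3.0, -2.5, 50.0)]
observed = [(1013.0, 444.0), (900.0, 490.0)]
errors = reprojection_errors(lights_3d, observed, R, t,
                             fx=1000.0, fy=1000.0, cx=960.0, cy=540.0)
print(errors)
```

Under these assumptions, a systematic offset in the errors across many frames would point to a calibration or synchronization problem rather than to noise in individual annotations.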

Figures

Figures reproduced from arXiv: 2606.02956 by Alexander Blumberg, Annika Bätz, Carlos Fernandez, Christoph Stiller, Dominik Strutz, Fabian Immel, Fabian Konstantinidis, Felix Hauser, Frank Bieder, Gleb Stepanov, Holger Caesar, Jaime Villa, Jan-Hendrik Pauls, Jonas Merkert, Julian Truetsch, Kaiwen Wang, Kevin Rösch, Marlon Steiner, Nils Rack, Ömer Şahin Taş, Richard Schwarzkopf, Royden Wagner, Willi Poh, Yinzhe Shen.

**Figure 2.** KITScenes Multimodal Sensor Setup. Our sensor rack (left) is depicted along with nominal …

**Figure 3.** Spatial coverage for two KITScenes cities. The color indicates the number of poses within …

**Figure 4.** Historical SOTA progression of online HD map construction models and example online …

**Figure 7.** Qualitative comparison of monocular depth estimation methods. The corresponding non …

**Figure 8.** Example of lacking 3D geometric integrity in current NVS methods. The traffic sign in the shifted view on the right is inconsistent with its true 3D position shown by the reprojected bounding box

**Figure 10.** The KITScenes recording vehicle with the sensor setup as roofmount.

**Figure 11.** Example frames of a scenario in Frankfurt. The reprojected traffic lights and signs can be …

**Figure 12.** The open-source autonomous driving stack Autoware […]

**Figure 13.** Binned label category statistics over Lanelet2 map elements. The left plot covers elements …

**Figure 14.** Visualization of the split Definition for two KITScenes cities. The color indicates the split …

**Figure 15.** HD map outlines for our maps in the four cities of Frankfurt, Karlsruhe, Sindelfingen and …

**Figure 16.** Schematic overview of the topology prediction with a GNN for the map elements road …

**Figure 17.** Ground truth HD maps (left) and MapQR-Topo predictions (right) for two scenes. Grey …

**Figure 18.** Qualitative comparison of monocular depth estimation methods. The corresponding …

**Figure 19.** Qualitative comparison of traffic-sign recall under lateral viewpoint shifts. (a), (i) …

**Figure 20.** Per-horizon profiles of map-grounded safety and lane-compliance metrics for all evaluated …

**Figure 21.** Additional qualitative end-to-end predictions on KITScenes Multimodal, complementing …

Original abstract

Existing autonomous driving datasets have enabled major progress, but fall short in sensor fidelity, map completeness, or geographic diversity. We present KITScenes Multimodal, a European dataset built around high-fidelity sensors and maps. Our fully synchronized sensor suite combines high-resolution global-shutter cameras, long-range lidar beyond 400m, 4D imaging radar, and redundant GNSS/INS localization. Our HD maps are, to our knowledge, the most complete of any sensor dataset, validated through autonomous driving trials on open-source software. For the first time in a public dataset, all driving-relevant traffic elements, such as traffic lights, are mapped in 3D to a reprojection-accurate level with full topological connectivity. Recorded in cities with irregular street layouts and mixed traffic modes, our dataset complements existing datasets by broadening the available geographic diversity. We also introduce four benchmarks, each advancing spatial learning for embodied AI: online HD map construction, long-range depth estimation, novel view synthesis, and end-to-end driving. Project page: https://kitscenes.com/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

KITScenes brings a new European multimodal dataset with long-range lidar, 4D radar, and detailed 3D HD maps, but the accuracy and completeness claims lack the quantitative backing needed to evaluate them.

The core offering is a new public dataset recorded in European cities with irregular layouts, using a synchronized stack of high-res cameras, lidar beyond 400 m, 4D imaging radar, and GNSS/INS. It also supplies 3D topological maps of traffic elements and four benchmarks covering online map construction, long-range depth, novel view synthesis, and end-to-end driving.

The work fills a real gap by adding geographic diversity and sensor range that many existing collections lack, and the decision to release the data with defined tasks is useful for embodied AI research.

The main weakness is that the strongest assertions are presented without error statistics, reprojection thresholds, connectivity rates, or side-by-side numbers. These assertions are that the maps are the most complete of any sensor dataset, that traffic elements are placed in 3D to a reprojection-accurate level with full topological connectivity, and that the maps were validated through open-source driving trials. The abstract alone does not let a reader judge whether the mapping pipeline avoided systematic problems.

This paper is for groups that need fresh training material for long-range perception or map learning in mixed-traffic settings. A reader working on those problems would get value from the sensor configuration and the benchmark definitions even if the map claims require closer inspection.

I would send it to peer review. Dataset releases with this sensor mix and task definitions are worth referee time once the validation details are checked.

Referee Report

1 major / 0 minor

Summary. The paper introduces KITScenes Multimodal, a new European autonomous driving dataset featuring a synchronized high-fidelity sensor suite (high-resolution global-shutter cameras, long-range lidar >400m, 4D imaging radar, redundant GNSS/INS) and claims the most complete HD maps of any public sensor dataset. These maps include all driving-relevant traffic elements (e.g., traffic lights) mapped in 3D to a reprojection-accurate level with full topological connectivity, validated via autonomous driving trials on open-source software. The dataset targets geographic diversity in irregular street layouts and mixed traffic, and introduces four benchmarks: online HD map construction, long-range depth estimation, novel view synthesis, and end-to-end driving.

Significance. If the HD map completeness and accuracy claims hold with supporting evidence, the dataset would meaningfully complement existing ones by expanding geographic coverage and providing uniquely detailed 3D topological maps, potentially enabling stronger progress on the four proposed benchmarks for embodied AI and spatial learning in autonomous driving.

major comments (1)

[Abstract] The central claim that the HD maps are 'the most complete of any sensor dataset' and achieve 'reprojection-accurate level' 3D mapping of traffic elements with 'full topological connectivity' for the first time in a public dataset rests on validation through autonomous driving trials, yet no quantitative metrics (e.g., reprojection error thresholds, connectivity success rates, or error statistics) or pipeline details are provided to substantiate this. This directly undermines assessment of the 'most complete' and 'first time' assertions.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address the major comment regarding the substantiation of our HD map claims below, and commit to revisions that strengthen the paper without altering its core contributions.

Referee: [Abstract] The central claim that the HD maps are 'the most complete of any sensor dataset' and achieve 'reprojection-accurate level' 3D mapping of traffic elements with 'full topological connectivity' for the first time in a public dataset rests on validation through autonomous driving trials, yet no quantitative metrics (e.g., reprojection error thresholds, connectivity success rates, or error statistics) or pipeline details are provided to substantiate this. This directly undermines assessment of the 'most complete' and 'first time' assertions.

Authors: We agree that the abstract would be strengthened by explicit quantitative metrics and pipeline details to support the claims. The full manuscript describes the mapping process and real-world validation via autonomous driving trials on open-source software, but does not include specific error statistics. In the revised version, we will add a dedicated subsection (likely in Section 3 or 4) providing quantitative metrics such as mean reprojection error for mapped traffic elements (e.g., traffic lights), topological connectivity success rates, and error statistics across the dataset, along with an overview of the mapping pipeline. This will enable direct assessment of the completeness and accuracy claims relative to prior datasets.

revision: yes
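As a rough illustration of what a "topological connectivity success rate" could look like, the sketch below computes two simple ratios on a toy map. The definitions are assumptions made here for illustration and are not taken from the paper: the share of lanes that belong to the largest weakly connected component of the lane graph, and the share of traffic elements linked to at least one lane. All identifiers and data are hypothetical.

```python
from collections import defaultdict, deque


def largest_component_fraction(lane_ids, successor_edges):
    """Fraction of lanes in the largest weakly connected component."""
    if not lane_ids:
        return 0.0
    neighbours = defaultdict(set)
    for src, dst in successor_edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)  # direction is ignored for weak connectivity
    unvisited = set(lane_ids)
    largest = 0
    while unvisited:
        start = unvisited.pop()
        queue = deque([start])
        size = 1
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt in unvisited:
                    unvisited.remove(nxt)
                    queue.append(nxt)
                    size += 1
        largest = max(largest, size)
    return largest / len(lane_ids)


def linked_element_fraction(element_ids, element_to_lanes):
    """Fraction of traffic elements associated with at least one lane."""
    if not element_ids:
        return 0.0
    linked = sum(1 for e in element_ids if element_to_lanes.get(e))
    return linked / len(element_ids)


# Toy map (hypothetical): lane "d" is isolated, light "tl2" has no lane link.
lanes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c")]
lights = ["tl1", "tl2"]
light_links = {"tl1": ["b"], "tl2": []}

lane_rate = largest_component_fraction(lanes, edges)
light_rate = linked_element_fraction(lights, light_links)
print(lane_rate, light_rate)
```

A map with full topological connectivity in this simplified sense would score 1.0 on both ratios; real road networks may legitimately contain disconnected areas, so any threshold would need to be justified per map.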

Circularity Check

0 steps flagged

No circularity: descriptive dataset paper with no derivations or predictions

The paper is a dataset release announcement. Its central claims concern sensor suite completeness, HD map coverage, and the introduction of four benchmarks. No equations, fitted parameters, predictions, or derivation chains appear in the provided text. Claims of map accuracy and topological connectivity are presented as empirical outcomes of the collection process rather than results derived from prior fitted quantities or self-citations. Because no load-bearing mathematical step exists that could reduce to its own inputs, the circularity score is 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a dataset collection and benchmarking paper; it introduces no free parameters, mathematical axioms, or invented physical entities.

pith-pipeline@v0.9.1-grok · 5815 in / 1108 out tokens · 30074 ms · 2026-06-28T14:36:44.875166+00:00 · methodology

Reference graph

Works this paper leans on

71 extracted references · 10 canonical work pages

[1]

Vision meets robotics: The kitti dataset.The International Journal of Robotics Research, 32(11):1231–1237, 2013

Andreas Geiger, Philip Lenz, Raquel Urtasun, and Christoph Stiller. Vision meets robotics: The kitti dataset.The International Journal of Robotics Research, 32(11):1231–1237, 2013. doi: 10.1177/ 0278364913491297

2013
[2]

Lang, Sourabh V ora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom

Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh V ora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuscenes: A multimodal dataset for autonomous driving. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020

2020
[3]

Scalability in perception for autonomous driving: Waymo open dataset

Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Yu Zhang, Jonathon Shlens, Zhifeng Chen, and Dragomir Anguelov. Scalability in perception...

2020
[4]

Qi, Yin Zhou, Zoey Yang, Aurélien Chouard, Pei Sun, Jiquan Ngiam, Vijay Vasudevan, Alexander McCauley, Jonathon Shlens, and Dragomir Anguelov

Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Ben Sapp, Charles R. Qi, Yin Zhou, Zoey Yang, Aurélien Chouard, Pei Sun, Jiquan Ngiam, Vijay Vasudevan, Alexander McCauley, Jonathon Shlens, and Dragomir Anguelov. Large scale interactive motion forecasting for autonomous driving: The waymo open motion datas...

2021
[5]

Man truckscenes: A multimodal dataset for autonomous trucking in diverse conditions

Felix Fent, Fabian Kuttenreich, Florian Ruch, Farija Rizwin, Stefan Juergens, Lorenz Lecher- mann, Christian Nissler, Andrea Perl, Ulrich V oll, Min Yan, and Markus Lienkamp. Man truckscenes: A multimodal dataset for autonomous trucking in diverse conditions. In A. Glober- son, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors,A...

2024
[6]

Truck- drive: Long-range autonomous highway driving dataset, 2026

Filippo Ghilotti, Edoardo Palladin, Samuel Brucker, Adam Sigal, Mario Bijelic, and Felix Heide. Truck- drive: Long-range autonomous highway driving dataset, 2026. URL https://arxiv.org/abs/2603. 02413

2026
[7]

Zenseact open dataset: A large-scale and diverse multimodal dataset for autonomous driving

Mina Alibeigi, William Ljungbergh, Adam Tonderski, Georg Hess, Adam Lilja, Carl Lindstrom, Daria Motorniuk, Junsheng Fu, Jenny Widahl, and Christoffer Petersson. Zenseact open dataset: A large-scale and diverse multimodal dataset for autonomous driving. InProceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2023
[8]

PhysicalAI-Autonomous-Vehicles

NVIDIA Corporation. PhysicalAI-Autonomous-Vehicles. https://huggingface.co/datasets/ nvidia/PhysicalAI-Autonomous-Vehicles, oct 2025. Accessed 2026-05-06, released 2025-10-28

2025
[9]

Lanelet2: A high-definition map framework for the future of automated driving

Fabian Poggenhans, Jan-Hendrik Pauls, Johannes Janosovits, Stefan Orf, Maximilian Naumann, Florian Kuhnt, and Matthias Mayr. Lanelet2: A high-definition map framework for the future of automated driving. In2018 21st International Conference on Intelligent Transportation Systems (ITSC), pages 1672–1679,
[10]

doi: 10.1109/ITSC.2018.8569929

work page doi:10.1109/itsc.2018.8569929 2018
[11]

Autoware

Autoware Foundation. Autoware. https://github.com/autowarefoundation/autoware. Accessed: 2026-05-02

2026
[12]

Argoverse 2: Next generation datasets for self-driving perception and forecasting

Benjamin Wilson, William Qi, Tanmay Agarwal, John Lambert, Jagjeet Singh, Siddhesh Khandelwal, Bowen Pan, Ratnesh Kumar, Andrew Hartnett, Jhony Kaesemodel Pontes, Deva Ramanan, Peter Carr, and James Hays. Argoverse 2: Next generation datasets for self-driving perception and forecasting. In Proceedings of the Neural Information Processing Systems Track on ...

2021
[13]

Evangelidis and Emmanouil Z

Xinyu Huang, Peng Wang, Xinjing Cheng, Dingfu Zhou, Qichuan Geng, and Ruigang Yang. The ApolloScape Open Dataset for Autonomous Driving and Its Application .IEEE Transactions on Pattern Analysis & Machine Intelligence, 42(10):2702–2719, October 2020. ISSN 1939-3539. doi: 10.1109/TPAMI. 2019.2926463. URLhttps://doi.ieeecomputersociety.org/10.1109/TPAMI.201...

work page doi:10.1109/tpami 2020
[14]

One million scenes for autonomous driving: Once dataset

Jiageng Mao, Niu Minzhe, ChenHan Jiang, hanxue liang, Jingheng Chen, Xiaodan Liang, Yamin Li, Chaoqiang Ye, Wei Zhang, Zhenguo Li, Jie Yu, Chunjing XU, and Hang Xu. One million scenes for autonomous driving: Once dataset. In J. Vanschoren and S. Yeung, editors,Proceed- ings of the Neural Information Processing Systems Track on Datasets and Benchmarks, volume 1,
[15]

URL https://datasets-benchmarks-proceedings.neurips.cc/paper_files/paper/ 2021/file/67c6a1e7ce56d3d6fa748ab6d9af3fd7-Paper-round1.pdf

2021
[16]

KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d.Pattern Analysis and Machine Intelligence (PAMI), 2022

Yiyi Liao, Jun Xie, and Andreas Geiger. KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d.Pattern Analysis and Machine Intelligence (PAMI), 2022

2022
[17]

Openlane-v2: A topology reasoning benchmark for unified 3d hd mapping

Huijie Wang, Tianyu Li, Yang Li, Li Chen, Chonghao Sima, Zhenbo Liu, Bangjun Wang, Peijin Jia, Yuting Wang, Shengyin Jiang, et al. Openlane-v2: A topology reasoning benchmark for unified 3d hd mapping. InThirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2023

2023
[18]

Hdmapnet: An online hd map construction and evaluation framework

Qi Li, Yue Wang, Yilun Wang, and Hang Zhao. Hdmapnet: An online hd map construction and evaluation framework. In2022 International Conference on Robotics and Automation (ICRA), pages 4628–4634,
[19]

doi: 10.1109/ICRA46639.2022.9812383

work page doi:10.1109/icra46639.2022.9812383 2022
[20]

VectorMapNet: End-to-end vectorized HD map learning

Yicheng Liu, Tianyuan Yuan, Yue Wang, Yilun Wang, and Hang Zhao. VectorMapNet: End-to-end vectorized HD map learning. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors,Proceedings of the 40th International Conference on Machine Learning, volume 202 ofProceedings of Machine Learning Research,...

2023
[21]

Maptr: Structured modeling and learning for online vectorized hd map construction.arXiv preprint arXiv:2208.14437, 2022

Bencheng Liao, Shaoyu Chen, Xinggang Wang, Tianheng Cheng, Qian Zhang, Wenyu Liu, and Chang Huang. Maptr: Structured modeling and learning for online vectorized hd map construction.arXiv preprint arXiv:2208.14437, 2022

arXiv 2022
[22]

Maptrv2: An end-to-end framework for online vectorized hd map construction.Interna- tional Journal of Computer Vision, Oct 2024

Bencheng Liao, Shaoyu Chen, Yunchi Zhang, Bo Jiang, Qian Zhang, Wenyu Liu, Chang Huang, and Xinggang Wang. Maptrv2: An end-to-end framework for online vectorized hd map construction.Interna- tional Journal of Computer Vision, Oct 2024. ISSN 1573-1405. doi: 10.1007/s11263-024-02235-z. URL https://doi.org/10.1007/s11263-024-02235-z

work page doi:10.1007/s11263-024-02235-z 2024
[23]

Streammapnet: Streaming mapping network for vectorized online hd map construction

Tianyuan Yuan, Yicheng Liu, Yue Wang, Yilun Wang, and Hang Zhao. Streammapnet: Streaming mapping network for vectorized online hd map construction. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 7356–7365, 2024

2024
[24]

End-to-end vectorized hd-map construction with piecewise bezier curve

Limeng Qiao, Wenjie Ding, Xi Qiu, and Chi Zhang. End-to-end vectorized hd-map construction with piecewise bezier curve. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13218–13228, June 2023

2023
[25]

Pivotnet: Vectorized pivot learning for end-to-end hd map construction

Wenjie Ding, Limeng Qiao, Xi Qiu, and Chi Zhang. Pivotnet: Vectorized pivot learning for end-to-end hd map construction. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 3672–3682, 2023

2023
[26]

Stream query denoising for vectorized hd-map construction

Shuo Wang, Fan Jia, Weixin Mao, Yingfei Liu, Yucheng Zhao, Zehui Chen, Tiancai Wang, Chi Zhang, Xiangyu Zhang, and Feng Zhao. Stream query denoising for vectorized hd-map construction. InEuropean Conference on Computer Vision, pages 203–220. Springer, 2024

2024
[27]

Maptracker: Tracking with strided memory fusion for consistent vector hd mapping

Jiacheng Chen, Yuefan Wu, Jiaqi Tan, Hang Ma, and Yasutaka Furukawa. Maptracker: Tracking with strided memory fusion for consistent vector hd mapping. InEuropean Conference on Computer Vision, pages 90–107. Springer, 2024

2024
[28]

Enhancing vectorized map perception with historical rasterized maps

Xiaoyu Zhang, Guangwei Liu, Zihao Liu, Ningyi Xu, Yunhui Liu, and Ji Zhao. Enhancing vectorized map perception with historical rasterized maps. InEuropean Conference on Computer Vision, pages 422–439. Springer, 2024

2024
[29]

Globalmapnet: An online framework for vectorized global hd map construction.arXiv preprint arXiv:2409.10063, 2024

Anqi Shi, Yuze Cai, Xiangyu Chen, Jian Pu, Zeyu Fu, and Hong Lu. Globalmapnet: An online framework for vectorized global hd map construction.arXiv preprint arXiv:2409.10063, 2024

arXiv 2024
[30]

Mapexpert: Online hd map construction with simple and efficient sparse map element expert

Dapeng Zhang, Dayu Chen, Peng Zhi, Yinda Chen, Zhenlong Yuan, Chenyang Li, Rui Zhou, Qingguo Zhou, et al. Mapexpert: Online hd map construction with simple and efficient sparse map element expert. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 14745–14753, 2025. 11

2025
[31]

Histrackmap: Global vectorized high-definition map construction via history map tracking.arXiv preprint arXiv:2503.07168, 2025

Jing Yang, Sen Yang, Xiao Tan, and Hanli Wang. Histrackmap: Global vectorized high-definition map construction via history map tracking.arXiv preprint arXiv:2503.07168, 2025

arXiv 2025
[32]

Mapping like a skeptic: Probabilistic bev projection for online hd mapping.arXiv preprint arXiv:2508.21689, 2025

Fatih Erdo˘gan, Merve Rabia Barın, and Fatma Güney. Mapping like a skeptic: Probabilistic bev projection for online hd mapping.arXiv preprint arXiv:2508.21689, 2025

arXiv 2025
[33]

Generation of training data from hd maps in the lanelet2 framework.arXiv preprint arXiv:2407.17409, 2024

Fabian Immel, Richard Fehler, Frank Bieder, and Christoph Stiller. Generation of training data from hd maps in the lanelet2 framework.arXiv preprint arXiv:2407.17409, 2024. URL https://arxiv.org/ abs/2407.17409

arXiv 2024
[34]

3d packing for self- supervised monocular depth estimation

Vitor Guizilini, Rares Ambrus, Sudeep Pillai, Allan Raventos, and Adrien Gaidon. 3d packing for self- supervised monocular depth estimation. InIEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020

2020
[35]

Chen, Zhenyu Li, Guang Shi, Jiashi Feng, and Bingyi Kang

Haotong Lin, Sili Chen, Jun Hao Liew, Donny Y . Chen, Zhenyu Li, Guang Shi, Jiashi Feng, and Bingyi Kang. Depth anything 3: Recovering the visual space from any views.arXiv preprint arXiv:2511.10647, 2025

Pith/arXiv arXiv 2025
[36]

Unidac: Universal metric depth estimation for any camera, 2026

Girish Chandar Ganesan, Yuliang Guo, Liu Ren, and Xiaoming Liu. Unidac: Universal metric depth estimation for any camera, 2026. URLhttps://arxiv.org/abs/2603.27105

Pith/arXiv arXiv 2026
[37]

Mars: An instance-aware, modular and realistic simulator for autonomous driving

Zirui Wu, Tianyu Liu, Liyi Luo, Zhide Zhong, Jianteng Chen, Hongmin Xiao, Chao Hou, Haozhe Lou, Yuantao Chen, Runyi Yang, Yuxin Huang, Xiaoyu Ye, Zike Yan, Yongliang Shi, Yiyi Liao, and Hao Zhao. Mars: An instance-aware, modular and realistic simulator for autonomous driving. In Lu Fang, Jian Pei, Guangtao Zhai, and Ruiping Wang, editors,Artificial Intell...

2024
[38]

EmerneRF: Emergent spatial-temporal scene decomposition via self-supervision

Jiawei Yang, Boris Ivanovic, Or Litany, Xinshuo Weng, Seung Wook Kim, Boyi Li, Tong Che, Danfei Xu, Sanja Fidler, Marco Pavone, and Yue Wang. EmerneRF: Emergent spatial-temporal scene decomposition via self-supervision. InThe Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=ycv2z8TYur

2024
[39]

Street gaussians: Modeling dynamic urban scenes with gaussian splatting

Yunzhi Yan, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, and Sida Peng. Street gaussians: Modeling dynamic urban scenes with gaussian splatting. In European Conference on Computer Vision, pages 156–173. Springer, 2024

2024
[40]

Omnire: Omni urban scene reconstruction

Ziyu Chen, Jiawei Yang, Jiahui Huang, Riccardo de Lutio, Janick Martinez Esturo, Boris Ivanovic, Or Litany, Zan Gojcic, Sanja Fidler, Marco Pavone, Li Song, and Yue Wang. Omnire: Omni urban scene reconstruction. InThe Thirteenth International Conference on Learning Representations, 2025

2025
[41]

Recondrive: Fast feed-forward 4d gaussian splatting for autonomous driving scene reconstruction

Haibao Yu, Kuntao Xiao, Jiahang Wang, Ruiyang Hao, Guoran Hu, Yuxin Huang, Haifang Qin, Bowen Jing, Yuntian Bo, and Ping Luo. Recondrive: Fast feed-forward 4d gaussian splatting for autonomous driving scene reconstruction. Inhttps://arxiv.org/abs/2603.07552, 2026

arXiv 2026
[42]

Planning-oriented autonomous driving

Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, et al. Planning-oriented autonomous driving. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 17853–17862, 2023

2023
[43]

Vad: Vectorized scene representation for efficient autonomous driving

Bo Jiang, Shaoyu Chen, Qing Xu, Bencheng Liao, Jiajie Chen, Helong Zhou, Qian Zhang, Wenyu Liu, Chang Huang, and Xinggang Wang. Vad: Vectorized scene representation for efficient autonomous driving. ICCV, 2023

2023
[44]

Introducing Jpegli: A new JPEG coding library

Zoltan Szabadka, Martin Bruse, and Jyrki Alakuijala. Introducing Jpegli: A new JPEG coding library. Google Open Source Blog, April 2024. URL https://opensource.googleblog.com/2024/04/ introducing-jpegli-new-jpeg-coding-library.html. Accessed: 2026-05-01

2024
[45]

Tan et al

K. Tan et al. H. Caesar, J. Kabzan. Nuplan: A closed-loop ml-based planning benchmark for autonomous vehicles. InCVPR ADP3 workshop, 2021

2021
[46]

Trust, but verify: Cross-modality fusion for hd map change detection

John Lambert and James Hays. Trust, but verify: Cross-modality fusion for hd map change detection. In J. Vanschoren and S. Yeung, editors,Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, volume 1. Curran, 2021

2021
[47]

Gtsign-220: A crowd-sourced, stvo-aligned benchmark for fine-grained german traffic sign recognition

Miriam Louise Carnot, Erik Fastermann, Jonas Kunze, Eric Peukert, André Ludwig, and Bogdan Franczyk. Gtsign-220: A crowd-sourced, stvo-aligned benchmark for fine-grained german traffic sign recognition. In Intelligent Vehicles Symposium (IV), 2026. 12

2026
[48]

Contact-GraspNet: Efficient 6-dof grasp generation in cluttered scenes

Jan-Hendrik Pauls, Benjamin Schmidt, and Christoph Stiller. Automatic mapping of tailored landmark representations for automated driving and map learning. In2021 IEEE International Conference on Robotics and Automation (ICRA), pages 6725–6731, 2021. doi: 10.1109/ICRA48506.2021.9561432

work page doi:10.1109/icra48506.2021.9561432 2021
[49]

Leveraging enhanced queries of point sets for vectorized map construction

Zihao Liu, Xiaoyu Zhang, Guangwei Liu, Ji Zhao, and Ningyi Xu. Leveraging enhanced queries of point sets for vectorized map construction. InEuropean Conference on Computer Vision, 2024

2024
[50]

SDTagnet: Leveraging text-annotated navigation maps for online HD map construction

Fabian Immel, Jan-Hendrik Pauls, Richard Fehler, Frank Bieder, Jonas Merkert, and Christoph Stiller. SDTagnet: Leveraging text-annotated navigation maps for online HD map construction. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview.net/ forum?id=N3E1cU8Cv3

2025
[51]

MapAnything: Universal feed-forward metric 3D reconstruction

Nikhil Keetha, Norman Müller, Johannes Schönberger, Lorenzo Porzi, Yuchen Zhang, Tobias Fischer, Arno Knapitsch, Duncan Zauss, Ethan Weber, Nelson Antunes, Jonathon Luiten, Manuel Lopez-Antequera, Samuel Rota Bulò, Christian Richardt, Deva Ramanan, Sebastian Scherer, and Peter Kontschieder. MapAnything: Universal feed-forward metric 3D reconstruction. I...

2026
[52]

UniSim: A Neural Closed-Loop Sensor Simulator

Ze Yang, Yun Chen, Jingkang Wang, Sivabalan Manivasagam, Wei-Chiu Ma, Anqi Joyce Yang, and Raquel Urtasun. UniSim: A Neural Closed-Loop Sensor Simulator. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1389–1399, Los Alamitos, CA, USA, June 2023. IEEE Computer Society. doi: 10.1109/CVPR52729.2023.00140. URL https://doi...

work page doi:10.1109/cvpr52729.2023.00140 2023
[53]

Recondreamer: Crafting world models for driving scene reconstruction via online restoration

Chaojun Ni, Guosheng Zhao, Xiaofeng Wang, Zheng Zhu, Wenkang Qin, Guan Huang, Chen Liu, Yuyin Chen, Yida Wang, Xueyang Zhang, et al. Recondreamer: Crafting world models for driving scene reconstruction via online restoration. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 1559–1569, 2025

2025
[54]

Social lstm: Human trajectory prediction in crowded spaces

Alexandre Alahi, Kratarth Goel, Vignesh Ramanathan, Alexandre Robicquet, Li Fei-Fei, and Silvio Savarese. Social lstm: Human trajectory prediction in crowded spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016

2016
[55]

Royden Wagner, Omer Sahin Tas, Jaime Villa, Felix Hauser, Yinzhe Shen, Marlon Steiner, Dominik Strutz, Carlos Fernandez, Christian Kinzig, Guillermo S. Guitierrez-Cabello, Hendrik Königshof, Fabian Immel, Richard Schwarzkopf, Nils Alexander Rack, Kevin Rösch, Kaiwen Wang, Jan-Hendrik Pauls, Martin Lauer, Igor Gilitschenski, Holger Caesar, and Christoph St...

Pith/arXiv arXiv 2026
[56]

Divide and merge: Motion and semantic learning in end-to-end autonomous driving

Yinzhe Shen, Ömer Şahin Taş, Kaiwen Wang, Royden Wagner, and Christoph Stiller. Divide and merge: Motion and semantic learning in end-to-end autonomous driving. Transactions on Machine Learning Research, 2025(11), 2025

2025
[57]

Navigation-guided sparse scene representation for end-to-end autonomous driving

Peidong Li and Dixiao Cui. Navigation-guided sparse scene representation for end-to-end autonomous driving. In International Conference on Learning Representations (ICLR), 2025

2025
[58]

Epona: Autoregressive diffusion world model for autonomous driving

Kaiwen Zhang, Zhenyu Tang, Xiaotao Hu, Xingang Pan, Xiaoyang Guo, Yuan Liu, Jingwei Huang, Li Yuan, Qian Zhang, Xiao-Xiao Long, Xun Cao, and Wei Yin. Epona: Autoregressive diffusion world model for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025

2025
[59]

RawTherapee: A powerful cross-platform raw photo processing program

RawTherapee Development Team. RawTherapee: A powerful cross-platform raw photo processing program. URL https://github.com/RawTherapee/RawTherapee. Includes the AMaZE demosaicing algorithm and raw-domain chromatic aberration correction by E. J. Martinec
[60]

Users prefer jpegli over same-sized libjpeg-turbo or mozjpeg

Martin Bruse, Luca Versari, Zoltan Szabadka, and Jyrki Alakuijala. Users prefer jpegli over same-sized libjpeg-turbo or mozjpeg, 2024. URL https://arxiv.org/abs/2403.18589

arXiv 2024
[61]

Face off: Privacy v progress — how deep natural anonymization protects privacy in the age of machine learning

brighter AI Technologies. Face off: Privacy v progress — how deep natural anonymization protects privacy in the age of machine learning. White paper, brighter AI Technologies GmbH, Berlin, Germany, 2022. URL https://ac-landing-pages-user-uploads-production.s3.amazonaws.com/0000122471/803bb7a7-de73-4596-9548-6d1ca3a80e32.pdf

arXiv 2022
[62]

3dref: 3d dataset and benchmark for reflection detection in rgb and lidar data

Xiting Zhao and Sören Schwertfeger. 3dref: 3d dataset and benchmark for reflection detection in rgb and lidar data. In 2024 International Conference on 3D Vision (3DV), pages 225–234, 2024. doi: 10.1109/3DV62453.2024.00009

work page doi:10.1109/3dv62453.2024.00009 2024
[63]

Kiss-slam: A simple, robust, and accurate 3d lidar slam system with enhanced generalization capabilities

Tiziano Guadagnino, Benedikt Mersch, Saurabh Gupta, Ignacio Vizzo, Giorgio Grisetti, and Cyrill Stachniss. Kiss-slam: A simple, robust, and accurate 3d lidar slam system with enhanced generalization capabilities. In 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5363–5370. IEEE, 2025

2025
[64]

Calibrating multiple cameras with non-overlapping views using coded checkerboard targets

Tobias Strauß, Julius Ziegler, and Johannes Beck. Calibrating multiple cameras with non-overlapping views using coded checkerboard targets. In 17th International IEEE Conference on Intelligent Transportation Systems (ITSC), pages 2623–2628, 2014. doi: 10.1109/ITSC.2014.6958110

work page doi:10.1109/itsc.2014.6958110 2014
[65]

Generalized b-spline camera model

Johannes Beck and Christoph Stiller. Generalized b-spline camera model. In 2018 IEEE Intelligent Vehicles Symposium (IV), pages 2137–2142, 2018. doi: 10.1109/IVS.2018.8500466

work page doi:10.1109/ivs.2018.8500466 2018
[66]

SAM 3: Segment anything with concepts

Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris Coll-Vinent, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Z...

2026
[67]

Java OpenStreetMap Editor

JOSM. Java OpenStreetMap Editor. https://josm.openstreetmap.de/, 2026. Accessed: 2026-05-01

2026
[68]

Mapillary

Mapillary. Mapillary. https://www.mapillary.com/app, 2026. Street-level imagery platform. Accessed: 2026-05-04

2026
[69]

Robot operating system 2: Design, architecture, and uses in the wild

Steven Macenski, Tully Foote, Brian Gerkey, Chris Lalancette, and William Woodall. Robot operating system 2: Design, architecture, and uses in the wild. Science Robotics, 7(66):eabm6074, 2022. doi: 10.1126/scirobotics.abm6074. URL https://www.science.org/doi/abs/10.1126/scirobotics.abm6074

work page doi:10.1126/scirobotics.abm6074 2022
[70]

Localization is all you evaluate: Data leakage in online mapping datasets and how to fix it

Adam Lilja, Junsheng Fu, Erik Stenborg, and Lars Hammarstrand. Localization is all you evaluate: Data leakage in online mapping datasets and how to fix it. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22150–22159, 2024

2024
[71]

Scaling open-vocabulary object detection

Matthias Minderer, Alexey Gritsenko, and Neil Houlsby. Scaling open-vocabulary object detection. Advances in Neural Information Processing Systems, 36:72983–73007, 2023

arXiv 2023

A Details on the Sensor Setup

Tables 7 to 10 describe our sensor setup in detail, with a real-world picture of it shown in Figure 10.

Table 7: Camera setup. All cameras are manufactured ...

[1]

Vision meets robotics: The kitti dataset

Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. Vision meets robotics: The kitti dataset. The International Journal of Robotics Research, 32(11):1231–1237, 2013. doi: 10.1177/0278364913491297

2013

[2]

nuscenes: A multimodal dataset for autonomous driving

Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuscenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020

2020

[3]

Scalability in perception for autonomous driving: Waymo open dataset

Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Yu Zhang, Jonathon Shlens, Zhifeng Chen, and Dragomir Anguelov. Scalability in perception...

2020

[4]

Large scale interactive motion forecasting for autonomous driving

Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Ben Sapp, Charles R. Qi, Yin Zhou, Zoey Yang, Aurélien Chouard, Pei Sun, Jiquan Ngiam, Vijay Vasudevan, Alexander McCauley, Jonathon Shlens, and Dragomir Anguelov. Large scale interactive motion forecasting for autonomous driving: The waymo open motion datas...

2021

[5]

Man truckscenes: A multimodal dataset for autonomous trucking in diverse conditions

Felix Fent, Fabian Kuttenreich, Florian Ruch, Farija Rizwin, Stefan Juergens, Lorenz Lechermann, Christian Nissler, Andrea Perl, Ulrich Voll, Min Yan, and Markus Lienkamp. Man truckscenes: A multimodal dataset for autonomous trucking in diverse conditions. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors, A...

2024

[6]

Truckdrive: Long-range autonomous highway driving dataset

Filippo Ghilotti, Edoardo Palladin, Samuel Brucker, Adam Sigal, Mario Bijelic, and Felix Heide. Truckdrive: Long-range autonomous highway driving dataset, 2026. URL https://arxiv.org/abs/2603.02413

2026

[7]

Zenseact open dataset: A large-scale and diverse multimodal dataset for autonomous driving

Mina Alibeigi, William Ljungbergh, Adam Tonderski, Georg Hess, Adam Lilja, Carl Lindstrom, Daria Motorniuk, Junsheng Fu, Jenny Widahl, and Christoffer Petersson. Zenseact open dataset: A large-scale and diverse multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2023

[8]

PhysicalAI-Autonomous-Vehicles

NVIDIA Corporation. PhysicalAI-Autonomous-Vehicles. https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles, oct 2025. Accessed 2026-05-06, released 2025-10-28

2025

[9]

Lanelet2: A high-definition map framework for the future of automated driving

Fabian Poggenhans, Jan-Hendrik Pauls, Johannes Janosovits, Stefan Orf, Maximilian Naumann, Florian Kuhnt, and Matthias Mayr. Lanelet2: A high-definition map framework for the future of automated driving. In 2018 21st International Conference on Intelligent Transportation Systems (ITSC), pages 1672–1679, 2018. doi: 10.1109/ITSC.2018.8569929

work page doi:10.1109/itsc.2018.8569929 2018

[11]

Autoware

Autoware Foundation. Autoware. https://github.com/autowarefoundation/autoware. Accessed: 2026-05-02

2026

[12]

Argoverse 2: Next generation datasets for self-driving perception and forecasting

Benjamin Wilson, William Qi, Tanmay Agarwal, John Lambert, Jagjeet Singh, Siddhesh Khandelwal, Bowen Pan, Ratnesh Kumar, Andrew Hartnett, Jhony Kaesemodel Pontes, Deva Ramanan, Peter Carr, and James Hays. Argoverse 2: Next generation datasets for self-driving perception and forecasting. In Proceedings of the Neural Information Processing Systems Track on ...

2021

[13]

The ApolloScape Open Dataset for Autonomous Driving and Its Application

Xinyu Huang, Peng Wang, Xinjing Cheng, Dingfu Zhou, Qichuan Geng, and Ruigang Yang. The ApolloScape Open Dataset for Autonomous Driving and Its Application. IEEE Transactions on Pattern Analysis & Machine Intelligence, 42(10):2702–2719, October 2020. ISSN 1939-3539. doi: 10.1109/TPAMI.2019.2926463. URL https://doi.ieeecomputersociety.org/10.1109/TPAMI.201...

work page doi:10.1109/tpami 2020

[14]

One million scenes for autonomous driving: Once dataset

Jiageng Mao, Niu Minzhe, ChenHan Jiang, Hanxue Liang, Jingheng Chen, Xiaodan Liang, Yamin Li, Chaoqiang Ye, Wei Zhang, Zhenguo Li, Jie Yu, Chunjing Xu, and Hang Xu. One million scenes for autonomous driving: Once dataset. In J. Vanschoren and S. Yeung, editors, Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, volume 1, 2021. URL https://datasets-benchmarks-proceedings.neurips.cc/paper_files/paper/2021/file/67c6a1e7ce56d3d6fa748ab6d9af3fd7-Paper-round1.pdf

2021

[16]

KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d

Yiyi Liao, Jun Xie, and Andreas Geiger. KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. Pattern Analysis and Machine Intelligence (PAMI), 2022

2022

[17]

Openlane-v2: A topology reasoning benchmark for unified 3d hd mapping

Huijie Wang, Tianyu Li, Yang Li, Li Chen, Chonghao Sima, Zhenbo Liu, Bangjun Wang, Peijin Jia, Yuting Wang, Shengyin Jiang, et al. Openlane-v2: A topology reasoning benchmark for unified 3d hd mapping. In Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2023

2023

[18]

Hdmapnet: An online hd map construction and evaluation framework

Qi Li, Yue Wang, Yilun Wang, and Hang Zhao. Hdmapnet: An online hd map construction and evaluation framework. In 2022 International Conference on Robotics and Automation (ICRA), pages 4628–4634, 2022. doi: 10.1109/ICRA46639.2022.9812383

work page doi:10.1109/icra46639.2022.9812383 2022

[20]

VectorMapNet: End-to-end vectorized HD map learning

Yicheng Liu, Tianyuan Yuan, Yue Wang, Yilun Wang, and Hang Zhao. VectorMapNet: End-to-end vectorized HD map learning. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research,...

2023

[21]

Maptr: Structured modeling and learning for online vectorized hd map construction

Bencheng Liao, Shaoyu Chen, Xinggang Wang, Tianheng Cheng, Qian Zhang, Wenyu Liu, and Chang Huang. Maptr: Structured modeling and learning for online vectorized hd map construction. arXiv preprint arXiv:2208.14437, 2022

arXiv 2022

[22]

Maptrv2: An end-to-end framework for online vectorized hd map construction

Bencheng Liao, Shaoyu Chen, Yunchi Zhang, Bo Jiang, Qian Zhang, Wenyu Liu, Chang Huang, and Xinggang Wang. Maptrv2: An end-to-end framework for online vectorized hd map construction. International Journal of Computer Vision, Oct 2024. ISSN 1573-1405. doi: 10.1007/s11263-024-02235-z. URL https://doi.org/10.1007/s11263-024-02235-z

work page doi:10.1007/s11263-024-02235-z 2024

[23]

Streammapnet: Streaming mapping network for vectorized online hd map construction

Tianyuan Yuan, Yicheng Liu, Yue Wang, Yilun Wang, and Hang Zhao. Streammapnet: Streaming mapping network for vectorized online hd map construction. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 7356–7365, 2024

2024

[24]

End-to-end vectorized hd-map construction with piecewise bezier curve

Limeng Qiao, Wenjie Ding, Xi Qiu, and Chi Zhang. End-to-end vectorized hd-map construction with piecewise bezier curve. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13218–13228, June 2023

2023

[25]

Pivotnet: Vectorized pivot learning for end-to-end hd map construction

Wenjie Ding, Limeng Qiao, Xi Qiu, and Chi Zhang. Pivotnet: Vectorized pivot learning for end-to-end hd map construction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3672–3682, 2023

2023

[26]

Stream query denoising for vectorized hd-map construction

Shuo Wang, Fan Jia, Weixin Mao, Yingfei Liu, Yucheng Zhao, Zehui Chen, Tiancai Wang, Chi Zhang, Xiangyu Zhang, and Feng Zhao. Stream query denoising for vectorized hd-map construction. In European Conference on Computer Vision, pages 203–220. Springer, 2024

2024

[27]

Maptracker: Tracking with strided memory fusion for consistent vector hd mapping

Jiacheng Chen, Yuefan Wu, Jiaqi Tan, Hang Ma, and Yasutaka Furukawa. Maptracker: Tracking with strided memory fusion for consistent vector hd mapping. In European Conference on Computer Vision, pages 90–107. Springer, 2024

2024

[28]

Enhancing vectorized map perception with historical rasterized maps

Xiaoyu Zhang, Guangwei Liu, Zihao Liu, Ningyi Xu, Yunhui Liu, and Ji Zhao. Enhancing vectorized map perception with historical rasterized maps. In European Conference on Computer Vision, pages 422–439. Springer, 2024

2024

[29]

Globalmapnet: An online framework for vectorized global hd map construction

Anqi Shi, Yuze Cai, Xiangyu Chen, Jian Pu, Zeyu Fu, and Hong Lu. Globalmapnet: An online framework for vectorized global hd map construction. arXiv preprint arXiv:2409.10063, 2024

arXiv 2024

[30]

Mapexpert: Online hd map construction with simple and efficient sparse map element expert

Dapeng Zhang, Dayu Chen, Peng Zhi, Yinda Chen, Zhenlong Yuan, Chenyang Li, Rui Zhou, Qingguo Zhou, et al. Mapexpert: Online hd map construction with simple and efficient sparse map element expert. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 14745–14753, 2025

2025

[31]

Histrackmap: Global vectorized high-definition map construction via history map tracking

Jing Yang, Sen Yang, Xiao Tan, and Hanli Wang. Histrackmap: Global vectorized high-definition map construction via history map tracking. arXiv preprint arXiv:2503.07168, 2025

arXiv 2025

[32]

Mapping like a skeptic: Probabilistic bev projection for online hd mapping

Fatih Erdoğan, Merve Rabia Barın, and Fatma Güney. Mapping like a skeptic: Probabilistic bev projection for online hd mapping. arXiv preprint arXiv:2508.21689, 2025

arXiv 2025

[33]

Generation of training data from hd maps in the lanelet2 framework

Fabian Immel, Richard Fehler, Frank Bieder, and Christoph Stiller. Generation of training data from hd maps in the lanelet2 framework. arXiv preprint arXiv:2407.17409, 2024. URL https://arxiv.org/abs/2407.17409

arXiv 2024

[34]

3d packing for self-supervised monocular depth estimation

Vitor Guizilini, Rares Ambrus, Sudeep Pillai, Allan Raventos, and Adrien Gaidon. 3d packing for self-supervised monocular depth estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020

2020

[35]

Depth anything 3: Recovering the visual space from any views

Haotong Lin, Sili Chen, Jun Hao Liew, Donny Y. Chen, Zhenyu Li, Guang Shi, Jiashi Feng, and Bingyi Kang. Depth anything 3: Recovering the visual space from any views. arXiv preprint arXiv:2511.10647, 2025

Pith/arXiv arXiv 2025

[36]

Unidac: Universal metric depth estimation for any camera

Girish Chandar Ganesan, Yuliang Guo, Liu Ren, and Xiaoming Liu. Unidac: Universal metric depth estimation for any camera, 2026. URL https://arxiv.org/abs/2603.27105

Pith/arXiv arXiv 2026

[37]

Mars: An instance-aware, modular and realistic simulator for autonomous driving

Zirui Wu, Tianyu Liu, Liyi Luo, Zhide Zhong, Jianteng Chen, Hongmin Xiao, Chao Hou, Haozhe Lou, Yuantao Chen, Runyi Yang, Yuxin Huang, Xiaoyu Ye, Zike Yan, Yongliang Shi, Yiyi Liao, and Hao Zhao. Mars: An instance-aware, modular and realistic simulator for autonomous driving. In Lu Fang, Jian Pei, Guangtao Zhai, and Ruiping Wang, editors, Artificial Intell...

2024

[38]

EmerNeRF: Emergent spatial-temporal scene decomposition via self-supervision

Jiawei Yang, Boris Ivanovic, Or Litany, Xinshuo Weng, Seung Wook Kim, Boyi Li, Tong Che, Danfei Xu, Sanja Fidler, Marco Pavone, and Yue Wang. EmerNeRF: Emergent spatial-temporal scene decomposition via self-supervision. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=ycv2z8TYur

2024

[39]

Street gaussians: Modeling dynamic urban scenes with gaussian splatting

Yunzhi Yan, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, and Sida Peng. Street gaussians: Modeling dynamic urban scenes with gaussian splatting. In European Conference on Computer Vision, pages 156–173. Springer, 2024

2024

[40]

Omnire: Omni urban scene reconstruction

Ziyu Chen, Jiawei Yang, Jiahui Huang, Riccardo de Lutio, Janick Martinez Esturo, Boris Ivanovic, Or Litany, Zan Gojcic, Sanja Fidler, Marco Pavone, Li Song, and Yue Wang. Omnire: Omni urban scene reconstruction. In The Thirteenth International Conference on Learning Representations, 2025

2025

[41]

Recondrive: Fast feed-forward 4d gaussian splatting for autonomous driving scene reconstruction

Haibao Yu, Kuntao Xiao, Jiahang Wang, Ruiyang Hao, Guoran Hu, Yuxin Huang, Haifang Qin, Bowen Jing, Yuntian Bo, and Ping Luo. Recondrive: Fast feed-forward 4d gaussian splatting for autonomous driving scene reconstruction, 2026. URL https://arxiv.org/abs/2603.07552

arXiv 2026

[42]

Planning-oriented autonomous driving

Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, et al. Planning-oriented autonomous driving. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 17853–17862, 2023

2023

[43]

Vad: Vectorized scene representation for efficient autonomous driving

Bo Jiang, Shaoyu Chen, Qing Xu, Bencheng Liao, Jiajie Chen, Helong Zhou, Qian Zhang, Wenyu Liu, Chang Huang, and Xinggang Wang. Vad: Vectorized scene representation for efficient autonomous driving. ICCV, 2023

2023
