arxiv: 2606.02956 · v1 · pith:VCTEDVH3new · submitted 2026-06-01 · 💻 cs.CV · cs.LG · cs.RO

The Road Ahead in Autonomous Driving: The KITScenes Multimodal Dataset

Pith reviewed 2026-06-28 14:36 UTC · model grok-4.3

classification 💻 cs.CV · cs.LG · cs.RO
keywords autonomous driving dataset · HD maps · multimodal sensors · traffic element mapping · European urban driving · spatial learning benchmarks

The pith

A new multimodal dataset supplies the most complete HD maps of any public autonomous driving collection, with traffic elements placed in accurate 3D and fully connected.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents KITScenes Multimodal as a European dataset that pairs high-resolution cameras, long-range lidar, 4D radar, and precise localization with unusually detailed HD maps. It argues that earlier datasets fall short on map completeness and geographic variety, especially in cities with irregular layouts and mixed traffic. The authors state that their maps include every driving-relevant element such as traffic lights mapped in 3D at reprojection-accurate positions with full topological connectivity, and that this completeness has been checked through actual driving trials on open-source software. The work also defines four benchmarks that test spatial learning in embodied AI settings.

Core claim

Our HD maps are, to our knowledge, the most complete of any sensor dataset, validated through autonomous driving trials on open-source software. For the first time in a public dataset, all driving-relevant traffic elements, such as traffic lights, are mapped in 3D to a reprojection-accurate level with full topological connectivity.

What carries the argument

The HD maps that locate every driving-relevant traffic element in 3D at reprojection-accurate positions and maintain full topological connectivity among them.
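
One way to make "full topological connectivity" checkable is to treat the map as a directed graph and look for isolated islands and unexpected dead ends. The sketch below is only an illustration under stated assumptions: the `successors` mapping, the lanelet IDs, and the `allowed_exits` parameter are hypothetical, and the code does not use the Lanelet2 API or the paper's own validation procedure.

```python
from collections import deque


def weakly_connected_components(successors):
    """Group lanelet IDs into components, ignoring edge direction.

    `successors` is a hypothetical mapping {lanelet_id: [following lanelet ids]}.
    """
    # Build an undirected adjacency view of the directed successor graph.
    neighbours = {lanelet: set() for lanelet in successors}
    for lanelet, following in successors.items():
        for nxt in following:
            neighbours[lanelet].add(nxt)
            neighbours.setdefault(nxt, set()).add(lanelet)

    seen = set()
    components = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Breadth-first search from every lanelet not yet visited.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            current = queue.popleft()
            component.append(current)
            for other in neighbours[current]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(sorted(component))
    return components


def dead_ends(successors, allowed_exits=()):
    """Return lanelets without a successor that are not declared map exits."""
    return sorted(
        lanelet
        for lanelet, following in successors.items()
        if not following and lanelet not in allowed_exits
    )


# Toy map (assumed data): two islands, with "c" lying on the map boundary.
lanelets = {"a": ["b"], "b": ["c"], "c": [], "d": ["e"], "e": []}
components = weakly_connected_components(lanelets)
unexpected = dead_ends(lanelets, allowed_exits={"c"})
print(components)   # [['a', 'b', 'c'], ['d', 'e']]
print(unexpected)   # ['e']
```

More than one component, or a dead end away from the map boundary, would be a candidate counterexample to the connectivity claim and worth inspecting by hand.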

If this is right

  • Algorithms for end-to-end driving can be evaluated against maps that contain every traffic element in connected 3D form.
  • Online HD map construction and long-range depth estimation benchmarks become available on data from irregular city layouts.
  • Novel view synthesis methods can be tested on synchronized multimodal recordings that include the new map layer.
  • Training sets gain coverage of mixed-traffic European streets that differ from grid-based collections.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the maps hold up under wider testing, planners could rely on them to simulate complex intersections that current datasets omit.
  • The sensor synchronization details might be reused to reduce timing errors in other multimodal collections.
  • Adding more cities with similar map density would allow direct measurement of how map completeness affects model generalization.

Load-bearing premise

The maps reach reprojection accuracy and complete connectivity only if the sensor synchronization and mapping steps contain no systematic errors; that assumption is supported solely by the driving-trial validation, whose details are not shown.

What would settle it

A set of camera images in which the projected positions of mapped traffic lights deviate from their visible locations, or a driving test that reveals missing road connections in the supplied topology.
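
The image-based half of this test can be sketched as a reprojection-error measurement. The example below assumes an ideal pinhole camera without lens distortion; the intrinsics, the world-to-camera pose, the 3D traffic-light position, and the annotated pixel location are all made-up values for illustration, not figures from the dataset or the paper's pipeline.

```python
import math


def project_point(point_world, rotation, translation, fx, fy, cx, cy):
    """Project a 3D world point to pixel coordinates with a pinhole model.

    `rotation` is a 3x3 world-to-camera matrix (list of rows) and
    `translation` a 3-vector. Returns None if the point is behind the camera.
    """
    # Transform into the camera frame: p_cam = R * p_world + t.
    p_cam = [
        sum(rotation[r][c] * point_world[c] for c in range(3)) + translation[r]
        for r in range(3)
    ]
    x, y, z = p_cam
    if z <= 0.0:
        return None
    # Perspective division followed by the intrinsic mapping.
    return (fx * x / z + cx, fy * y / z + cy)


def reprojection_error(projected, annotated):
    """Euclidean pixel distance between projected and annotated positions."""
    return math.hypot(projected[0] - annotated[0], projected[1] - annotated[1])


def mean_reprojection_error(pairs):
    """Average error over (projected, annotated) pairs, skipping invisible points."""
    errors = [reprojection_error(p, a) for p, a in pairs if p is not None]
    return sum(errors) / len(errors) if errors else None


# Assumed values: camera at the world origin looking along +z.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
projected = project_point(
    (1.0, -2.0, 20.0), IDENTITY, (0.0, 0.0, 0.0),
    fx=1000.0, fy=1000.0, cx=960.0, cy=600.0,
)
error_px = reprojection_error(projected, (1013.0, 504.0))
print(projected, error_px)  # (1010.0, 500.0) 5.0
```

Aggregating such errors per element class and per distance bin would yield the kind of statistic the referee report asks for; which error threshold counts as "reprojection-accurate" is not stated in the text above.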

Figures

Figures reproduced from arXiv: 2606.02956 by Alexander Blumberg, Annika Bätz, Carlos Fernandez, Christoph Stiller, Dominik Strutz, Fabian Immel, Fabian Konstantinidis, Felix Hauser, Frank Bieder, Gleb Stepanov, Holger Caesar, Jaime Villa, Jan-Hendrik Pauls, Jonas Merkert, Julian Truetsch, Kaiwen Wang, Kevin Rösch, Marlon Steiner, Nils Rack, Ömer Şahin Taş, Richard Schwarzkopf, Royden Wagner, Willi Poh, Yinzhe Shen.

Figure 1: A showcase of 3D HD map elements and the ground truth reprojected into 6 out of 9 [...]
Figure 2: KITScenes Multimodal Sensor Setup. Our sensor rack (left) is depicted along with nominal [...]
Figure 3: Spatial coverage for two KITScenes cities. The color indicates the number of poses within [...]
Figure 4: Historical SOTA progression of online HD map construction models and example online [...]
Figure 7: Qualitative comparison of monocular depth estimation methods. The corresponding non [...]
Figure 8: Example of lacking 3D geometric integrity in current NVS methods. The traffic sign in the shifted view on the right is inconsistent with its true 3D position shown by the reprojected bounding box.
Figure 10: The KITScenes recording vehicle with the sensor setup as roofmount.
Figure 11: Example frames of a scenario in Frankfurt. The reprojected traffic lights and signs can be [...]
Figure 12: The open-source autonomous driving stack Autoware [...]
Figure 13: Binned label category statistics over Lanelet2 map elements. The left plot covers elements [...]
Figure 14: Visualization of the split definition for two KITScenes cities. The color indicates the split [...]
Figure 15: HD map outlines for our maps in the four cities of Frankfurt, Karlsruhe, Sindelfingen and [...]
Figure 16: Schematic overview of the topology prediction with a GNN for the map elements road [...]
Figure 17: Ground truth HD maps (left) and MapQR-Topo predictions (right) for two scenes. Grey [...]
Figure 18: Qualitative comparison of monocular depth estimation methods. The corresponding [...]
Figure 19: Qualitative comparison of traffic-sign recall under lateral viewpoint shifts. (a), (i) [...]
Figure 20: Per-horizon profiles of map-grounded safety and lane-compliance metrics for all evaluated [...]
Figure 21: Additional qualitative end-to-end predictions on KITScenes Multimodal, complementing [...]
Original abstract

Existing autonomous driving datasets have enabled major progress, but fall short in sensor fidelity, map completeness, or geographic diversity. We present KITScenes Multimodal, a European dataset built around high-fidelity sensors and maps. Our fully synchronized sensor suite combines high-resolution global-shutter cameras, long-range lidar beyond 400m, 4D imaging radar, and redundant GNSS/INS localization. Our HD maps are, to our knowledge, the most complete of any sensor dataset, validated through autonomous driving trials on open-source software. For the first time in a public dataset, all driving-relevant traffic elements, such as traffic lights, are mapped in 3D to a reprojection-accurate level with full topological connectivity. Recorded in cities with irregular street layouts and mixed traffic modes, our dataset complements existing datasets by broadening the available geographic diversity. We also introduce four benchmarks, each advancing spatial learning for embodied AI: online HD map construction, long-range depth estimation, novel view synthesis, and end-to-end driving. Project page: https://kitscenes.com/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper introduces KITScenes Multimodal, a new European autonomous driving dataset featuring a synchronized high-fidelity sensor suite (high-resolution global-shutter cameras, long-range lidar >400m, 4D imaging radar, redundant GNSS/INS) and claims the most complete HD maps of any public sensor dataset. These maps include all driving-relevant traffic elements (e.g., traffic lights) mapped in 3D to a reprojection-accurate level with full topological connectivity, validated via autonomous driving trials on open-source software. The dataset targets geographic diversity in irregular street layouts and mixed traffic, and introduces four benchmarks: online HD map construction, long-range depth estimation, novel view synthesis, and end-to-end driving.

Significance. If the HD map completeness and accuracy claims hold with supporting evidence, the dataset would meaningfully complement existing ones by expanding geographic coverage and providing uniquely detailed 3D topological maps, potentially enabling stronger progress on the four proposed benchmarks for embodied AI and spatial learning in autonomous driving.

major comments (1)
  1. [Abstract] The central claim that the HD maps are 'the most complete of any sensor dataset' and achieve 'reprojection-accurate level' 3D mapping of traffic elements with 'full topological connectivity' for the first time in a public dataset rests on validation through autonomous driving trials, yet no quantitative metrics (e.g., reprojection error thresholds, connectivity success rates, or error statistics) or pipeline details are provided to substantiate this. This directly undermines assessment of the 'most complete' and 'first time' assertions.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address the major comment regarding the substantiation of our HD map claims below, and commit to revisions that strengthen the paper without altering its core contributions.

Point-by-point responses
  1. Referee: [Abstract] The central claim that the HD maps are 'the most complete of any sensor dataset' and achieve 'reprojection-accurate level' 3D mapping of traffic elements with 'full topological connectivity' for the first time in a public dataset rests on validation through autonomous driving trials, yet no quantitative metrics (e.g., reprojection error thresholds, connectivity success rates, or error statistics) or pipeline details are provided to substantiate this. This directly undermines assessment of the 'most complete' and 'first time' assertions.

    Authors: We agree that the abstract would be strengthened by explicit quantitative metrics and pipeline details to support the claims. The full manuscript describes the mapping process and real-world validation via autonomous driving trials on open-source software, but does not include specific error statistics. In the revised version, we will add a dedicated subsection (likely in Section 3 or 4) providing quantitative metrics such as mean reprojection error for mapped traffic elements (e.g., traffic lights), topological connectivity success rates, and error statistics across the dataset, along with an overview of the mapping pipeline. This will enable direct assessment of the completeness and accuracy claims relative to prior datasets.

    revision: yes

Circularity Check

0 steps flagged

No circularity: descriptive dataset paper with no derivations or predictions

Full rationale

The paper is a dataset release announcement. Its central claims concern sensor suite completeness, HD map coverage, and the introduction of four benchmarks. No equations, fitted parameters, predictions, or derivation chains appear in the provided text. Claims of map accuracy and topological connectivity are presented as empirical outcomes of the collection process rather than results derived from prior fitted quantities or self-citations. Because no load-bearing mathematical step exists that could reduce to its own inputs, the circularity score is 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a dataset collection and benchmarking paper; it introduces no free parameters, mathematical axioms, or invented physical entities.



Reference graph

Works this paper leans on

71 extracted references · 10 canonical work pages

  1. [1]

    Vision meets robotics: The kitti dataset.The International Journal of Robotics Research, 32(11):1231–1237, 2013

    Andreas Geiger, Philip Lenz, Raquel Urtasun, and Christoph Stiller. Vision meets robotics: The kitti dataset.The International Journal of Robotics Research, 32(11):1231–1237, 2013. doi: 10.1177/ 0278364913491297

  2. [2]

    Lang, Sourabh V ora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom

    Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh V ora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuscenes: A multimodal dataset for autonomous driving. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020

  3. [3]

    Scalability in perception for autonomous driving: Waymo open dataset

    Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Yu Zhang, Jonathon Shlens, Zhifeng Chen, and Dragomir Anguelov. Scalability in perception...

  4. [4]

    Qi, Yin Zhou, Zoey Yang, Aurélien Chouard, Pei Sun, Jiquan Ngiam, Vijay Vasudevan, Alexander McCauley, Jonathon Shlens, and Dragomir Anguelov

    Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Ben Sapp, Charles R. Qi, Yin Zhou, Zoey Yang, Aurélien Chouard, Pei Sun, Jiquan Ngiam, Vijay Vasudevan, Alexander McCauley, Jonathon Shlens, and Dragomir Anguelov. Large scale interactive motion forecasting for autonomous driving: The waymo open motion datas...

  5. [5]

    Man truckscenes: A multimodal dataset for autonomous trucking in diverse conditions

    Felix Fent, Fabian Kuttenreich, Florian Ruch, Farija Rizwin, Stefan Juergens, Lorenz Lecher- mann, Christian Nissler, Andrea Perl, Ulrich V oll, Min Yan, and Markus Lienkamp. Man truckscenes: A multimodal dataset for autonomous trucking in diverse conditions. In A. Glober- son, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors,A...

  6. [6]

    Truck- drive: Long-range autonomous highway driving dataset, 2026

    Filippo Ghilotti, Edoardo Palladin, Samuel Brucker, Adam Sigal, Mario Bijelic, and Felix Heide. Truck- drive: Long-range autonomous highway driving dataset, 2026. URL https://arxiv.org/abs/2603. 02413

  7. [7]

    Zenseact open dataset: A large-scale and diverse multimodal dataset for autonomous driving

    Mina Alibeigi, William Ljungbergh, Adam Tonderski, Georg Hess, Adam Lilja, Carl Lindstrom, Daria Motorniuk, Junsheng Fu, Jenny Widahl, and Christoffer Petersson. Zenseact open dataset: A large-scale and diverse multimodal dataset for autonomous driving. InProceedings of the IEEE/CVF International Conference on Computer Vision, 2023

  8. [8]

    PhysicalAI-Autonomous-Vehicles

    NVIDIA Corporation. PhysicalAI-Autonomous-Vehicles. https://huggingface.co/datasets/ nvidia/PhysicalAI-Autonomous-Vehicles, oct 2025. Accessed 2026-05-06, released 2025-10-28

  9. [9]

    Lanelet2: A high-definition map framework for the future of automated driving

    Fabian Poggenhans, Jan-Hendrik Pauls, Johannes Janosovits, Stefan Orf, Maximilian Naumann, Florian Kuhnt, and Matthias Mayr. Lanelet2: A high-definition map framework for the future of automated driving. In2018 21st International Conference on Intelligent Transportation Systems (ITSC), pages 1672–1679,

  10. [10]

    doi: 10.1109/ITSC.2018.8569929

  11. [11]

    Autoware

    Autoware Foundation. Autoware. https://github.com/autowarefoundation/autoware. Accessed: 2026-05-02

  12. [12]

    Argoverse 2: Next generation datasets for self-driving perception and forecasting

    Benjamin Wilson, William Qi, Tanmay Agarwal, John Lambert, Jagjeet Singh, Siddhesh Khandelwal, Bowen Pan, Ratnesh Kumar, Andrew Hartnett, Jhony Kaesemodel Pontes, Deva Ramanan, Peter Carr, and James Hays. Argoverse 2: Next generation datasets for self-driving perception and forecasting. In Proceedings of the Neural Information Processing Systems Track on ...

  13. [13]

    Evangelidis and Emmanouil Z

    Xinyu Huang, Peng Wang, Xinjing Cheng, Dingfu Zhou, Qichuan Geng, and Ruigang Yang. The ApolloScape Open Dataset for Autonomous Driving and Its Application .IEEE Transactions on Pattern Analysis & Machine Intelligence, 42(10):2702–2719, October 2020. ISSN 1939-3539. doi: 10.1109/TPAMI. 2019.2926463. URLhttps://doi.ieeecomputersociety.org/10.1109/TPAMI.201...

  14. [14]

    One million scenes for autonomous driving: Once dataset

    Jiageng Mao, Niu Minzhe, ChenHan Jiang, hanxue liang, Jingheng Chen, Xiaodan Liang, Yamin Li, Chaoqiang Ye, Wei Zhang, Zhenguo Li, Jie Yu, Chunjing XU, and Hang Xu. One million scenes for autonomous driving: Once dataset. In J. Vanschoren and S. Yeung, editors,Proceed- ings of the Neural Information Processing Systems Track on Datasets and Benchmarks, volume 1,

  15. [15]

    URL https://datasets-benchmarks-proceedings.neurips.cc/paper_files/paper/ 2021/file/67c6a1e7ce56d3d6fa748ab6d9af3fd7-Paper-round1.pdf

  16. [16]

    KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d.Pattern Analysis and Machine Intelligence (PAMI), 2022

    Yiyi Liao, Jun Xie, and Andreas Geiger. KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d.Pattern Analysis and Machine Intelligence (PAMI), 2022

  17. [17]

    Openlane-v2: A topology reasoning benchmark for unified 3d hd mapping

    Huijie Wang, Tianyu Li, Yang Li, Li Chen, Chonghao Sima, Zhenbo Liu, Bangjun Wang, Peijin Jia, Yuting Wang, Shengyin Jiang, et al. Openlane-v2: A topology reasoning benchmark for unified 3d hd mapping. InThirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2023

  18. [18]

    Hdmapnet: An online hd map construction and evaluation framework

    Qi Li, Yue Wang, Yilun Wang, and Hang Zhao. Hdmapnet: An online hd map construction and evaluation framework. In2022 International Conference on Robotics and Automation (ICRA), pages 4628–4634,

  19. [19]

    doi: 10.1109/ICRA46639.2022.9812383

  20. [20]

    VectorMapNet: End-to-end vectorized HD map learning

    Yicheng Liu, Tianyuan Yuan, Yue Wang, Yilun Wang, and Hang Zhao. VectorMapNet: End-to-end vectorized HD map learning. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors,Proceedings of the 40th International Conference on Machine Learning, volume 202 ofProceedings of Machine Learning Research,...

  21. [21]

    Maptr: Structured modeling and learning for online vectorized hd map construction.arXiv preprint arXiv:2208.14437, 2022

    Bencheng Liao, Shaoyu Chen, Xinggang Wang, Tianheng Cheng, Qian Zhang, Wenyu Liu, and Chang Huang. Maptr: Structured modeling and learning for online vectorized hd map construction.arXiv preprint arXiv:2208.14437, 2022

  22. [22]

    Maptrv2: An end-to-end framework for online vectorized hd map construction.Interna- tional Journal of Computer Vision, Oct 2024

    Bencheng Liao, Shaoyu Chen, Yunchi Zhang, Bo Jiang, Qian Zhang, Wenyu Liu, Chang Huang, and Xinggang Wang. Maptrv2: An end-to-end framework for online vectorized hd map construction.Interna- tional Journal of Computer Vision, Oct 2024. ISSN 1573-1405. doi: 10.1007/s11263-024-02235-z. URL https://doi.org/10.1007/s11263-024-02235-z

  23. [23]

    Streammapnet: Streaming mapping network for vectorized online hd map construction

    Tianyuan Yuan, Yicheng Liu, Yue Wang, Yilun Wang, and Hang Zhao. Streammapnet: Streaming mapping network for vectorized online hd map construction. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 7356–7365, 2024

  24. [24]

    End-to-end vectorized hd-map construction with piecewise bezier curve

    Limeng Qiao, Wenjie Ding, Xi Qiu, and Chi Zhang. End-to-end vectorized hd-map construction with piecewise bezier curve. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13218–13228, June 2023

  25. [25]

    Pivotnet: Vectorized pivot learning for end-to-end hd map construction

    Wenjie Ding, Limeng Qiao, Xi Qiu, and Chi Zhang. Pivotnet: Vectorized pivot learning for end-to-end hd map construction. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 3672–3682, 2023

  26. [26]

    Stream query denoising for vectorized hd-map construction

    Shuo Wang, Fan Jia, Weixin Mao, Yingfei Liu, Yucheng Zhao, Zehui Chen, Tiancai Wang, Chi Zhang, Xiangyu Zhang, and Feng Zhao. Stream query denoising for vectorized hd-map construction. InEuropean Conference on Computer Vision, pages 203–220. Springer, 2024

  27. [27]

    Maptracker: Tracking with strided memory fusion for consistent vector hd mapping

    Jiacheng Chen, Yuefan Wu, Jiaqi Tan, Hang Ma, and Yasutaka Furukawa. Maptracker: Tracking with strided memory fusion for consistent vector hd mapping. InEuropean Conference on Computer Vision, pages 90–107. Springer, 2024

  28. [28]

    Enhancing vectorized map perception with historical rasterized maps

    Xiaoyu Zhang, Guangwei Liu, Zihao Liu, Ningyi Xu, Yunhui Liu, and Ji Zhao. Enhancing vectorized map perception with historical rasterized maps. InEuropean Conference on Computer Vision, pages 422–439. Springer, 2024

  29. [29]

    Globalmapnet: An online framework for vectorized global hd map construction.arXiv preprint arXiv:2409.10063, 2024

    Anqi Shi, Yuze Cai, Xiangyu Chen, Jian Pu, Zeyu Fu, and Hong Lu. Globalmapnet: An online framework for vectorized global hd map construction.arXiv preprint arXiv:2409.10063, 2024

  30. [30]

    Mapexpert: Online hd map construction with simple and efficient sparse map element expert

    Dapeng Zhang, Dayu Chen, Peng Zhi, Yinda Chen, Zhenlong Yuan, Chenyang Li, Rui Zhou, Qingguo Zhou, et al. Mapexpert: Online hd map construction with simple and efficient sparse map element expert. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 14745–14753, 2025. 11

  31. [31]

    Histrackmap: Global vectorized high-definition map construction via history map tracking.arXiv preprint arXiv:2503.07168, 2025

    Jing Yang, Sen Yang, Xiao Tan, and Hanli Wang. Histrackmap: Global vectorized high-definition map construction via history map tracking.arXiv preprint arXiv:2503.07168, 2025

  32. [32]

    Mapping like a skeptic: Probabilistic bev projection for online hd mapping.arXiv preprint arXiv:2508.21689, 2025

    Fatih Erdo˘gan, Merve Rabia Barın, and Fatma Güney. Mapping like a skeptic: Probabilistic bev projection for online hd mapping.arXiv preprint arXiv:2508.21689, 2025

  33. [33]

    Generation of training data from hd maps in the lanelet2 framework.arXiv preprint arXiv:2407.17409, 2024

    Fabian Immel, Richard Fehler, Frank Bieder, and Christoph Stiller. Generation of training data from hd maps in the lanelet2 framework.arXiv preprint arXiv:2407.17409, 2024. URL https://arxiv.org/ abs/2407.17409

  34. [34]

    3d packing for self- supervised monocular depth estimation

    Vitor Guizilini, Rares Ambrus, Sudeep Pillai, Allan Raventos, and Adrien Gaidon. 3d packing for self- supervised monocular depth estimation. InIEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020

  35. [35]

    Chen, Zhenyu Li, Guang Shi, Jiashi Feng, and Bingyi Kang

    Haotong Lin, Sili Chen, Jun Hao Liew, Donny Y . Chen, Zhenyu Li, Guang Shi, Jiashi Feng, and Bingyi Kang. Depth anything 3: Recovering the visual space from any views.arXiv preprint arXiv:2511.10647, 2025

  36. [36]

    Unidac: Universal metric depth estimation for any camera, 2026

    Girish Chandar Ganesan, Yuliang Guo, Liu Ren, and Xiaoming Liu. Unidac: Universal metric depth estimation for any camera, 2026. URLhttps://arxiv.org/abs/2603.27105

  37. [37]

    Mars: An instance-aware, modular and realistic simulator for autonomous driving

    Zirui Wu, Tianyu Liu, Liyi Luo, Zhide Zhong, Jianteng Chen, Hongmin Xiao, Chao Hou, Haozhe Lou, Yuantao Chen, Runyi Yang, Yuxin Huang, Xiaoyu Ye, Zike Yan, Yongliang Shi, Yiyi Liao, and Hao Zhao. Mars: An instance-aware, modular and realistic simulator for autonomous driving. In Lu Fang, Jian Pei, Guangtao Zhai, and Ruiping Wang, editors,Artificial Intell...

  38. [38]

    EmerneRF: Emergent spatial-temporal scene decomposition via self-supervision

    Jiawei Yang, Boris Ivanovic, Or Litany, Xinshuo Weng, Seung Wook Kim, Boyi Li, Tong Che, Danfei Xu, Sanja Fidler, Marco Pavone, and Yue Wang. EmerneRF: Emergent spatial-temporal scene decomposition via self-supervision. InThe Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=ycv2z8TYur

  39. [39]

    Street gaussians: Modeling dynamic urban scenes with gaussian splatting

    Yunzhi Yan, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, and Sida Peng. Street gaussians: Modeling dynamic urban scenes with gaussian splatting. In European Conference on Computer Vision, pages 156–173. Springer, 2024

  40. [40]

    Omnire: Omni urban scene reconstruction

    Ziyu Chen, Jiawei Yang, Jiahui Huang, Riccardo de Lutio, Janick Martinez Esturo, Boris Ivanovic, Or Litany, Zan Gojcic, Sanja Fidler, Marco Pavone, Li Song, and Yue Wang. Omnire: Omni urban scene reconstruction. InThe Thirteenth International Conference on Learning Representations, 2025

  41. [41]

    Recondrive: Fast feed-forward 4d gaussian splatting for autonomous driving scene reconstruction

    Haibao Yu, Kuntao Xiao, Jiahang Wang, Ruiyang Hao, Guoran Hu, Yuxin Huang, Haifang Qin, Bowen Jing, Yuntian Bo, and Ping Luo. Recondrive: Fast feed-forward 4d gaussian splatting for autonomous driving scene reconstruction. Inhttps://arxiv.org/abs/2603.07552, 2026

  42. [42]

    Planning-oriented autonomous driving

    Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, et al. Planning-oriented autonomous driving. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 17853–17862, 2023

  43. [43]

    Vad: Vectorized scene representation for efficient autonomous driving

    Bo Jiang, Shaoyu Chen, Qing Xu, Bencheng Liao, Jiajie Chen, Helong Zhou, Qian Zhang, Wenyu Liu, Chang Huang, and Xinggang Wang. Vad: Vectorized scene representation for efficient autonomous driving. ICCV, 2023

  44. [44]

    Introducing Jpegli: A new JPEG coding library

    Zoltan Szabadka, Martin Bruse, and Jyrki Alakuijala. Introducing Jpegli: A new JPEG coding library. Google Open Source Blog, April 2024. URL https://opensource.googleblog.com/2024/04/ introducing-jpegli-new-jpeg-coding-library.html. Accessed: 2026-05-01

  45. [45]

    Tan et al

    K. Tan et al. H. Caesar, J. Kabzan. Nuplan: A closed-loop ml-based planning benchmark for autonomous vehicles. InCVPR ADP3 workshop, 2021

  46. [46]

    Trust, but verify: Cross-modality fusion for hd map change detection

    John Lambert and James Hays. Trust, but verify: Cross-modality fusion for hd map change detection. In J. Vanschoren and S. Yeung, editors,Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, volume 1. Curran, 2021

  47. [47]

    Gtsign-220: A crowd-sourced, stvo-aligned benchmark for fine-grained german traffic sign recognition

    Miriam Louise Carnot, Erik Fastermann, Jonas Kunze, Eric Peukert, André Ludwig, and Bogdan Franczyk. Gtsign-220: A crowd-sourced, stvo-aligned benchmark for fine-grained german traffic sign recognition. In Intelligent Vehicles Symposium (IV), 2026. 12

  48. [48]

    Contact-GraspNet: Efficient 6-dof grasp generation in cluttered scenes

    Jan-Hendrik Pauls, Benjamin Schmidt, and Christoph Stiller. Automatic mapping of tailored landmark representations for automated driving and map learning. In2021 IEEE International Conference on Robotics and Automation (ICRA), pages 6725–6731, 2021. doi: 10.1109/ICRA48506.2021.9561432

  49. [49]

    Leveraging enhanced queries of point sets for vectorized map construction

    Zihao Liu, Xiaoyu Zhang, Guangwei Liu, Ji Zhao, and Ningyi Xu. Leveraging enhanced queries of point sets for vectorized map construction. InEuropean Conference on Computer Vision, 2024

  50. [50]

    SDTagnet: Leveraging text-annotated navigation maps for online HD map construction

    Fabian Immel, Jan-Hendrik Pauls, Richard Fehler, Frank Bieder, Jonas Merkert, and Christoph Stiller. SDTagnet: Leveraging text-annotated navigation maps for online HD map construction. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview.net/ forum?id=N3E1cU8Cv3

  51. [51]

    Ma- pAnything: Universal feed-forward metric 3D reconstruction

    Nikhil Keetha, Norman Müller, Johannes Schönberger, Lorenzo Porzi, Yuchen Zhang, Tobias Fischer, Arno Knapitsch, Duncan Zauss, Ethan Weber, Nelson Antunes, Jonathon Luiten, Manuel Lopez-Antequera, Samuel Rota Bulò, Christian Richardt, Deva Ramanan, Sebastian Scherer, and Peter Kontschieder. Ma- pAnything: Universal feed-forward metric 3D reconstruction. I...

  52. [52]

    Black, and Otmar Hilliges

    Ze Yang, Yun Chen, Jingkang Wang, Sivabalan Manivasagam, Wei-Chiu Ma, Anqi Joyce Yang, and Raquel Urtasun. UniSim: A Neural Closed-Loop Sensor Simulator . In2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1389–1399, Los Alamitos, CA, USA, June 2023. IEEE Computer Society. doi: 10.1109/CVPR52729.2023.00140. URL https://doi...

  53. [53]

    Recondreamer: Crafting world models for driving scene reconstruction via online restoration

    Chaojun Ni, Guosheng Zhao, Xiaofeng Wang, Zheng Zhu, Wenkang Qin, Guan Huang, Chen Liu, Yuyin Chen, Yida Wang, Xueyang Zhang, et al. Recondreamer: Crafting world models for driving scene reconstruction via online restoration. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 1559–1569, 2025

  54. [54]

    Social lstm: Human trajectory prediction in crowded spaces

    Alexandre Alahi, Kratarth Goel, Vignesh Ramanathan, Alexandre Robicquet, Li Fei-Fei, and Silvio Savarese. Social lstm: Human trajectory prediction in crowded spaces. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016

  55. [55]

    Royden Wagner, Omer Sahin Tas, Jaime Villa, Felix Hauser, Yinzhe Shen, Marlon Steiner, Dominik Strutz, Carlos Fernandez, Christian Kinzig, Guillermo S. Guitierrez-Cabello, Hendrik Königshof, Fabian Immel, Richard Schwarzkopf, Nils Alexander Rack, Kevin Rösch, Kaiwen Wang, Jan-Hendrik Pauls, Martin Lauer, Igor Gilitschenski, Holger Caesar, and Christoph St...

  56. [56]

    Divide and merge: Motion and semantic learning in end-to-end autonomous driving

    Yinzhe Shen, Omer Şahin Tas, Kaiwen Wang, Royden Wagner, and Christoph Stiller. Divide and merge: Motion and semantic learning in end-to-end autonomous driving. Transactions on Machine Learning Research, 2025(11), 2025

  57. [57]

    Navigation-guided sparse scene representation for end-to-end autonomous driving

    Peidong Li and Dixiao Cui. Navigation-guided sparse scene representation for end-to-end autonomous driving. In International Conference on Learning Representations (ICLR), 2025

  58. [58]

    Epona: Autoregressive diffusion world model for autonomous driving

    Kaiwen Zhang, Zhenyu Tang, Xiaotao Hu, Xingang Pan, Xiaoyang Guo, Yuan Liu, Jingwei Huang, Li Yuan, Qian Zhang, Xiao-Xiao Long, Xun Cao, and Wei Yin. Epona: Autoregressive diffusion world model for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025

  59. [59]

    RawTherapee: A powerful cross-platform raw photo processing program

    RawTherapee Development Team. RawTherapee: A powerful cross-platform raw photo processing program. URL https://github.com/RawTherapee/RawTherapee. Includes the AMaZE demosaicing algorithm and raw-domain chromatic aberration correction by E. J. Martinec

  60. [60]

    Users prefer jpegli over same-sized libjpeg-turbo or mozjpeg, 2024

    Martin Bruse, Luca Versari, Zoltan Szabadka, and Jyrki Alakuijala. Users prefer jpegli over same-sized libjpeg-turbo or mozjpeg, 2024. URL https://arxiv.org/abs/2403.18589

  61. [61]

    Face off: Privacy v progress — how deep natural anonymization protects privacy in the age of machine learning

    brighter AI Technologies. Face off: Privacy v progress — how deep natural anonymization protects privacy in the age of machine learning. White paper, brighter AI Technologies GmbH, Berlin, Germany, 2022. URL https://ac-landing-pages-user-uploads-production.s3.amazonaws.com/0000122471/803bb7a7-de73-4596-9548-6d1ca3a80e32.pdf

  62. [62]

    3DRef: 3D dataset and benchmark for reflection detection in RGB and lidar data

    Xiting Zhao and Sören Schwertfeger. 3DRef: 3D dataset and benchmark for reflection detection in RGB and lidar data. In 2024 International Conference on 3D Vision (3DV), pages 225–234, 2024. doi: 10.1109/3DV62453.2024.00009.

  63. [63]

    KISS-SLAM: A simple, robust, and accurate 3D lidar SLAM system with enhanced generalization capabilities

    Tiziano Guadagnino, Benedikt Mersch, Saurabh Gupta, Ignacio Vizzo, Giorgio Grisetti, and Cyrill Stachniss. KISS-SLAM: A simple, robust, and accurate 3D lidar SLAM system with enhanced generalization capabilities. In 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5363–5370. IEEE, 2025

  64. [64]

    Calibrating multiple cameras with non-overlapping views using coded checkerboard targets

    Tobias Strauß, Julius Ziegler, and Johannes Beck. Calibrating multiple cameras with non-overlapping views using coded checkerboard targets. In 17th International IEEE Conference on Intelligent Transportation Systems (ITSC), pages 2623–2628, 2014. doi: 10.1109/ITSC.2014.6958110

  65. [65]

    Generalized B-spline camera model

    Johannes Beck and Christoph Stiller. Generalized B-spline camera model. In 2018 IEEE Intelligent Vehicles Symposium (IV), pages 2137–2142, 2018. doi: 10.1109/IVS.2018.8500466

  66. [66]

    SAM 3: Segment anything with concepts

    Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris Coll-Vinent, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Z...

  67. [67]

    Java OpenStreetMap Editor

    JOSM. Java OpenStreetMap Editor. https://josm.openstreetmap.de/, 2026. Accessed: 2026-05-01

  68. [68]

    Mapillary

    Mapillary. Mapillary. https://www.mapillary.com/app, 2026. Street-level imagery platform. Accessed: 2026-05-04

  69. [69]

    Robot operating system 2: Design, architecture, and uses in the wild

    Steven Macenski, Tully Foote, Brian Gerkey, Chris Lalancette, and William Woodall. Robot operating system 2: Design, architecture, and uses in the wild. Science Robotics, 7(66):eabm6074, 2022. doi: 10.1126/scirobotics.abm6074. URL https://www.science.org/doi/abs/10.1126/scirobotics.abm6074

  70. [70]

    Localization is all you evaluate: Data leakage in online mapping datasets and how to fix it

    Adam Lilja, Junsheng Fu, Erik Stenborg, and Lars Hammarstrand. Localization is all you evaluate: Data leakage in online mapping datasets and how to fix it. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22150–22159, 2024

  71. [71]

    Scaling open-vocabulary object detection

    Matthias Minderer, Alexey Gritsenko, and Neil Houlsby. Scaling open-vocabulary object detection. Advances in Neural Information Processing Systems, 36:72983–73007, 2023.

A Details on the Sensor Setup

Tables 7 to 10 describe our sensor setup in detail, with a real-world picture of it shown in Figure 10.

Table 7: Camera setup. All cameras are manufactured ...