arxiv: 2511.19527 · v2 · submitted 2025-11-24 · 💻 cs.CV

MapRF: Weakly Supervised Online HD Map Construction via NeRF-Guided Self-Training

Pith reviewed 2026-05-17 06:44 UTC · model grok-4.3

classification 💻 cs.CV
keywords weakly supervised learning · HD map construction · Neural Radiance Fields · self-training · online mapping · autonomous driving · 3D pseudo labels

The pith

MapRF constructs online HD maps from 2D image labels alone by using NeRF to generate consistent 3D pseudo labels and self-training to refine them.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that a map-construction network can be trained to produce accurate 3D road geometry and semantics without any 3D ground-truth labels. It does so by conditioning a Neural Radiance Field on the current map predictions, rendering view-consistent 3D pseudo labels from those fields, and then feeding the pseudo labels back into the map network in successive rounds of self-training. A Map-to-Ray Matching step keeps the loop from drifting by forcing predicted map elements to agree with the camera rays implied by the original 2D annotations. If the loop works, it removes the dominant cost barrier to large-scale HD map collection for autonomous driving.
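The loop described above can be sketched as a short control-flow skeleton. This is a minimal illustration under stated assumptions, not the authors' implementation: `self_training_loop` and the three toy stand-ins (`train_map_model`, `fit_nerf`, `render_pseudo_labels`) are hypothetical names, and the scalar "road height" problem only mimics the idea that each NeRF round moves the labels closer to the true geometry.

```python
from statistics import mean

TRUE_HEIGHT = 1.0  # hypothetical ground-truth road height, never shown to the model


def train_map_model(labels):
    """Toy 'map network': its single prediction is the mean of its labels."""
    return mean(labels)


def fit_nerf(map_prediction):
    """Toy 'map-conditioned NeRF': starts from the map prediction and is pulled
    halfway toward the truth, standing in for multi-view 2D supervision."""
    return map_prediction + 0.5 * (TRUE_HEIGHT - map_prediction)


def render_pseudo_labels(field, n_views=4):
    """Toy rendering: every view reports the field's current height."""
    return [field] * n_views


def self_training_loop(weak_labels, rounds=3):
    """Train on weak labels (round 0), then alternate NeRF fitting and retraining."""
    labels = weak_labels
    history = []
    for round_idx in range(rounds + 1):
        model = train_map_model(labels)
        history.append(model)
        if round_idx < rounds:
            labels = render_pseudo_labels(fit_nerf(model))
    return history


# Weak labels assume a flat road (height 0.0), as a planar projection would.
history = self_training_loop(weak_labels=[0.0, 0.0, 0.0, 0.0], rounds=3)
print(history)
```

In this toy setting the error halves every round by construction. The real method has no such guarantee, which is why the page treats the drift safeguard as load-bearing.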

Core claim

MapRF learns to output 3D HD maps by alternating between two components: a NeRF module that turns map predictions into multi-view-consistent 3D geometry and semantics, and a map network that is retrained on the labels rendered from that module. A Map-to-Ray Matching strategy mitigates error accumulation, so that after several iterations the method reaches roughly 75 percent of fully supervised performance on Argoverse 2 and nuScenes while outperforming other 2D-only baselines.

What carries the argument

A NeRF module conditioned on the map network's current predictions that renders view-consistent 3D pseudo labels, paired with a Map-to-Ray Matching alignment that forces map elements to lie on the rays defined by 2D image labels.
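The geometric idea behind aligning map elements with camera rays can be illustrated with a point-to-ray distance. The sketch below is an assumption about what such an alignment cost might look like, not the paper's formulation: `point_to_ray_distance` and `nearest_ray` are hypothetical helpers, and the rays are made-up values rather than rays back-projected from real 2D labels.

```python
import math


def point_to_ray_distance(point, origin, direction):
    """Shortest distance from a 3D point to the ray origin + t * direction, t >= 0."""
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0.0:
        raise ValueError("ray direction must be non-zero")
    unit = [c / norm for c in direction]
    offset = [p - o for p, o in zip(point, origin)]
    # Position of the projection along the ray, clamped so that points
    # behind the camera are measured against the ray origin.
    t = max(0.0, sum(a * b for a, b in zip(offset, unit)))
    closest = [o + t * u for o, u in zip(origin, unit)]
    return math.dist(point, closest)


def nearest_ray(point, rays):
    """Return (index, distance) of the ray closest to the point."""
    distances = [point_to_ray_distance(point, o, d) for o, d in rays]
    best = min(range(len(rays)), key=distances.__getitem__)
    return best, distances[best]


# Two hypothetical label rays leaving the same camera centre.
rays = [((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
        ((0.0, 0.0, 0.0), (1.0, 0.0, 1.0))]
map_point = (0.9, 0.0, 1.0)  # hypothetical predicted map vertex
index, dist = nearest_ray(map_point, rays)
print(index, round(dist, 4))
```

A distance of this kind is zero exactly when the predicted point lies on a label ray, so a prediction that drifts away from the 2D annotations is penalised regardless of how confident the pseudo labels are.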

If this is right

  • HD map production for autonomous vehicles becomes feasible at city scale without 3D annotation crews.
  • The same NeRF-guided loop can be applied to any perception task where 2D labels are cheap and 3D labels are expensive.
  • Online mapping systems can keep improving after deployment by ingesting new 2D labels from fleet cameras.
  • Map accuracy reaches roughly 75 percent of the fully supervised baseline while the need for 3D annotation is removed entirely.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could be extended to fuse lidar or radar rays into the same matching step, tightening the supervision signal further.
  • If the NeRF renderer is replaced by a faster implicit surface model, the self-training cycle could run at real-time rates on the vehicle.
  • The iterative refinement pattern suggests that a deployed system could continue to adapt its map predictions as the vehicle encounters new road layouts.

Load-bearing premise

The NeRF module conditioned on map predictions can generate 3D pseudo labels that are accurate enough and free of systematic bias for the ray-matching correction to keep self-training from drifting.
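For orientation, the standard NeRF volume-rendering quadrature [14] turns per-sample densities along a camera ray into weights, from which an expected depth (and hence a 3D point) can be read off. The sketch below shows only that generic formulation. How MapRF conditions the field on map predictions is not specified on this page, and the function names and sample values are illustrative assumptions.

```python
import math


def render_weights(sigmas, deltas):
    """Per-sample weights w_i = T_i * alpha_i along one ray.

    alpha_i = 1 - exp(-sigma_i * delta_i) is the opacity of sample i, and
    T_i is the transmittance accumulated over all earlier samples.
    """
    weights = []
    transmittance = 1.0
    for sigma, delta in zip(sigmas, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


def expected_depth(weights, depths):
    """Weighted depth along the ray; a pseudo-label point would sit at this depth."""
    return sum(w * t for w, t in zip(weights, depths))


# A ray that passes through empty space and hits an opaque surface at depth 3.
depths = [1.0, 2.0, 3.0, 4.0]
weights = render_weights(sigmas=[0.0, 0.0, 1000.0, 0.0], deltas=[1.0] * 4)
print(weights, expected_depth(weights, depths))
```

If the density field is biased, the weights concentrate at the wrong depth and every label rendered from that ray inherits the error, which is the failure mode the premise above is concerned with.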

What would settle it

Run the full self-training loop on Argoverse 2 and measure the final mAP or vectorized map metric; if it stays below 65 percent of the fully supervised baseline after the scheduled iterations, the claim that the NeRF pseudo labels are sufficiently reliable is false.
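The test above reduces to a ratio check. The sketch below spells it out, assuming mAP is the metric and using placeholder scores that are not results from the paper. `relative_performance` and `claim_survives` are hypothetical helper names.

```python
def relative_performance(weak_map, full_map):
    """Weakly supervised mAP as a fraction of the fully supervised mAP."""
    if full_map <= 0:
        raise ValueError("fully supervised mAP must be positive")
    return weak_map / full_map


def claim_survives(weak_map, full_map, threshold=0.65):
    """True if the weakly supervised model reaches the threshold fraction."""
    return relative_performance(weak_map, full_map) >= threshold


# Placeholder numbers for illustration only.
print(relative_performance(45.0, 60.0))  # 0.75
print(claim_survives(45.0, 60.0))        # True
print(claim_survives(36.0, 60.0))        # False
```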

Figures

Figures reproduced from arXiv: 2511.19527 by Hongyu Lyu, Julie Stephany Berrio Perez, Mao Shan, Stewart Worrall, Thomas Monninger, Zhenxing Ming.

Figure 1: Motivation for MapRF. Compared to existing methods, MapRF learns from accessible 2D image labels to construct 3D HD maps online. We generate pseudo labels through the proposed NeRF module and use them for self-training. This design reduces data collection and annotation costs, thereby improving scalability.
Figure 2: Overall framework of MapRF. The framework learns to construct 3D HD maps online using only 2D image annotations. We first train an initial map model with weak labels generated via IPM. We then optimize a Map-Conditioned NeRF (MC-NeRF) with multi-view labels to generate pseudo labels. We iteratively retrain the map model with pseudo labels and re-optimize MC-NeRF, forming a self-training loop. We further em…
Figure 3: Weak vs. pseudo labels. When roads exhibit slopes or elevation changes, projecting 2D labels onto a single plane introduces positional errors and geometric distortions. Although heuristic constraints can alleviate these issues, our pseudo labels yield representations that are more geometrically accurate.
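The planar-projection error mentioned in the caption can be made concrete with a side-view example. The sketch assumes a pinhole camera at a known height above a flat reference plane z = 0 and a single road point ahead of it. The function name and the numbers are illustrative, not taken from the paper.

```python
def ipm_forward_distance(cam_height, point_x, point_z):
    """Forward distance at which the ray from the camera through a road point
    meets the flat plane z = 0 (side view: x is forward, z is up)."""
    if point_z >= cam_height:
        raise ValueError("ray never reaches the ground plane ahead of the camera")
    # Similar triangles: the ray drops (cam_height - point_z) over point_x metres
    # and must drop cam_height in total to reach the plane.
    return point_x * cam_height / (cam_height - point_z)


cam_height = 1.5   # metres, hypothetical camera mounting height
true_x = 30.0      # metres ahead of the camera

flat = ipm_forward_distance(cam_height, true_x, 0.0)    # road really is flat
uphill = ipm_forward_distance(cam_height, true_x, 0.5)  # road rises by 0.5 m
print(flat, uphill, uphill - true_x)
```

With these numbers a rise of half a metre places the projected label 15 m too far away, which illustrates how labels lifted through a single plane can be badly displaced on slopes.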
Figure 4: Analysis of Self-Training Results. Both pseudo-label quality and model performance improve progressively across self-training rounds, revealing a positive feedback loop.
Table V: Ablation study of Map-to-Ray Matching. Comparison of different methods. The proposed MRM achieves the best performance with consistent improvements across all categories. Columns: Method, AP_div, AP_ped, AP_bou, mAP. First row: Conf. Thresh. + NMS [51], 57.5…
Figure 5: Qualitative Results. MapRF produces geometrically accurate maps in diverse scenes. Multi-view images are overlaid with projected predictions from MapRF (round 3). Orange, blue, and green denote lane dividers, pedestrian crossings, and road boundaries, respectively.
… become more geometrically accurate, yielding a 13% mAP gain after three rounds, while model performance increases by 8.6% mAP. We observe con…
Original abstract

Autonomous driving systems benefit from high-definition (HD) maps that provide critical information about road infrastructure. The online construction of HD maps offers a scalable approach to generate local maps from on-board sensors. However, existing methods typically rely on costly 3D map annotations for training, which limits their generalization and scalability across diverse driving environments. In this work, we propose MapRF, a weakly supervised framework that learns to construct 3D maps using only 2D image labels. To generate high-quality pseudo labels, we introduce a novel Neural Radiance Fields (NeRF) module conditioned on map predictions, which reconstructs view-consistent 3D geometry and semantics. These pseudo labels are then iteratively used to refine the map network in a self-training manner, enabling progressive improvement without additional supervision. Furthermore, to mitigate error accumulation during self-training, we propose a Map-to-Ray Matching strategy that aligns map predictions with camera rays derived from 2D labels. Extensive experiments on the Argoverse 2 and nuScenes datasets demonstrate that MapRF achieves performance comparable to fully supervised methods, attaining around 75% of the baseline while surpassing several approaches using only 2D labels. This highlights the potential of MapRF to enable scalable and cost-effective online HD map construction for autonomous driving.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes MapRF, a weakly supervised framework for online HD map construction from images that employs a NeRF module conditioned on the map network's predictions to generate view-consistent 3D geometry and semantic pseudo-labels. These labels supervise the map network in an iterative self-training loop, with a Map-to-Ray Matching strategy introduced to align predictions with 2D camera rays derived from image annotations and thereby reduce error accumulation. Experiments on Argoverse 2 and nuScenes report that the method reaches approximately 75% of fully supervised baseline performance while outperforming other approaches that use only 2D labels.

Significance. If the central claim is substantiated, the work would meaningfully advance scalable HD map learning by demonstrating that NeRF-guided self-training can lift 2D supervision to competitive 3D map quality without 3D ground truth. This could lower annotation costs for autonomous driving perception stacks and encourage further exploration of radiance-field priors in weakly supervised geometric tasks.

major comments (3)
  1. [§3.2] §3.2 (NeRF conditioning and pseudo-label generation): The description of how the map prediction is injected into the NeRF and how the resulting 3D geometry/semantic fields are converted into training signals for the map network is insufficiently precise. Without an explicit formulation of the conditioning mechanism and the loss terms, it is difficult to verify whether the self-training loop can escape early error basins as claimed.
  2. [§4] §4 (Map-to-Ray Matching): The paper asserts that projecting NeRF-derived labels back onto 2D rays derived from image annotations prevents systematic error accumulation, yet no quantitative analysis (e.g., ablation removing the matching term or monitoring 3D consistency metrics across iterations) is provided to support this. This is load-bearing for the headline result.
  3. [§5.1] §5.1 (Quantitative results): The claim of attaining ~75% of the fully supervised baseline is presented without error bars, statistical significance tests, or per-scene breakdowns. In addition, the exact definition of the baseline and the precise 2D-label-only competitors are not tabulated, making it impossible to assess whether the reported gains are robust or dataset-specific.
minor comments (2)
  1. [Abstract] The abstract and introduction repeatedly use the phrase 'around 75%' without specifying the primary metric (e.g., mAP, IoU) or the exact supervised baseline value; a table or explicit numerical comparison would improve clarity.
  2. [§3] Notation for the map network output, NeRF parameters, and ray-matching loss is introduced without a consolidated symbol table, which hinders readability in the method section.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript to improve precision, add supporting analyses, and enhance the presentation of results.

Point-by-point responses
  1. Referee: [§3.2] §3.2 (NeRF conditioning and pseudo-label generation): The description of how the map prediction is injected into the NeRF and how the resulting 3D geometry/semantic fields are converted into training signals for the map network is insufficiently precise. Without an explicit formulation of the conditioning mechanism and the loss terms, it is difficult to verify whether the self-training loop can escape early error basins as claimed.

    Authors: We agree that greater mathematical precision is warranted. In the revised manuscript we have added explicit equations describing the conditioning of the NeRF on map predictions (including the precise injection of predicted geometry and semantics as additional inputs to the density and radiance MLPs). We also specify the volume-rendering procedure used to obtain 3D pseudo-labels and the exact loss terms (ray-wise cross-entropy and depth consistency) that supervise the map network. These additions clarify how iterative refinement progressively reduces early error accumulation. revision: yes

  2. Referee: [§4] §4 (Map-to-Ray Matching): The paper asserts that projecting NeRF-derived labels back onto 2D rays derived from image annotations prevents systematic error accumulation, yet no quantitative analysis (e.g., ablation removing the matching term or monitoring 3D consistency metrics across iterations) is provided to support this. This is load-bearing for the headline result.

    Authors: We acknowledge that the original manuscript relies primarily on end-to-end gains rather than isolated quantitative evidence for Map-to-Ray Matching. In the revision we have added an ablation that removes the matching term and reports the resulting drop in mIoU and CD. We further include plots of 3D consistency metrics (ray-alignment error and semantic consistency across views) tracked over self-training iterations, directly demonstrating the reduction in systematic drift. revision: yes

  3. Referee: [§5.1] §5.1 (Quantitative results): The claim of attaining ~75% of the fully supervised baseline is presented without error bars, statistical significance tests, or per-scene breakdowns. In addition, the exact definition of the baseline and the precise 2D-label-only competitors are not tabulated, making it impossible to assess whether the reported gains are robust or dataset-specific.

    Authors: We accept that the statistical presentation can be strengthened. The revised Section 5.1 now reports error bars computed over three independent runs, includes p-values from paired t-tests on the main comparisons, and provides per-scene metric breakdowns in the supplementary material. We have also expanded the comparison table to explicitly define the fully supervised baseline (identical architecture trained with 3D ground truth) and list each 2D-label-only competitor with its exact training protocol and reference implementation. revision: yes
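The statistics promised in the third response (error bars over three runs and a paired t-test) can be computed as sketched below. The per-run scores are placeholders, not values from the paper, and `mean_and_std` and `paired_t_statistic` are hypothetical helper names. The sketch stops at the t statistic and its degrees of freedom, since turning those into a p-value needs the Student t distribution, which the Python standard library does not provide.

```python
import math
import statistics


def mean_and_std(values):
    """Mean and sample standard deviation, as used for an error bar."""
    return statistics.mean(values), statistics.stdev(values)


def paired_t_statistic(a, b):
    """t statistic and degrees of freedom for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    spread = statistics.stdev(diffs)
    if spread == 0:
        raise ValueError("all paired differences are identical")
    t = statistics.mean(diffs) / (spread / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Placeholder mAP values from three seeds for two methods (illustration only).
method_a = [50.0, 52.0, 54.0]
method_b = [48.0, 49.0, 50.0]

print(mean_and_std(method_a))
print(paired_t_statistic(method_a, method_b))
```

With only three runs the test has two degrees of freedom, so even a large t statistic corresponds to a fairly weak significance statement.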

Circularity Check

0 steps flagged

No significant circularity in self-training loop

Full rationale

The paper describes an iterative self-training procedure in which map predictions condition a NeRF module to generate 3D pseudo-labels that are then used to supervise the map network, with a Map-to-Ray Matching step that projects predictions onto rays derived from external 2D image labels. This is a standard weakly-supervised training mechanism whose effectiveness is not equivalent to its inputs by construction. Performance claims (approximately 75% of fully-supervised baseline on Argoverse 2 and nuScenes) are supported by direct empirical comparisons against fully-supervised and other 2D-only baselines on held-out test data, rendering the central result self-contained against external benchmarks rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the unproven assumption that a NeRF conditioned on imperfect map predictions will produce sufficiently accurate and consistent 3D geometry and semantics to drive useful self-training improvement; no independent verification of this assumption is provided in the abstract.

axioms (2)
  • [domain assumption] NeRF conditioned on map predictions reconstructs view-consistent 3D geometry and semantics usable as pseudo labels
    Invoked in the abstract as the mechanism to generate high-quality pseudo labels from 2D inputs.
  • [ad hoc to paper] Map-to-Ray Matching sufficiently prevents error accumulation in the self-training loop
    Presented as the key mitigation strategy without further justification in the abstract.
invented entities (1)
  • Map-to-Ray Matching strategy (no independent evidence)
    purpose: Align map predictions with camera rays derived from 2D labels to reduce error accumulation during self-training
    New component introduced to address a specific failure mode of the self-training process.



Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · 1 internal anchor

  1. [1]

    Online high-definition map construction for autonomous vehicles: A comprehensive survey,

    H. Lyu, J. S. Berrio Perez, Y. Huang, K. Li, M. Shan, and S. Worrall, “Online high-definition map construction for autonomous vehicles: A comprehensive survey,” Journal of Sensor and Actuator Networks, vol. 14, no. 1, p. 15, 2025

  2. [2]

    High-definition maps: Comprehensive survey, challenges, and future perspectives,

    G. Elghazaly, R. Frank, S. Harvey, and S. Safko, “High-definition maps: Comprehensive survey, challenges, and future perspectives,” IEEE Open Journal of Intelligent Transportation Systems, vol. 4, pp. 527–550, 2023

  3. [3]

    Semantic map learning of traffic light to lane assignment based on motion data,

    T. Monninger, A. Weber, and S. Staab, “Semantic map learning of traffic light to lane assignment based on motion data,” in 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2023, pp. 1583–1590

  4. [4]

    Loam: Lidar odometry and mapping in real-time

    J. Zhang and S. Singh, “Loam: Lidar odometry and mapping in real-time,” in Robotics: Science and Systems, vol. 2, no. 9. Berkeley, CA, 2014, pp. 1–9

  5. [5]

    Lego-loam: Lightweight and ground-optimized lidar odometry and mapping on variable terrain,

    T. Shan and B. Englot, “Lego-loam: Lightweight and ground-optimized lidar odometry and mapping on variable terrain,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018, pp. 4758–4765

  6. [6]

    Long-term map maintenance pipeline for autonomous vehicles,

    J. S. Berrio, S. Worrall, M. Shan, and E. Nebot, “Long-term map maintenance pipeline for autonomous vehicles,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 8, pp. 10427–10440, 2021

  7. [7]

    Vectormapnet: End-to-end vectorized hd map learning,

    Y. Liu, T. Yuan, Y. Wang, Y. Wang, and H. Zhao, “Vectormapnet: End-to-end vectorized hd map learning,” in International Conference on Machine Learning. PMLR, 2023, pp. 22352–22369

  8. [8]

    Maptrv2: An end-to-end framework for online vectorized hd map construction,

    B. Liao, S. Chen, Y. Zhang, B. Jiang, Q. Zhang, W. Liu, C. Huang, and X. Wang, “Maptrv2: An end-to-end framework for online vectorized hd map construction,” International Journal of Computer Vision, vol. 133, no. 3, pp. 1352–1374, 2025

  9. [9]

    Augmapnet: Improving spatial latent structure via bev grid augmentation for enhanced vectorized online hd map construction,

    T. Monninger, M. Z. Anwar, S. Antol, S. Staab, and S. Ding, “Augmapnet: Improving spatial latent structure via bev grid augmentation for enhanced vectorized online hd map construction,” arXiv preprint arXiv:2503.13430, 2025

  10. [10]

    Mapdiffusion: Generative diffusion for vectorized online hd map construction and uncertainty estimation in autonomous driving,

    T. Monninger, Z. Zhang, Z. Mo, M. Z. Anwar, S. Staab, and S. Ding, “Mapdiffusion: Generative diffusion for vectorized online hd map construction and uncertainty estimation in autonomous driving,” in 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2025

  11. [11]

    Pseudomaptrainer: Learning online mapping without hd maps,

    C. Löwens, T. Funke, J. Xie, and A. P. Condurache, “Pseudomaptrainer: Learning online mapping without hd maps,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 5263–5272

  12. [12]

    Semvecnet: Generalizable vector map generation for arbitrary sensor configurations,

    N. E. Ranganatha, H. Zhang, S. Venkatramani, J.-Y. Liao, and H. I. Christensen, “Semvecnet: Generalizable vector map generation for arbitrary sensor configurations,” in 2024 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2024, pp. 2820–2827

  13. [13]

    Ws-3d-lane: Weakly supervised 3d lane detection with 2d lane labels,

    J. Ai, W. Ding, J. Zhao, and J. Zhong, “Ws-3d-lane: Weakly supervised 3d lane detection with 2d lane labels,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023

  14. [14]

    Nerf: Representing scenes as neural radiance fields for view synthesis,

    B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “Nerf: Representing scenes as neural radiance fields for view synthesis,” Communications of the ACM, vol. 65, no. 1, pp. 99–106, 2021

  15. [15]

    Learning to detect mobile objects from lidar scans without labels,

    Y. You, K. Luo, C. P. Phoo, W.-L. Chao, W. Sun, B. Hariharan, M. Campbell, and K. Q. Weinberger, “Learning to detect mobile objects from lidar scans without labels,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 1130–1140

  16. [16]

    Hdmapnet: An online hd map construction and evaluation framework,

    Q. Li, Y. Wang, Y. Wang, and H. Zhao, “Hdmapnet: An online hd map construction and evaluation framework,” in 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 4628–4634

  17. [17]

    Maptr: Structured modeling and learning for online vectorized hd map construction,

    B. Liao, S. Chen, X. Wang, T. Cheng, Q. Zhang, W. Liu, and C. Huang, “Maptr: Structured modeling and learning for online vectorized hd map construction,” in The Eleventh International Conference on Learning Representations, 2023

  18. [18]

    Instagram: Instance-level graph modeling for vectorized hd map learning,

    J. Shin, H. Jeong, F. Rameau, and D. Kum, “Instagram: Instance-level graph modeling for vectorized hd map learning,” IEEE Transactions on Intelligent Transportation Systems, 2025

  19. [19]

    End-to-end vectorized hd-map construction with piecewise bezier curve,

    L. Qiao, W. Ding, X. Qiu, and C. Zhang, “End-to-end vectorized hd-map construction with piecewise bezier curve,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 13218–13228

  20. [20]

    Pivotnet: Vectorized pivot learning for end-to-end hd map construction,

    W. Ding, L. Qiao, X. Qiu, and C. Zhang, “Pivotnet: Vectorized pivot learning for end-to-end hd map construction,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 3672–3682

  21. [21]

    Compact hd map construction via douglas-peucker point transformer,

    R. Liu and Z. Yuan, “Compact hd map construction via douglas-peucker point transformer,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 4, 2024, pp. 3702–3710

  22. [22]

    Online vectorized hd map construction using geometry,

    Z. Zhang, Y. Zhang, X. Ding, F. Jin, and X. Yue, “Online vectorized hd map construction using geometry,” in European Conference on Computer Vision. Springer, 2024, pp. 73–90

  23. [23]

    Online map vectorization for autonomous driving: A rasterization perspective,

    G. Zhang, J. Lin, S. Wu, Z. Luo, Y. Xue, S. Lu, Z. Wang et al., “Online map vectorization for autonomous driving: A rasterization perspective,” Advances in Neural Information Processing Systems, vol. 36, pp. 31865–31877, 2023

  24. [24]

    Streammapnet: Streaming mapping network for vectorized online hd map construction,

    T. Yuan, Y. Liu, Y. Wang, Y. Wang, and H. Zhao, “Streammapnet: Streaming mapping network for vectorized online hd map construction,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 7356–7365

  25. [25]

    Stream query denoising for vectorized hd-map construction,

    S. Wang, F. Jia, W. Mao, Y. Liu, Y. Zhao, Z. Chen, T. Wang, C. Zhang, X. Zhang, and F. Zhao, “Stream query denoising for vectorized hd-map construction,” in European Conference on Computer Vision. Springer, 2024, pp. 203–220

  26. [26]

    Leveraging enhanced queries of point sets for vectorized map construction,

    Z. Liu, X. Zhang, G. Liu, J. Zhao, and N. Xu, “Leveraging enhanced queries of point sets for vectorized map construction,” in European Conference on Computer Vision. Springer, 2024, pp. 461–477

  27. [27]

    S2g2: Semi-supervised semantic bird-eye-view grid-map generation using a monocular camera for autonomous driving,

    S. Gao, Q. Wang, and Y. Sun, “S2g2: Semi-supervised semantic bird-eye-view grid-map generation using a monocular camera for autonomous driving,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 11974–11981, 2022

  28. [28]

    Semi-supervised learning for visual bird’s eye view semantic segmentation,

    J. Zhu, L. Liu, Y. Tang, F. Wen, W. Li, and Y. Liu, “Semi-supervised learning for visual bird’s eye view semantic segmentation,” in 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 9079–9085

  29. [29]

    Pct: Perspective cue training framework for multi-camera bev segmentation,

    H. Ishikawa, T. Iida, Y. Konishi, and Y. Aoki, “Pct: Perspective cue training framework for multi-camera bev segmentation,” in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024, pp. 13253–13260

  30. [30]

    Exploring semi-supervised learning for online mapping,

    A. Lilja, E. Wallin, J. Fu, and L. Hammarstrand, “Exploring semi-supervised learning for online mapping,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 2477–2487

  31. [31]

    Letsmap: Unsupervised representation learning for label-efficient semantic bev mapping,

    N. Gosala, K. Petek, B. Ravi Kiran, S. Yogamani, P. Drews-Jr, W. Burgard, and A. Valada, “Letsmap: Unsupervised representation learning for label-efficient semantic bev mapping,” in European Conference on Computer Vision. Springer, 2024, pp. 110–126

  32. [32]

    Occfeat: Self-supervised occupancy feature prediction for pretraining bev segmentation networks,

    S. Sirko-Galouchenko, A. Boulch, S. Gidaris, A. Bursuc, A. Vobecky, P. Pérez, and R. Marlet, “Occfeat: Self-supervised occupancy feature prediction for pretraining bev segmentation networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 4493–4503

  33. [33]

    Skyeye: Self-supervised bird’s-eye-view semantic mapping using monocular frontal view images,

    N. Gosala, K. Petek, P. L. Drews-Jr, W. Burgard, and A. Valada, “Skyeye: Self-supervised bird’s-eye-view semantic mapping using monocular frontal view images,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 14901–14910

  34. [34]

    Rendbev: Semantic novel view synthesis for self-supervised bird’s eye view segmentation,

    H. P. Monteagudo, L. Taccari, A. Pjetri, F. Sambo, and S. Salti, “Rendbev: Semantic novel view synthesis for self-supervised bird’s eye view segmentation,” in 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE, 2025, pp. 535–544

  35. [35]

    Structure-from-motion revisited,

    J. L. Schonberger and J.-M. Frahm, “Structure-from-motion revisited,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 4104–4113

  36. [36]

    Pixelwise view selection for unstructured multi-view stereo,

    J. L. Schönberger, E. Zheng, J.-M. Frahm, and M. Pollefeys, “Pixelwise view selection for unstructured multi-view stereo,” in European conference on computer vision. Springer, 2016, pp. 501–518

  37. [37]

    pixelnerf: Neural radiance fields from one or few images,

    A. Yu, V. Ye, M. Tancik, and A. Kanazawa, “pixelnerf: Neural radiance fields from one or few images,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 4578–4587

  38. [38]

    Point-nerf: Point-based neural radiance fields,

    Q. Xu, Z. Xu, J. Philip, S. Bi, Z. Shu, K. Sunkavalli, and U. Neumann, “Point-nerf: Point-based neural radiance fields,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 5438–5448

  39. [39]

    Pointnerf++: a multi-scale, point-based neural radiance field,

    W. Sun, E. Trulls, Y.-C. Tseng, S. Sambandam, G. Sharma, A. Tagliasacchi, and K. M. Yi, “Pointnerf++: a multi-scale, point-based neural radiance field,” in European Conference on Computer Vision. Springer, 2024, pp. 221–238

  40. [40]

    Rome: Towards large scale road surface reconstruction via mesh representation,

    R. Mei, W. Sui, J. Zhang, X. Qin, G. Wang, T. Peng, T. Chen, and C. Yang, “Rome: Towards large scale road surface reconstruction via mesh representation,” IEEE Transactions on Intelligent Vehicles, vol. 9, no. 7, pp. 5173–5185, 2024

  41. [41]

    Emie-map: Large-scale road surface reconstruction based on explicit mesh and implicit encoding,

    W. Wu, Q. Wang, G. Wang, J. Wang, T. Zhao, Y. Liu, D. Gao, Z. Liu, and H. Wang, “Emie-map: Large-scale road surface reconstruction based on explicit mesh and implicit encoding,” in European Conference on Computer Vision. Springer, 2024, pp. 370–386

  42. [42]

    Rogs: Large scale road surface reconstruction with meshgrid gaussian,

    Z. Feng, W. Wu, T. Deng, and H. Wang, “Rogs: Large scale road surface reconstruction with meshgrid gaussian,” arXiv preprint arXiv:2405.14342, 2024

  43. [43]

    Inverse perspective mapping simplifies optical flow computation and obstacle detection,

    H. A. Mallot, H. H. Bülthoff, J. Little, and S. Bohrer, “Inverse perspective mapping simplifies optical flow computation and obstacle detection,” Biological cybernetics, 1991

  44. [44]

    Pointnet: Deep learning on point sets for 3d classification and segmentation,

    C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “Pointnet: Deep learning on point sets for 3d classification and segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 652–660

  45. [45]

    Focal loss for dense object detection,

    T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2980–2988

  46. [46]

    Semantic Instance Segmentation with a Discriminative Loss Function

    B. De Brabandere, D. Neven, and L. Van Gool, “Semantic instance segmentation with a discriminative loss function,” arXiv preprint arXiv:1708.02551, 2017

  47. [47]

    Argoverse 2: Next generation datasets for self-driving perception and forecasting,

    B. Wilson, W. Qi, T. Agarwal, J. Lambert, J. Singh, S. Khandelwal, B. Pan, R. Kumar, A. Hartnett, J. K. Pontes, D. Ramanan, P. Carr, and J. Hays, “Argoverse 2: Next generation datasets for self-driving perception and forecasting,” in Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS Datasets and Benchmarks...

  48. [48]

    nuscenes: A multimodal dataset for autonomous driving,

    H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuscenes: A multimodal dataset for autonomous driving,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11621–11631

  49. [49]

    Localization is all you evaluate: Data leakage in online mapping datasets and how to fix it,

    A. Lilja, J. Fu, E. Stenborg, and L. Hammarstrand, “Localization is all you evaluate: Data leakage in online mapping datasets and how to fix it,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 22150–22159

  50. [50]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778

  51. [51]

    Object detection with discriminatively trained part-based models,

    P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part-based models,” IEEE transactions on pattern analysis and machine intelligence, vol. 32, no. 9, pp. 1627–1645, 2009

  52. [52]

    3d gaussian splatting for real-time radiance field rendering

    B. Kerbl, G. Kopanas, T. Leimkühler, and G. Drettakis, “3d gaussian splatting for real-time radiance field rendering,” ACM Trans. Graph., vol. 42, no. 4, pp. 139–1, 2023