Pith · machine review for the scientific record

arXiv: 2604.11401 · v1 · submitted 2026-04-13 · 💻 cs.CV

Recognition: unknown

GS4City: Hierarchical Semantic Gaussian Splatting via City-Model Priors

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:34 UTC · model grok-4.3

classification: 💻 cs.CV
keywords: 3D Gaussian Splatting · semantic segmentation · CityGML · urban reconstruction · hierarchical semantics · LoD3 models · scene understanding

The pith

City-model priors enable hierarchical semantic supervision for 3D Gaussian Splatting in urban scenes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that structured city models can supply the hierarchical building semantics that current 2D-image-driven Gaussian Splatting methods lack. It converts detailed city geometry into image-aligned masks through raycasting and relation validation, then fuses those masks with image predictions to assign consistent identities to each Gaussian. This produces scene representations that remain photorealistic while also supporting structured semantic queries at coarse and fine levels. A sympathetic reader would care because urban reconstruction would then respect real building organization rather than relying on ambiguous per-pixel labels alone.
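To make the fusion step concrete, here is a minimal Python sketch of one way the matching could work: each foundation-model mask proposal inherits the identity of the CityGML-derived mask it best overlaps. The function names, the greedy matching, and the 0.5 IoU threshold are illustrative assumptions, not the paper's released implementation.

    import numpy as np

    def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
        # Pixel IoU between two boolean masks of equal shape.
        inter = np.logical_and(a, b).sum()
        union = np.logical_or(a, b).sum()
        return float(inter) / union if union > 0 else 0.0

    def fuse_masks(city_masks: dict, fm_masks: list, iou_thr: float = 0.5):
        # Greedily assign each foundation-model mask proposal the
        # scene-consistent instance ID of the best-overlapping
        # CityGML-derived mask; proposals below the threshold stay unmatched.
        fused = []
        for fm in fm_masks:
            best_id, best_iou = None, iou_thr
            for inst_id, cm in city_masks.items():
                iou = mask_iou(fm, cm)
                if iou >= best_iou:
                    best_id, best_iou = inst_id, iou
            fused.append((best_id, fm))
        return fused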

Core claim

GS4City derives reliable image-aligned masks from LoD3 CityGML models via two-pass raycasting, explicitly using parent-child relations to validate and recover fine-grained facade elements. It fuses these geometry-grounded masks with foundation-model predictions to establish scene-consistent instance correspondences, and learns a compact identity encoding for each Gaussian under joint 2D identity supervision and 3D spatial regularization.
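The final clause of the claim, the joint objective, admits a compact sketch. Below is a hypothetical PyTorch version, assuming the 2D term is per-pixel cross-entropy of rendered identity logits against the fused identity masks and the 3D term pulls each Gaussian's encoding toward its k nearest spatial neighbours; the paper's actual loss terms and weighting may differ.

    import torch
    import torch.nn.functional as F

    def identity_loss(rendered_logits, gt_ids, encodings, knn_idx, lam=0.1):
        # rendered_logits: (H*W, C) per-pixel identity logits splatted
        #                  from the Gaussians into the training view
        # gt_ids:          (H*W,)   identity labels from the fused masks
        # encodings:       (N, D)   compact identity encoding per Gaussian
        # knn_idx:         (N, K)   indices of each Gaussian's neighbours
        loss_2d = F.cross_entropy(rendered_logits, gt_ids)  # 2D supervision
        neigh = encodings[knn_idx]                          # (N, K, D)
        diff = encodings.unsqueeze(1) - neigh               # (N, K, D)
        loss_3d = diff.pow(2).mean()                        # 3D smoothness
        return loss_2d + lam * loss_3d                      # lam is a guess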

What carries the argument

Two-pass raycasting on LoD3 CityGML models that uses parent-child validation to produce image-aligned semantic masks, which are then fused with 2D predictions to supervise identity encodings on Gaussian primitives.
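A minimal sketch of how the two passes could compose, assuming pass one resolves which building each pixel ray sees and pass two labels fine facade elements only when the parent-child relation confirms the element belongs to that building. The intersector callables and the fall-back-to-coarse behaviour are stand-ins, not the released code.

    import numpy as np

    def raycast_masks(cast_coarse, cast_fine, parent_of, rays, shape):
        # cast_coarse(ray) -> building ID hit first, or None on miss
        # cast_fine(ray)   -> facade-element ID hit first, or None
        # parent_of        -> maps each element ID to its parent building
        # rays             -> (H*W, 6) per-pixel ray origins + directions
        H, W = shape
        mask = np.zeros((H, W), dtype=np.int64)      # 0 = background
        for i, ray in enumerate(rays):
            building = cast_coarse(ray)              # pass 1: coarse solids
            if building is None:
                continue
            element = cast_fine(ray)                 # pass 2: fine elements
            if element is not None and parent_of[element] == building:
                mask[i // W, i % W] = element        # validated fine label
            else:
                mask[i // W, i % W] = building       # fall back to coarse
        return mask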

If this is right

  • Gaussian scenes become semantically queryable while preserving photorealistic rendering quality.
  • Building segmentation reaches higher accuracy at both coarse and fine-grained levels by embedding structured priors.
  • The resulting representations respect physical building hierarchies rather than treating semantics as independent per-view labels.
  • Existing city databases can be directly leveraged to regularize neural scene models without additional dense labeling.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same mask-generation pipeline could be adapted to other geospatial priors such as indoor floor plans or terrain models.
  • Large-scale city digitization projects might reduce manual annotation effort by bootstrapping from public city-model repositories.
  • Dynamic updates to Gaussian scenes could be supported if city models are refreshed over time with new construction data.

Load-bearing premise

The provided city models match the captured images closely enough in both geometry and semantics that the generated masks remain reliable without major misalignment or omissions.

What would settle it

Testing the method on a new urban capture where the city models contain known geometric offsets or semantic errors and checking whether the reported gains in segmentation accuracy disappear.
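A sketch of how such a stress test might inject a known error before mask generation: perturb the city-model vertices by a controlled yaw rotation and horizontal shift, then rerun the full pipeline and compare segmentation scores. The magnitudes are illustrative, not values from the paper.

    import numpy as np

    def perturb_city_model(vertices: np.ndarray, offset_m: float = 0.5,
                           yaw_deg: float = 1.0) -> np.ndarray:
        # Rotate the (N, 3) vertex array about the z-axis and shift it
        # horizontally, simulating a registration error between the
        # city model and the captured images.
        theta = np.deg2rad(yaw_deg)
        rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                        [np.sin(theta),  np.cos(theta), 0.0],
                        [0.0,            0.0,           1.0]])
        return vertices @ rot.T + np.array([offset_m, 0.0, 0.0])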

Figures

Figures reproduced from arXiv: 2604.11401 by Benjamin Busam, Boris Jutzi, Jinyu Zhu, Olaf Wysocki, Qilin Zhang.

Figure 1: GS4City augments a photorealistic Gaussian scene with … [image omitted; see source]
Figure 2: Overview of GS4City. Starting from an aligned LoD3 city model, calibrated images, and a pre-trained 3DGS scene, … [image omitted; see source]
Figure 3: Foundation-model mask filtering. Raw image-based … [image omitted; see source]
Figure 4: Qualitative results on representative scenes from TUM2TWIN and Gold Coast at both the coarse and fine-grained semantic … [image omitted; see source]
Original abstract

Recent semantic 3D Gaussian Splatting (3DGS) methods primarily rely on 2D foundation models, often yielding ambiguous boundaries and limited support for structured urban semantics. While city models such as CityGML encode hierarchically organized semantics together with building geometry, these labels cannot be directly mapped to Gaussian primitives. We present GS4City, a hierarchical semantic Gaussian Splatting method that incorporates city-model priors for urban scene understanding. GS4City derives reliable image-aligned masks from Level of Detail (LoD) 3 CityGML models via two-pass raycasting, explicitly using parent-child relations to validate and recover fine-grained facade elements. It then fuses these geometry-grounded masks with foundation-model predictions to establish scene-consistent instance correspondences, and learns a compact identity encoding for each Gaussian under joint 2D identity supervision and 3D spatial regularization. Experiments on the TUM2TWIN and Gold Coast datasets show that GS4City effectively incorporates structured building semantics into Gaussian scene representations, outperforming existing 2D-driven semantic 3DGS baselines, including LangSplat and Gaga, by up to 15.8 IoU points in coarse building segmentation and 14.2 mIoU points in fine-grained semantic segmentation. By bridging structured city models and photorealistic Gaussian scene representations, GS4City enables semantically queryable and structure-aware urban reconstruction. Code is available at https://github.com/Jinyzzz/GS4City.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. GS4City proposes a hierarchical semantic 3D Gaussian Splatting method that incorporates priors from LoD3 CityGML city models. It generates image-aligned semantic masks via two-pass raycasting that exploits parent-child relations for facade validation, fuses these with 2D foundation-model predictions to establish consistent correspondences, and optimizes compact identity encodings for each Gaussian under joint 2D identity supervision and 3D spatial regularization. Experiments on the TUM2TWIN and Gold Coast datasets report gains of up to 15.8 IoU in coarse building segmentation and 14.2 mIoU in fine-grained semantic segmentation over 2D-driven baselines such as LangSplat and Gaga.

Significance. If the mask-generation step is shown to be reliable, the work provides a concrete bridge between structured CityGML semantics and photorealistic Gaussian representations, enabling queryable urban reconstructions. The public code release is a clear strength that supports reproducibility and follow-up work.

major comments (2)
  1. §3.2 (Mask Generation): The two-pass raycasting plus parent-child validation is asserted to produce reliable, image-aligned masks that serve as geometry-grounded supervision. However, no quantitative fidelity metrics (pixel IoU against independent manual annotations, mean reprojection error, or occlusion-handling statistics) are reported on either TUM2TWIN or Gold Coast. Because the headline gains rest on these masks being superior to pure 2D foundation-model labels, the absence of such validation is load-bearing for the central claim. (A minimal sketch of the reprojection-error check appears after the minor comments.)
  2. §4.2, Table 1: The reported 15.8 IoU and 14.2 mIoU improvements are presented without an ablation that removes the CityGML-derived masks while retaining the identity encoding and 3D regularization. Consequently it remains unclear whether the observed advantage over LangSplat and Gaga stems specifically from the city-model priors or from other implementation choices.
minor comments (2)
  1. The abstract states 'up to 15.8 IoU points' and '14.2 mIoU points' without citing the exact table rows or dataset splits; the main text should make these references explicit for immediate traceability.
  2. §3.3: Notation for the identity-encoding dimension and the precise fusion weights between CityGML masks and foundation-model logits could be stated more formally (e.g., as an equation) to aid readers who wish to re-implement without consulting the code.
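On major comment 1, a minimal version of the requested reprojection-error metric could look as follows: project known CityGML vertices (e.g., window corners) with the calibrated camera and measure the mean pixel distance to manually annotated image points. All variable names are illustrative; pixel IoU against manual annotations would follow the same pattern.

    import numpy as np

    def mean_reprojection_error(K, R, t, pts3d, pts2d):
        # K: (3, 3) intrinsics; R, t: world-to-camera rotation/translation
        # pts3d: (N, 3) CityGML vertices; pts2d: (N, 2) annotated pixels
        cam = R @ pts3d.T + t.reshape(3, 1)   # world -> camera coordinates
        proj = K @ cam                        # pinhole projection
        uv = (proj[:2] / proj[2]).T           # (N, 2) pixel coordinates
        return float(np.linalg.norm(uv - pts2d, axis=1).mean())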

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback on our manuscript. We address each major comment below and will incorporate the requested validations and ablations in the revised version to strengthen the claims.

Point-by-point responses
  1. Referee: §3.2 (Mask Generation): The two-pass raycasting plus parent-child validation is asserted to produce reliable, image-aligned masks that serve as geometry-grounded supervision. However, no quantitative fidelity metrics (pixel IoU against independent manual annotations, mean reprojection error, or occlusion-handling statistics) are reported on either TUM2TWIN or Gold Coast. Because the headline gains rest on these masks being superior to pure 2D foundation-model labels, the absence of such validation is load-bearing for the central claim.

    Authors: We agree that direct quantitative validation of the generated masks is important to support the central claim. The reported segmentation gains provide indirect evidence, but to address this directly we will add pixel IoU against manually annotated subsets from both datasets, mean reprojection error, and occlusion-handling statistics in a new subsection of §3.2 (or supplementary material) in the revision. revision: yes

  2. Referee: §4.2, Table 1: The reported 15.8 IoU and 14.2 mIoU improvements are presented without an ablation that removes the CityGML-derived masks while retaining the identity encoding and 3D regularization. Consequently it remains unclear whether the observed advantage over LangSplat and Gaga stems specifically from the city-model priors or from other implementation choices.

    Authors: We acknowledge that an ablation isolating the CityGML-derived masks is needed to attribute the gains precisely. In the revision we will add this ablation (replacing CityGML masks with pure 2D foundation-model predictions while retaining identity encodings and 3D regularization) to Table 1 and discuss the results in §4.2. revision: yes

Circularity Check

0 steps flagged

No circularity; external priors and empirical gains are independent of fitted inputs

Full rationale

The paper's derivation chain starts from external LoD3 CityGML models and off-the-shelf foundation models, derives image-aligned masks via two-pass raycasting plus parent-child validation, fuses them to create supervision, and optimizes Gaussian identity encodings under 2D/3D losses. The reported IoU/mIoU improvements on TUM2TWIN and Gold Coast are measured outcomes against baselines (LangSplat, Gaga), not quantities that reduce by construction to any parameter fitted inside the paper. No self-definitional equations, no fitted-input-called-prediction, and no load-bearing self-citations appear in the provided text. The method is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The method depends on the availability and accuracy of external LoD3 CityGML models and on the correctness of the two-pass raycasting procedure; no explicit free parameters or invented entities are described in the abstract.

pith-pipeline@v0.9.0 · 5575 in / 1144 out tokens · 25752 ms · 2026-05-10T15:34:59.682801+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

43 extracted references · 6 canonical work pages · 2 internal anchors

  1. [1] Filip Biljecki. Level of detail in 3D city models. 2017.
  2. [2] Filip Biljecki, Hugo Ledoux, Jantien Stoter, and Junqiao Zhao. Formalisation of the level of detail in 3D city modelling. Computers, Environment and Urban Systems, 48:1–15, 2014.
  3. [3] Filip Biljecki, Jantien Stoter, Hugo Ledoux, Sisi Zlatanova, and Arzu Çöltekin. Applications of 3D city models: State of the art review. ISPRS International Journal of Geo-Information, 4(4):2842–2889, 2015.
  4. [4] Jiazhong Cen, Jiemin Fang, Chen Yang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, and Qi Tian. Segment any 3D Gaussians. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 1971–1979, 2025.
  5. [5] Rohan Chacko, Nicolai Häni, Eldar Khaliullin, Lin Sun, and Douglas Lee. Lifting by Gaussians: A simple, fast and flexible method for 3D instance segmentation. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 3497–3507. IEEE, 2025.
  6. [6] Guikun Chen and Wenguan Wang. A survey on 3D Gaussian Splatting. arXiv preprint arXiv:2401.03890, 2024.
  7. [7] Jaeyoung Chung, Jeongtaek Oh, and Kyoung Mu Lee. Depth-regularized optimization for 3D Gaussian Splatting in few-shot images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 811–820, 2024.
  8. [8] Ben Fei, Jingyi Xu, Rui Zhang, Qingyuan Zhou, Weidong Yang, and Ying He. 3D Gaussian Splatting as new era: A survey. IEEE Transactions on Visualization and Computer Graphics, 2024.
  9. [9] Gerhard Gröger, Thomas H. Kolbe, Claus Nagel, and Karl-Heinz Häfele. OGC City Geography Markup Language (CityGML) encoding standard. 2012.
  10. [10] Jun Guo, Xiaojian Ma, Yue Fan, Huaping Liu, and Qing Li. Semantic Gaussians: Open-vocabulary scene understanding with 3D Gaussian Splatting. IEEE Transactions on Circuits and Systems for Video Technology, 2026.
  11. [11] Shuting He, Peilin Ji, Yitong Yang, Changshuo Wang, Jiayi Ji, Yinglin Wang, and Henghui Ding. A survey on 3D Gaussian Splatting applications: Segmentation, editing, and generation. arXiv preprint arXiv:2508.09977, 2025.
  12. [12] Debao Huang, Hanyang Liu, Ningli Xu, and Rongjun Qin. Dynamic urban scene modeling with 3D Gaussian Splatting from UAV full motion videos. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 48:649–656, 2025.
  13. [13] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3D Gaussian Splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), 2023.
  14. [14] Chung Min Kim, Mingxuan Wu, Justin Kerr, Ken Goldberg, Matthew Tancik, and Angjoo Kanazawa. GARField: Group anything with radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21530–21539, 2024.
  15. [15] Han Sae Kim, Olaf Wysocki, Ludwig Hoegner, Jinha Jung, Ksenia Bittner, Joshua Carpenter, Friedrich Fraundorfer, Arpan Kusari, Max Mehltretter, Franz Rottensteiner, Anna Schadl, and Martin Weinmann. GT-LoD3: LoD3 semantic 3D building reconstruction benchmark dataset (accepted). The International Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences.
  16. [16] Alexander Kirillov, Eric Mintun, Nikhila Ravi, et al. Segment Anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4015–4026, 2023.
  17. [17] Hugo Ledoux, Ken Arroyo Ohori, Kavisha Kumar, Balázs Dukai, Anna Labetski, and Stelios Vitalis. CityJSON: A compact and easy-to-use encoding of the CityGML data model. Open Geospatial Data, Software and Standards, 4(1):1–12, 2019.
  18. [18] Qijing Li, Jingxiang Sun, Liang An, Zhaoqi Su, Hongwen Zhang, and Yebin Liu. SemanticSplat: Feed-forward 3D scene understanding with language-aware Gaussian fields. arXiv preprint arXiv:2506.09565, 2025.
  19. [19] Wanhua Li, Yujie Zhao, Minghan Qin, Yang Liu, Yuanhao Cai, Chuang Gan, and Hanspeter Pfister. LangSplatV2: High-dimensional 3D language Gaussian Splatting with 450+ FPS. arXiv preprint arXiv:2507.07136, 2025.
  20. [20] Zhuoxiao Li, Shanliang Yao, Taoyu Wu, Yong Yue, Wufan Zhao, Rongjun Qin, Ángel F. García-Fernández, Andrew Levers, Jason Ralph, and Xiaohui Zhu. ULSR-GS: Urban large-scale surface reconstruction Gaussian Splatting with multi-view geometric consistency. ISPRS Journal of Photogrammetry and Remote Sensing, 230:861–880, 2025.
  21. [21] Jiaqi Lin, Zhihao Li, Xiao Tang, Jianzhuang Liu, Shiyong Liu, Jiayue Liu, Yangdi Lu, Xiaofei Wu, Songcen Xu, Youliang Yan, et al. VastGaussian: Vast 3D Gaussians for large scene reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5166–5175, 2024.
  22. [22] Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jing Yang, Chuang Li, Jianwei Yang, Hang Su, Jun Zhu, and Lei Zhang. Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection. arXiv preprint arXiv:2303.05499, 2023.
  23. [23] Weihang Liu, Yuhui Zhong, Yuke Li, Xi Chen, Jiadi Cui, Honglong Zhang, Lan Xu, Xin Lou, Yujiao Shi, Jingyi Yu, et al. CityGo: Lightweight urban modeling and rendering with proxy buildings and residual Gaussians. In Proceedings of the SIGGRAPH Asia 2025 Conference Papers, pages 1–10, 2025.
  24. [24] Yang Liu, Chuanchen Luo, Lue Fan, Naiyan Wang, Junran Peng, and Zhaoxiang Zhang. CityGaussian: Real-time high-quality large-scale scene rendering with Gaussians. In European Conference on Computer Vision, pages 265–282. Springer, 2024.
  25. [25] Weijie Lyu, Xueting Li, Abhijit Kundu, Yi-Hsuan Tsai, and Ming-Hsuan Yang. Gaga: Group any Gaussians via 3D-aware memory bank. arXiv preprint arXiv:2404.07977, 2024.
  26. [26] Jens Piekenbrinck, Christian Schmidt, Alexander Hermans, Narunas Vaskevicius, Timm Linder, and Bastian Leibe. OpenSplat3D: Open-vocabulary 3D instance segmentation using Gaussian Splatting. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 5246–5255, 2025.
  27. [27] Minghan Qin et al. LangSplat: 3D language Gaussian Splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20051–20060, 2024.
  28. [28] Alec Radford, Jong Wook Kim, Chris Hallacy, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
  29. [29] Jin-Chuan Shi, Miao Wang, Hao-Bin Duan, and Shao-Hua Guan. Language embedded 3D Gaussians for open-vocabulary scene understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5333–5343, 2024.
  30. [30] Wenzhao Tang, Weihang Li, Xiucheng Liang, Olaf Wysocki, Filip Biljecki, Christoph Holst, and Boris Jutzi. Texture2LoD3: Enabling LoD3 building reconstruction with panoramic images. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 2016–2026, 2025.
  31. [31] Ke Wu, Kaizhao Zhang, Zhiwei Zhang, Muer Tie, Shanshuai Yuan, Jieru Zhao, Zhongxue Gan, and Wenchao Ding. HGS-Mapping: Online dense mapping using hybrid Gaussian representation in urban scenes. IEEE Robotics and Automation Letters, 9(11):9573–9580, 2024.
  32. [32] Yanmin Wu et al. OpenGaussian: Towards point-level 3D Gaussian-based open vocabulary understanding. Advances in Neural Information Processing Systems, 37:19114–19138, 2024.
  33. [33] Olaf Wysocki, Benedikt Schwab, Christof Beil, Christoph Holst, and Thomas H. Kolbe. Reviewing open data semantic 3D city models to develop novel 3D reconstruction methods. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 48:493–500.
  34. [34] Olaf Wysocki, Benedikt Schwab, Manoj Kumar Biswanath, Michael Greza, Qilin Zhang, Jingwei Zhu, Thomas Froech, Medhini Heeramaglore, Ihab Hijazi, Khaoula Kanna, et al. TUM2TWIN: Introducing the large-scale multimodal urban digital twin benchmark dataset. ISPRS Journal of Photogrammetry and Remote Sensing, 232:810–830, 2026.
  35. [35] Olaf Wysocki et al. Scan2LoD3: Reconstructing semantic 3D building models at LoD3 using ray casting and Bayesian networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6548–6558, 2023.
  36. [36] Olaf Wysocki et al. ZAHA: Introducing the level of facade generalization and the large-scale point cloud facade semantic segmentation benchmark dataset. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision, pages 7648–7658. IEEE, 2025.
  37. [37] Yunzhi Yan, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, and Sida Peng. Street Gaussians: Modeling dynamic urban scenes with Gaussian splatting. In European Conference on Computer Vision, pages 156–173. Springer, 2024.
  38. [38] Mingqiao Ye, Martin Danelljan, Fisher Yu, and Lei Ke. Gaussian Grouping: Segment and edit anything in 3D scenes. In European Conference on Computer Vision, pages 162–179. Springer, 2024.
  39. [39] Qilin Zhang, Olaf Wysocki, Steffen Urban, and Boris Jutzi. CDGS: Confidence-aware depth regularization for 3D Gaussian Splatting. ISPRS International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 48:189–196, 2024.
  40. [40] Qilin Zhang, Olaf Wysocki, and Boris Jutzi. GS4Buildings: Prior-guided Gaussian splatting for 3D building reconstruction. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 10:249–256, 2025.
  41. [41] Yongjun Zhang, Dongdong Yue, Xinyi Liu, Siyuan Zou, Weiwei Fan, and Zihang Liu. OccFaçade: Enabling precise building façade parsing in large urban scenes with occlusion. International Journal of Remote Sensing, 45(18):6651–6674, 2024.
  42. [42] Hongyu Zhou, Jiahao Shao, Lu Xu, Dongfeng Bai, Weichao Qiu, Bingbing Liu, Yue Wang, Andreas Geiger, and Yiyi Liao. HUGS: Holistic urban 3D scene understanding via Gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21336–21345, 2024.
  43. [43] Shijie Zhou, Haoran Chang, Sicheng Jiang, Zhiwen Fan, Zehao Zhu, Dejia Xu, Pradyumna Chari, Suya You, Zhangyang Wang, and Achuta Kadambi. Feature 3DGS: Supercharging 3D Gaussian Splatting to enable distilled feature fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21676–21685, 2024.