High-Quality Spatial Reconstruction and Orthoimage Generation Using Efficient 2D Gaussian Splatting

Jialei He; Jie Yuan; Qian Wang; Zhihao Zhan; Zhituo Tu

arxiv: 2503.19703 · v3 · submitted 2025-03-25 · 💻 cs.CV · eess.IV


Pith reviewed 2026-05-22 22:37 UTC · model grok-4.3

classification 💻 cs.CV eess.IV

keywords 2D Gaussian Splatting · True Digital Orthophoto Maps · TDOM · depth maps · scene reconstruction · orthoimage generation · divide-and-conquer · terrain modeling


The pith

2D Gaussian Splatting generates high-precision TDOMs from depth maps without explicit DSM or occlusion detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes an alternative to traditional True Digital Orthophoto Map generation that avoids building Digital Surface Models and running occlusion detection. It instead uses 2D Gaussian Splatting to produce depth maps that supply spatial information for every pixel in the output orthoimage. A divide-and-conquer training and rendering strategy keeps the process efficient even for high-resolution outputs on large scenes. The method claims to maintain geometric accuracy on complex terrain and thin structures while lowering resource demands compared with conventional pipelines. Experimental results are presented to support large-scale reconstruction and terrain modeling use cases.

Core claim

This work presents an alternative technique rooted in 2D Gaussian Splatting, free of explicit DSM and occlusion detection. With depth map generation, spatial information for every pixel within the TDOM is retrieved and can reconstruct the scene with high precision. Divide-and-conquer strategy achieves excellent GS training and rendering with high-resolution TDOMs at a lower resource cost, which preserves higher quality of rendering on complex terrain and thin structure without a decrease in efficiency.
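To make "spatial information for every pixel" concrete, here is a minimal sketch of how a nadir orthographic depth map could be turned into world-space points. It is not taken from the paper: the function name `ortho_pixels_to_points`, the north-up pixel layout, and the parameters `origin_xy`, `gsd`, and `camera_height` are all assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def ortho_pixels_to_points(
    depth: List[List[float]],
    origin_xy: Tuple[float, float],
    gsd: float,
    camera_height: float,
) -> List[Point]:
    """Convert a nadir orthographic depth map into world-space points.

    Assumptions (hypothetical, not from the paper): row 0 is the northern
    edge, the virtual camera looks straight down from `camera_height`,
    and `gsd` is the ground sampling distance in metres per pixel.
    """
    x0, y0 = origin_xy  # world coordinates of the top-left pixel corner
    points: List[Point] = []
    for row, depth_row in enumerate(depth):
        for col, d in enumerate(depth_row):
            x = x0 + (col + 0.5) * gsd  # pixel centre, easting
            y = y0 - (row + 0.5) * gsd  # pixel centre, northing shrinks with row
            z = camera_height - d       # elevation of the visible surface
            points.append((x, y, z))
    return points


points = ortho_pixels_to_points(
    [[100.0, 90.0], [100.0, 100.0]], (500.0, 1000.0), 0.5, 120.0
)
```

In this toy grid the second pixel has a smaller depth, so it maps to a higher elevation than its neighbours, which is how a rooftop or thin structure would show up in the per-pixel data.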

What carries the argument

2D Gaussian Splatting representation that directly yields per-pixel depth maps for TDOM construction, paired with a divide-and-conquer strategy for scalable training and rendering.
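The abstract does not say how the divide-and-conquer partition is built. The sketch below shows one generic way to split a large orthoimage extent into overlapping tiles that can be trained or rendered independently. The names `split_into_tiles`, `tile`, and `overlap` are hypothetical, and the paper's actual partitioning may differ.

```python
from typing import List, Tuple

# (col_start, row_start, col_end, row_end), with end indices exclusive
Tile = Tuple[int, int, int, int]


def _starts(extent: int, tile: int, step: int) -> List[int]:
    """Start offsets along one axis, stopping once a tile reaches the edge."""
    starts = [0]
    while starts[-1] + tile < extent:
        starts.append(starts[-1] + step)
    return starts


def split_into_tiles(width: int, height: int, tile: int, overlap: int) -> List[Tile]:
    """Cover a width x height raster with square tiles sharing `overlap` pixels."""
    if width <= 0 or height <= 0:
        raise ValueError("raster extent must be positive")
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    step = tile - overlap
    return [
        (col, row, min(col + tile, width), min(row + tile, height))
        for row in _starts(height, tile, step)
        for col in _starts(width, tile, step)
    ]


tiles = split_into_tiles(10, 6, 6, 2)
```

The overlap gives neighbouring tiles shared pixels, which is a common way to hide seams when the tiles are merged back into one orthoimage.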

If this is right

High-resolution TDOMs can be produced at lower computational cost than methods that explicitly construct DSMs.
Quality on complex terrain and thin structures is preserved without efficiency loss through the divide-and-conquer approach.
Accurate per-pixel spatial data becomes available directly from the splatting model for downstream planning tasks.
Large-scale scene reconstruction is demonstrated with the same pipeline.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may simplify photogrammetry workflows by removing separate DSM and occlusion stages entirely.
Divide-and-conquer splitting could be adapted to other view-dependent rendering methods facing memory limits on high-resolution outputs.
If depth maps prove robust across sensor types, the technique might apply to real-time orthoimage generation from video streams.

Load-bearing premise

Depth maps extracted from a 2D Gaussian Splatting representation are sufficient by themselves to deliver pixel-accurate spatial information for TDOMs on complex terrain without any additional DSM construction or occlusion handling steps.

What would settle it

A controlled test scene containing known thin structures and steep terrain where the generated TDOM deviates from ground-truth measurements by more than the target geometric precision when compared against a conventional DSM-based pipeline.
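A test of this kind reduces to comparing predicted elevations with ground-truth values at check points. The sketch below computes RMSE and MAE and flags a result whose RMSE exceeds a chosen tolerance. The function names and the sample numbers are illustrative assumptions, not values from the paper.

```python
import math
from typing import Sequence, Tuple


def elevation_errors(
    predicted: Sequence[float], reference: Sequence[float]
) -> Tuple[float, float]:
    """Return (RMSE, MAE) between predicted and reference elevations."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [p - r for p, r in zip(predicted, reference)]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    mae = sum(abs(d) for d in diffs) / len(diffs)
    return rmse, mae


def exceeds_tolerance(
    predicted: Sequence[float], reference: Sequence[float], tolerance: float
) -> bool:
    """True when the RMSE is larger than the target geometric precision."""
    rmse, _ = elevation_errors(predicted, reference)
    return rmse > tolerance


# Hypothetical check-point elevations in metres.
rmse, mae = elevation_errors([10.0, 12.0, 9.0, 11.0], [10.0, 10.0, 10.0, 10.0])
```

Running the same check on the output of a conventional DSM-based pipeline would give the baseline that the comparison needs.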

Original abstract

Highly accurate geometric precision and dense image features characterize True Digital Orthophoto Maps (TDOMs), which are in great demand for applications such as urban planning, infrastructure management, and environmental monitoring. Traditional TDOM generation methods need sophisticated processes, such as Digital Surface Models (DSM) and occlusion detection, which are computationally expensive and prone to errors. This work presents an alternative technique rooted in 2D Gaussian Splatting (2DGS), free of explicit DSM and occlusion detection. With depth map generation, spatial information for every pixel within the TDOM is retrieved and can reconstruct the scene with high precision. Divide-and-conquer strategy achieves excellent GS training and rendering with high-resolution TDOMs at a lower resource cost, which preserves higher quality of rendering on complex terrain and thin structure without a decrease in efficiency. Experimental results demonstrate the efficiency of large-scale scene reconstruction and high-precision terrain modeling. This approach provides accurate spatial data, which assists users in better planning and decision-making based on maps.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes a 2D Gaussian Splatting (2DGS) pipeline for True Digital Orthophoto Map (TDOM) generation that dispenses with explicit DSM construction and occlusion detection. Depth maps extracted from the 2DGS representation are asserted to supply pixel-accurate spatial information for every TDOM pixel, while a divide-and-conquer strategy enables efficient training and rendering of high-resolution outputs on complex terrain and thin structures. The abstract states that the approach yields large-scale scene reconstruction and high-precision terrain modeling.

Significance. If the central claim holds, the method would simplify the TDOM workflow by removing two computationally heavy preprocessing stages, potentially lowering resource requirements for high-resolution orthoimage production in urban planning and environmental monitoring. The divide-and-conquer tactic could also improve scalability. However, the absence of any quantitative metrics, ablation studies, or implementation details in the provided text prevents evaluation of whether these gains are realized.

major comments (2)

[Abstract] The claim that depth-map extraction from 2DGS alone suffices for pixel-accurate TDOM reconstruction 'free of explicit DSM and occlusion detection' is load-bearing yet unsupported. No derivation, visibility analysis, or comparison to standard depth buffering is supplied to show how overlapping or thin structures are disambiguated.
[Abstract] Assertions of 'high precision,' 'excellent GS training,' and 'higher quality of rendering on complex terrain' are presented without any error metrics, baseline comparisons, runtime figures, or dataset descriptions, rendering the efficiency and accuracy claims impossible to assess.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the comments. We address each major comment below by reference to the relevant sections of the full manuscript.

Point-by-point responses

Referee: [Abstract] The claim that depth-map extraction from 2DGS alone suffices for pixel-accurate TDOM reconstruction 'free of explicit DSM and occlusion detection' is load-bearing yet unsupported. No derivation, visibility analysis, or comparison to standard depth buffering is supplied to show how overlapping or thin structures are disambiguated.

Authors: The abstract summarizes the central contribution at a high level. The full manuscript derives the depth-map extraction process in Section 3 (Method), where each 2D Gaussian carries an explicit depth attribute and pixel depths are obtained via alpha-blended, depth-ordered splatting. This ordering and blending disambiguates overlapping surfaces without a separate DSM; thin structures are localized by the 2D Gaussian footprint itself. The approach is compared implicitly to conventional depth buffering through the reported elimination of the DSM and occlusion stages. A short clarifying sentence can be added to the abstract if the editor prefers. revision: no

Referee: [Abstract] Assertions of 'high precision,' 'excellent GS training,' and 'higher quality of rendering on complex terrain' are presented without any error metrics, baseline comparisons, runtime figures, or dataset descriptions, rendering the efficiency and accuracy claims impossible to assess.

Authors: These phrases in the abstract are high-level descriptors. Quantitative support appears in Section 4 (Experiments), which reports RMSE and MAE values for TDOM accuracy, direct comparisons against DSM-based pipelines and other Gaussian-splatting baselines, wall-clock training and rendering times for the divide-and-conquer strategy, and descriptions of the large-scale datasets containing complex terrain and thin structures. Ablation tables quantify the resource savings. revision: no
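The first response refers to alpha-blended, depth-ordered splatting. As a rough illustration of that general idea, the sketch below composites the splats covering one pixel front to back and returns an opacity-normalised expected depth. This is a generic formulation, not the paper's implementation: the `(depth, alpha)` input format, the function name, and the early-stop threshold are assumptions.

```python
from typing import List, Tuple


def blend_pixel_depth(splats: List[Tuple[float, float]]) -> float:
    """Expected depth at one pixel from (depth, alpha) pairs.

    Splats are sorted front to back. Each contributes with weight
    transmittance * alpha, so nearer opaque surfaces hide farther ones
    without a separate occlusion test.
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    weighted_depth = 0.0
    weight_sum = 0.0
    for depth, alpha in sorted(splats):
        weight = transmittance * alpha
        weighted_depth += weight * depth
        weight_sum += weight
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # pixel is effectively opaque (assumed threshold)
            break
    if weight_sum == 0.0:
        raise ValueError("no splat contributes to this pixel")
    return weighted_depth / weight_sum


# A half-transparent splat at depth 2 in front of an opaque one at depth 5.
mixed = blend_pixel_depth([(5.0, 1.0), (2.0, 0.5)])
# An opaque splat at depth 3 fully hides anything behind it.
hidden = blend_pixel_depth([(10.0, 0.9), (3.0, 1.0)])
```

The second call shows the occlusion behaviour the response relies on: once the nearest splat is opaque, farther splats no longer affect the pixel's depth.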

Circularity Check

0 steps flagged

No circularity in derivation chain

Full rationale

The provided abstract and text describe an empirical technique for TDOM generation via 2DGS depth maps that avoids explicit DSM construction and occlusion detection. No equations, fitted parameters, or derivation steps are shown that would reduce any claimed prediction or result to its own inputs by construction. Claims rest on experimental demonstration rather than self-referential definitions, fitted-input predictions, or load-bearing self-citations. The method is presented as an alternative implementation without any reduction to prior fitted quantities or ansatzes imported via self-citation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; all technical details remain unknown.

pith-pipeline@v0.9.0 · 5716 in / 1174 out tokens · 30564 ms · 2026-05-22T22:37:43.032393+00:00 · methodology


Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages

[1] Wolf, P.R., DeWitt, B.A., Wilkinson, B.E.: Elements of Photogrammetry (with Applications in GIS). McGraw-Hill Higher Education, New York (2000)

[2] Li, T., Jiang, C., Bian, Z., Wang, M., Niu, X.: A review of true orthophoto rectification algorithms. In: IOP Conference Series: Materials Science and Engineering, vol. 780, p. 022035 (2020). https://doi.org/10.1088/1757-899x/780/2/022035

[3] Amhar, F., Jansa, J., Ries, C.: The generation of true orthophotos using a 3d building model in conjunction with a conventional dtm. International Archives of Photogrammetry and Remote Sensing 32, 16–22 (1998)

[4] Habib, A.F., Kim, E.-M., Kim, C.-J.: New methodologies for true orthophoto generation. Photogrammetric Engineering & Remote Sensing 73(1), 25–36 (2007). https://doi.org/10.14358/pers.73.1.25

[5] Zitnick, C.L., Kanade, T.: A cooperative algorithm for stereo matching and occlusion detection. IEEE Transactions on pattern analysis and machine intelligence 22(7), 675–684 (2000). https://doi.org/10.1109/34.865184

[6] Shin, Y.H., Lee, D.-C.: True orthoimage generation using airborne lidar data with generative adversarial network-based deep learning model. Journal of Sensors 2021(1), 4304548 (2021). https://doi.org/10.1155/2021/4304548

[7] Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM 65(1), 99–106 (2021). https://doi.org/10.1007/978-3-030-58452-8_24

[8] Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics 42(4) (2023). https://doi.org/10.1145/3592433

[9] Chen, S., Yan, Q., Qu, Y., Gao, W., Yang, J., Deng, F.: Ortho-nerf: generating a true digital orthophoto map using the neural radiance field from unmanned aerial vehicle images. Geo-spatial Information Science, 1–20 (2024). https://doi.org/10.1080/10095020.2023.2296014

[10] Wang, X., Zhang, W., Xie, H., Ai, H., Yuan, Q., Zhan, Z.: Tortho-Gaussian: Splatting True Digital Orthophoto Maps (2024). https://arxiv.org/abs/2411.19594

[11] Huang, B., Yu, Z., Chen, A., Geiger, A., Gao, S.: 2d gaussian splatting for geometrically accurate radiance fields. In: SIGGRAPH 2024 Conference Papers (2024). https://doi.org/10.1145/3641519.3657428

[12] Lin, J., Li, Z., Tang, X., Liu, J., Liu, S., Liu, J., Lu, Y., Wu, X., Xu, S., Yan, Y.: Vastgaussian: Vast 3d gaussians for large scene reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5166–5175 (2024). https://doi.org/10.1109/cvpr52733.2024.00494

[13] Wang, X., Jiang, W., Xie, J.: A new method for true orthophoto generation. Geomatics and Information Science of Wuhan University 34(10), 1250–1254 (2009)

[14] Zhong, C., Li, H., Li, Z., Li, D.: A vector-based backward projection method for robust detection of occlusions when generating true ortho photos. GIScience & Remote Sensing 47(3), 412–424 (2010). https://doi.org/10.2747/1548-1603.47.3.412

[15] Zhou, G.: Urban High-Resolution Remote Sensing: Algorithms and Modeling. CRC Press, Florida (2020)

[16] Ebrahimikia, M., Hosseininaveh, A., Modiri, M.: Orthophoto improvement using urban-snowflakenet. Applied Geomatics 16(2), 387–407 (2024). https://doi.org/10.1007/s12518-024-00558-7

[17] Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 42(4), 139–1 (2023). https://doi.org/10.1145/3592433

[18] Schonberger, J.L., Frahm, J.-M.: Structure-from-motion revisited. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4104–4113 (2016). https://doi.org/10.1109/cvpr.2016.445

[19] Bu, S., Zhao, Y., Wan, G., Liu, Z.: Map2dfusion: Real-time incremental uav image mosaicing based on monocular slam. In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4564–4571 (2016). IEEE. https://doi.org/10.1109/iros.2016.7759672

[20] LLC, A.: Agisoft Sample Dataset. Online (2022). https://www.agisoft.com/zh-cn/downloads/sample-data/

[21] Systems, B.: ContextCapture Viewer. Online (2022). https://www.bentley.com/software/contextcapture-viewer/

[22] LLC, A.: Metashape. Online (2022). https://photoscan.com.cn/

[23] Pix4D: Pix4DMapper. Online (2022). https://www.pix4d.com/product/pix4dmapper-photogrammetry-software/

[24] Canny, J.: A computational approach to edge detection. IEEE Transactions on pattern analysis and machine intelligence (6), 679–698 (1986). https://doi.org/10.1016/b978-0-08-051581-6.50024-6
