DL-SLAM: Enabling High-Fidelity Gaussian Splatting SLAM in Dynamic Environments based on Dual-Level Probability

Chen Chen; Jianwei Niu; Qingfeng Li; Xuefeng Liu; Ziheng Xu

arxiv: 2607.01860 · v1 · pith:WJU7LSTSnew · submitted 2026-07-02 · 💻 cs.RO · cs.CV

Ziheng Xu, Qingfeng Li, Xuefeng Liu, Chen Chen, Jianwei Niu

Pith reviewed 2026-07-03 12:10 UTC · model grok-4.3

classification 💻 cs.RO cs.CV

keywords Gaussian Splatting · Dynamic SLAM · Dual-Level Probability · Semantic Mapping · Pose Estimation · Monocular SLAM · Object-Level Pruning

The pith

DL-SLAM lifts pixel semantic-geometric probabilities to object level for pruning dynamic Gaussians, yielding up to 13% better tracking and artifact-free static maps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DL-SLAM, a monocular 3D Gaussian Splatting SLAM system for dynamic scenes. It builds a dual-level probabilistic framework that first fuses semantic and geometric cues into pixel-wise dynamic probability maps, then aggregates those probabilities to the object level for each detected instance. Object-level scores drive categorical pruning of dynamic Gaussians, keeping only reliably static structure in the map while still allowing transiently static objects to supply geometric constraints during pose estimation. The resulting clean static map feeds back to refine the original pixel probabilities, closing a consistency loop. A reader cares because the method avoids both the data loss of hard dynamic masking and the map pollution that occurs when uncertainty maps alone are used.

Core claim

DL-SLAM computes dynamic probability maps by combining semantic and geometric information. These pixel-level probabilities are lifted to 3D and aggregated to derive an object-level dynamic probability for each instance. Object-level probability enables the categorical pruning of dynamic Gaussians, resulting in an artifact-free static map. The static map, in turn, provides a geometrically consistent guidance to refine the pixel-wise probabilities, enhancing their reliability.
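The pixel-to-object step can be illustrated with a minimal sketch. The abstract gives neither the fusion rule nor the aggregation operator, so everything here is an assumption made for illustration: `fuse_pixel_probability` uses a convex combination with a hypothetical weight `w_sem`, `object_level_probability` uses a plain mean over each instance mask, and the lifting to 3D is omitted so that aggregation happens in the image space of a single frame.

```python
from collections import defaultdict


def fuse_pixel_probability(p_sem, p_geo, w_sem=0.5):
    """Blend a semantic prior and a geometric cue into one dynamic probability.

    The convex combination is an assumed fusion rule, not the paper's.
    """
    return w_sem * p_sem + (1.0 - w_sem) * p_geo


def object_level_probability(pixel_prob, instance_ids):
    """Average pixel probabilities over each instance mask.

    pixel_prob and instance_ids are equally sized 2D lists; id 0 is background.
    The mean is an assumed aggregation operator.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for prob_row, id_row in zip(pixel_prob, instance_ids):
        for p, inst in zip(prob_row, id_row):
            if inst != 0:
                sums[inst] += p
                counts[inst] += 1
    return {inst: sums[inst] / counts[inst] for inst in sums}


# Toy 2x3 frame: instance 1 looks dynamic, instance 2 looks static.
p_sem = [[0.9, 0.9, 0.1], [0.9, 0.1, 0.1]]
p_geo = [[0.7, 0.5, 0.1], [0.9, 0.3, 0.1]]
instance_ids = [[1, 1, 0], [1, 2, 2]]

pixel_prob = [
    [fuse_pixel_probability(s, g) for s, g in zip(sem_row, geo_row)]
    for sem_row, geo_row in zip(p_sem, p_geo)
]
object_prob = object_level_probability(pixel_prob, instance_ids)
print(object_prob)
```

In this toy frame the three pixels of instance 1 average to roughly 0.8 and the two pixels of instance 2 to roughly 0.15, which is the kind of separation the object-level score is meant to provide.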

What carries the argument

Dual-level probabilistic framework that aggregates pixel-wise semantic-geometric probabilities into per-instance object-level dynamic probabilities for categorical Gaussian pruning.
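A rough sketch of what categorical pruning could look like, under stated assumptions: each Gaussian carries a hypothetical `instance` label, object-level probabilities arrive as a dictionary, and a fixed `threshold` of 0.5 decides which instances count as dynamic. None of these names or values come from the paper.

```python
def prune_dynamic_gaussians(gaussians, object_prob, threshold=0.5):
    """Drop every Gaussian whose owning instance is classified as dynamic.

    gaussians: list of dicts with an 'instance' key (0 = background).
    object_prob: dict mapping instance id to object-level dynamic probability.
    The decision is per instance, so an object is removed or kept as a whole.
    """
    dynamic = {inst for inst, p in object_prob.items() if p > threshold}
    return [g for g in gaussians if g["instance"] not in dynamic]


# Hypothetical map with one background Gaussian and two instances.
gaussians = [
    {"id": 0, "instance": 0},
    {"id": 1, "instance": 1},
    {"id": 2, "instance": 1},
    {"id": 3, "instance": 2},
]
object_prob = {1: 0.8, 2: 0.15}

static_map = prune_dynamic_gaussians(gaussians, object_prob)
kept = [g["id"] for g in static_map]
print(kept)  # [0, 3]
```

The point of deciding per instance rather than per Gaussian is that a noisy probability on a few pixels cannot leave fragments of a moving object behind in the map.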

If this is right

Tracking accuracy improves by up to 13% over prior Gaussian SLAM methods in dynamic environments.
Transiently static objects contribute geometric constraints without leaving persistent artifacts in the final map.
Semantic information resolves boundary ambiguity that pure geometric uncertainty maps cannot handle.
Iterative refinement between the static map and pixel probabilities increases overall map fidelity.
The resulting maps carry explicit semantic labels while remaining dense and geometrically consistent.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same lifting-and-pruning pattern could be applied to other dense representations such as neural radiance fields or voxel grids in dynamic settings.
Object-level probabilities may reduce dependence on hand-labeled dynamic class lists by learning instance motion statistics on the fly.
Feedback from the pruned map to pixel probabilities suggests a route toward self-supervised improvement of semantic segmentation within the SLAM loop.

Load-bearing premise

Aggregating pixel-wise semantic-geometric probabilities to the object level will correctly distinguish dynamic instances from transiently static ones without false inclusions that pollute the map or false exclusions that weaken pose constraints.

What would settle it

A sequence of frames containing objects that move briefly then stop, where the object-level aggregation either includes moving Gaussians in the static map or removes useful static structure, producing measurable increases in absolute trajectory error or visible reconstruction artifacts.
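The trajectory metric named above can be computed in a few lines. This is a minimal sketch of absolute trajectory error as a root-mean-square translational error, assuming the estimated and ground-truth positions are already time-associated and expressed in the same frame; the alignment step that monocular evaluation normally requires is deliberately left out, and the sample trajectories are made up.

```python
import math


def ate_rmse(estimated, ground_truth):
    """Root-mean-square translational error between two aligned trajectories.

    Both inputs are equal-length sequences of (x, y, z) positions.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and equally long")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up trajectories: the estimate drifts sideways by 0.1 per frame.
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
estimated = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 0.2, 0.0)]

error = ate_rmse(estimated, ground_truth)
print(round(error, 4))
```

Running the same computation on a stop-and-go sequence with and without the object-level step would show whether pruning decisions raise or lower the error.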

Figures

Figures reproduced from arXiv: 2607.01860 by Chen Chen, Jianwei Niu, Qingfeng Li, Xuefeng Liu, Ziheng Xu.

**Figure 1.** DL-SLAM. Given a monocular video sequence captured in dynamic environments, our method estimates accurate

**Figure 2.** System Overview. DL-SLAM takes RGB images as input to estimate camera poses and reconstruct a static 3D map.

**Figure 3.** Effect of Semantic Label Refinement. Our method

**Figure 5.** Interactive scene editing on Wild-SLAM iPhone.

Original abstract

Recent advances in 3D Gaussian Splatting (3DGS) have enabled significant progress in dense dynamic Simultaneous Localization And Mapping (SLAM). Prevailing methods typically discard predefined dynamic objects, ignoring that transiently static objects offer valuable geometric constraints for pose estimation. A recent work attempts to leverage this potential by employing per-pixel uncertainty maps to quantify the magnitude of motion. While this approach enables transiently static objects to enhance pose estimation, it erroneously integrates these objects into the static map, resulting in persistent artifacts. Moreover, its reliance on purely geometric information leads to ambiguous object boundaries in the uncertainty maps. To overcome these limitations, we present DL-SLAM, a monocular Gaussian Splatting SLAM system built upon a novel dual-level probabilistic framework. Our method computes dynamic probability maps by combining semantic and geometric information. These pixel-level probabilities are lifted to 3D and aggregated to derive an object-level dynamic probability for each instance. Object-level probability enables the categorical pruning of dynamic Gaussians, resulting in an artifact-free static map. The static map, in turn, provides a geometrically consistent guidance to refine the pixel-wise probabilities, enhancing their reliability. Experimental results demonstrate that DL-SLAM outperforms existing approaches, improving tracking accuracy by up to 13% while generating high-fidelity semantic maps.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague.

DL-SLAM's object-level aggregation of semantic-geometric probabilities is a direct attempt to fix artifact issues in prior per-pixel 3DGS dynamic SLAM, but the abstract gives no concrete experimental backing for the claimed 13% gains or robustness.

The main point of this paper is that DL-SLAM adds an object-level dynamic probability step on top of pixel-wise semantic and geometric cues, then uses that to prune Gaussians categorically while feeding the resulting static map back to clean up the pixel probabilities. This targets the specific problem that recent per-pixel uncertainty methods still bake transiently static objects into the map as artifacts.

What stands out as new is the explicit lift from pixels to instances for pruning, plus the closed loop where the cleaned map refines the lower-level probabilities. The abstract correctly notes that purely geometric uncertainty leaves boundary ambiguity, so adding semantics is a reasonable move. The framing around keeping useful geometric constraints from objects that are only temporarily static is practical for real robotics settings.

The soft spots are mostly around validation. The abstract states up to 13% tracking improvement and artifact-free maps, yet supplies no dataset names, baseline comparisons, ablation on the aggregation operator, or error analysis. Without those, it is impossible to judge whether the object-level step actually avoids the false inclusions or exclusions that segmentation noise and partial views would produce in monocular video. The stress-test concern about aggregation reliability under noisy masks and depth errors looks like it lands; nothing in the provided description shows independent checks on that step.

This work is aimed at people already building or extending 3D Gaussian SLAM systems who need to handle everyday dynamic scenes. If the full paper contains reproducible experiments, code, and clear ablations on the dual-level component, it would be worth a referee's time. Otherwise the central claim stays hard to evaluate.

I would send it for peer review to get the experimental details checked, but would not cite it yet on the strength of the abstract alone.

Referee Report

1 major / 1 minor

Summary. The paper proposes DL-SLAM, a monocular 3D Gaussian Splatting SLAM system for dynamic environments. It introduces a dual-level probabilistic framework that computes pixel-level dynamic probabilities by combining semantic and geometric cues, lifts these to 3D, aggregates them per instance to obtain object-level dynamic probabilities, and uses these for categorical pruning of dynamic Gaussians to produce an artifact-free static map. The static map then provides feedback to refine the pixel-level probabilities. Experiments are claimed to show up to 13% improvement in tracking accuracy over existing methods along with high-fidelity semantic maps.

Significance. If the object-level aggregation step reliably separates dynamic instances from transiently static ones without introducing false inclusions or exclusions, the dual-level framework would represent a meaningful advance in dense dynamic SLAM by enabling the use of geometric constraints from semi-static objects while avoiding persistent artifacts in 3DGS maps. The bidirectional refinement between map and probabilities is a constructive design element.

major comments (1)

[Abstract and method description] The central claim that lifting pixel-wise semantic-geometric probabilities to 3D and aggregating per instance produces a reliable object-level dynamic probability for categorical pruning is load-bearing for both the artifact-free map and the reported tracking gains, yet the manuscript provides no robustness analysis or verification of this aggregation operator against noisy instance segmentation, partial observations, or depth errors.

minor comments (1)

[Abstract] The abstract states performance gains of 'up to 13%' without specifying the exact baselines, datasets, or error metrics used for this figure.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive review. The comment on the robustness of the object-level aggregation step is well-taken and identifies a genuine gap in the current manuscript. We address it directly below and will revise the paper accordingly.

Referee: [Abstract and method description] The central claim that lifting pixel-wise semantic-geometric probabilities to 3D and aggregating per instance produces a reliable object-level dynamic probability for categorical pruning is load-bearing for both the artifact-free map and the reported tracking gains, yet the manuscript provides no robustness analysis or verification of this aggregation operator against noisy instance segmentation, partial observations, or depth errors.

Authors: We agree that the manuscript currently lacks an explicit robustness analysis of the aggregation operator. While the experimental results on multiple dynamic sequences demonstrate consistent improvements in tracking accuracy and map quality, these do not isolate the effects of noisy instance segmentation, partial observations, or depth errors on the per-instance aggregation step. In the revised manuscript we will add a dedicated robustness subsection (including both quantitative metrics and qualitative examples) that perturbs instance masks, simulates partial views, and injects depth noise to measure the stability of the resulting object-level probabilities and the downstream categorical pruning. This analysis will be referenced from both the abstract and the method description.

Revision: yes
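One way such a mask-perturbation check might be set up is sketched below. It is not the authors' protocol: the mean aggregation, the random label-flip noise model, and the names `mean_prob`, `perturb_mask`, and `stability` are all assumptions chosen to keep the example small. The frame is flattened to a list of pixel probabilities with a boolean instance mask.

```python
import random


def mean_prob(pixel_prob, mask):
    """Object-level score as the mean pixel probability inside the mask."""
    values = [p for p, inside in zip(pixel_prob, mask) if inside]
    return sum(values) / len(values) if values else 0.0


def perturb_mask(mask, flip_rate, rng):
    """Flip each mask entry independently with probability flip_rate."""
    return [(not m) if rng.random() < flip_rate else m for m in mask]


def stability(pixel_prob, mask, flip_rate, trials=100, seed=0):
    """Largest change in the object-level score over random mask perturbations."""
    rng = random.Random(seed)
    base = mean_prob(pixel_prob, mask)
    deviations = [
        abs(mean_prob(pixel_prob, perturb_mask(mask, flip_rate, rng)) - base)
        for _ in range(trials)
    ]
    return max(deviations)


# Toy frame: 50 clearly dynamic pixels inside the mask, 50 static ones outside.
pixel_prob = [0.9] * 50 + [0.1] * 50
mask = [True] * 50 + [False] * 50

clean = stability(pixel_prob, mask, flip_rate=0.0)
noisy = stability(pixel_prob, mask, flip_rate=0.2)
print(clean, noisy)
```

Sweeping `flip_rate` and checking when the score crosses the pruning threshold would give a first estimate of how much segmentation noise the object-level decision can absorb.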

Circularity Check

0 steps flagged

No circularity detected; derivation remains self-contained

The provided abstract and description outline a dual-level probabilistic framework that computes pixel-wise dynamic probabilities from external semantic and geometric inputs, lifts them to 3D, aggregates per instance to obtain object-level probabilities for pruning, and uses the resulting static map for refinement feedback. No equations, parameter fits, predictions, or self-citations are shown that reduce any claim to its own inputs by construction. The central steps are presented as computed from independent cues rather than defined in terms of the outputs they produce, making the derivation self-contained against external benchmarks with no load-bearing circular reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the dual-level probability mechanism is described at a high level without mathematical formulation or fitting details.

