ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes
3 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (3)
Verdicts: UNVERDICTED (3)
Representative citing papers
- Multi-Scale Gaussian-Language Map for Zero-shot Embodied Navigation and Reasoning
  GLMap combines explicit 3D Gaussians with multi-scale language semantics in a dual-modality structure and uses an analytical Gaussian Estimator for incremental map building, improving zero-shot performance on navigation and reasoning tasks.
- Chorus: Multi-Teacher Pretraining for Holistic 3D Gaussian Scene Encoding
  Chorus pretrains a shared 3D Gaussian scene encoder via multi-teacher distillation to capture holistic features from high-level semantics to fine-grained structure, with strong transfer on segmentation and point-cloud tasks using far fewer scenes.
- MV3DIS: Multi-View Mask Matching via 3D Guides for Zero-Shot 3D Instance Segmentation
  MV3DIS uses 3D-guided mask matching and depth consistency to produce more consistent multi-view 2D masks that refine into accurate zero-shot 3D instances.