Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
4 Pith papers cite this work.
fields: cs.CV (4)
years: 2026 (4)
representative citing papers
Coarse four-group semantic color coding (RGBB) appended to point clouds before tokenization improves LLM-based structured indoor prediction on Structured3D, SpatialLM, and ARKitScenes, especially for openings and furniture instances.
MambaPanoptic is a fully Mamba-based panoptic segmentation model that uses MambaFPN for multi-scale features and a QuadMamba kernel generator to outperform PanopticDeepLab and PanopticFCN on Cityscapes and COCO while using fewer parameters than Mask2Former.
citing papers explorer
- Feasibility of Indoor Frame-Wise Lidar Semantic Segmentation via Distillation from Visual Foundation Model
  Distillation from visual foundation models to lidar enables frame-wise indoor semantic segmentation without manual annotations, achieving up to 56% mIoU on pseudo labels and 36% on real labels.
- Coarse Semantic Injection for LLM-Conditioned Structured Indoor Prediction
  Coarse four-group semantic color coding (RGBB) appended to point clouds before tokenization improves LLM-based structured indoor prediction on Structured3D, SpatialLM, and ARKitScenes, especially for openings and furniture instances.
- MambaPanoptic: A Vision Mamba-based Structured State Space Framework for Panoptic Segmentation
  MambaPanoptic is a fully Mamba-based panoptic segmentation model that uses MambaFPN for multi-scale features and a QuadMamba kernel generator to outperform PanopticDeepLab and PanopticFCN on Cityscapes and COCO while using fewer parameters than Mask2Former.
- Diagnosing and Correcting Concept Omission in Multimodal Diffusion Transformers