3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow
2 Pith papers cite this work.
Citation roles: background (1). Verdicts: unverdicted (2).
Citing papers
- DoReMi: Bridging 3D Domains via Topology-Aware Domain-Representation Mixture of Experts
  DoReMi combines self-supervised pre-training on topological and texture variations with domain-aware experts that use spatial-guided routing and entropy-controlled allocation, reaching 80.1% mIoU on ScanNet and 77.2% mIoU on S3DIS.
- A Survey on Vision-Language-Action Models for Embodied AI
  This is the first survey on vision-language-action models. It provides a taxonomy organized along three lines of research and summarizes datasets, simulators, benchmarks, challenges, and future directions in embodied AI.