Grounded 3D-LLM with Referent Tokens

· 2024 · arXiv 2405.10370

16 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · dataset: 1

citation-polarity summary

background: 2 · use dataset: 1

representative citing papers

VoxAfford: Multi-Scale Voxel-Token Fusion for Open-Vocabulary 3D Affordance Detection

cs.CV · 2026-05-02 · unverdicted · novelty 7.0

VoxAfford fuses multi-scale voxel features into MLLM output tokens using cross-attention with a learned compatibility gate to achieve SOTA open-vocabulary 3D affordance detection with ~8% mIoU gain and zero-shot robot transfer.

SpatialMosaic: A Multiview VLM Dataset for Partial Visibility

cs.CV · 2025-12-29 · unverdicted · novelty 7.0

SpatialMosaic introduces a 2M-pair multi-view QA dataset and 1M-pair benchmark for MLLMs on spatial reasoning under partial visibility, plus a hybrid baseline that integrates 3D reconstruction models as geometry encoders.

PAR3D: A Unified 3D-MLLM with Part-Aware Representation for Scene Understanding

cs.CV · 2026-06-04 · unverdicted · novelty 6.0

PAR3D is a part-aware 3D-MLLM framework with ScenePart dataset, Part-Aware 3D Representation Learning, and Hierarchical Segmentation Query Generation to improve part-level 3D scene understanding.

Zero-Shot 3D Question Answering via Hierarchical View-to-Token Transportation

cs.CV · 2026-06-02 · unverdicted · novelty 6.0

KeyVT improves zero-shot 3D question answering by hierarchically selecting semantically and geometrically relevant views and using optimal transport to extract representative tokens from them.

Distilling Neuro-Symbolic Programs into 3D Multi-modal LLMs

cs.CV · 2026-05-31 · unverdicted · novelty 6.0 · 2 refs

APEIRIA distills neuro-symbolic 3D reasoning programs into 3D MLLMs through a curriculum that transfers stepwise verification patterns to achieve transparent yet flexible spatial reasoning.

SSR3D-LLM: Structured Spatial Reasoning via Latent Steps for Fine-Grained Grounding in Unified 3D-LLMs

cs.CV · 2026-05-27 · unverdicted · novelty 6.0

SSR3D-LLM improves fine-grained 3D grounding in unified 3D-LLMs by generating and scoring sequences of latent spatial reasoning steps from the query using fixed Mask3D proposals.

Let Geometry GUIDE: Layer-wise Unrolling of Geometric Priors in Multimodal LLMs

cs.CV · 2026-04-07 · unverdicted · novelty 6.0

GUIDE unrolls multi-granularity geometric priors layer-wise into early MLLM layers with gating to improve spatial reasoning and perception.

Chat-Scene++: Exploiting Context-Rich Object Identification for 3D LLM

cs.CV · 2026-03-29 · unverdicted · novelty 6.0

Chat-Scene++ improves 3D scene understanding in multimodal LLMs by representing scenes as context-rich object sequences with identifier tokens and grounded chain-of-thought reasoning, reaching state-of-the-art on five benchmarks using pre-trained encoders.

Feeling the Space: Egomotion-Aware Video Representation for Efficient and Accurate 3D Scene Understanding

cs.CV · 2026-03-18 · unverdicted · novelty 6.0

Motion-MLLM integrates IMU egomotion data into MLLMs using cascaded filtering and asymmetric fusion to ground visual content in physical trajectories for scale-aware 3D understanding, achieving competitive accuracy at higher speed.

DEGround: An Effective Baseline for Ego-centric 3D Visual Grounding with a Homogeneous Framework

cs.CV · 2025-06-05 · unverdicted · novelty 6.0

DEGround presents a unified homogeneous framework for 3D visual grounding with shared queries and two plug-in modules for better instruction alignment, reporting a 7.52% improvement on the EmbodiedScan benchmark.

Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

cs.CV · 2025-05-29 · unverdicted · novelty 6.0 · 2 refs

Spatial-MLLM adds a 3D spatial encoder initialized from a visual geometry model and space-aware frame sampling to MLLMs to improve spatial understanding and reasoning from purely 2D visual inputs.

Efficient3D: A Unified Framework for Adaptive and Debiased Token Reduction in 3D MLLMs

cs.CV · 2026-04-03 · unverdicted · novelty 5.0

Efficient3D prunes visual tokens in 3D MLLMs via DVTIE and ATR modules, reporting better performance than unpruned baselines on Scan2Cap and other benchmarks.

3D-IDE: 3D Implicit Depth Emergent

cs.CV · 2026-03-28 · unverdicted · novelty 5.0

3D awareness emerges implicitly in MLLMs via self-supervised geometric constraints that create an information bottleneck, removing depth and pose dependencies at inference and cutting latency by 55%.

Reinforcing 3D Understanding in Point-VLMs via Geometric Reward Credit Assignment

cs.CV · 2026-04-23 · unverdicted · novelty 4.0

Geometric Reward Credit Assignment disentangles rewards to geometric tokens and adds reprojection consistency to boost 3D keypoint accuracy from 0.64 to 0.93 and bounding box IoU to 0.686 on a ShapeNetCore benchmark while preserving 2D performance.

Open-Architecture End-to-End System for Real-World Autonomous Robot Navigation

cs.RO · 2024-10-08 · unverdicted · novelty 4.0

Presents an open ROS2-based end-to-end navigation system for quadruped robots achieving over 88% success in zero-shot real-world indoor navigation tasks using semantic scene graphs and LLM planning.

EgoMind: Activating Spatial Cognition through Linguistic Reasoning in MLLMs

cs.CV · 2026-04-01

citing papers explorer

Open-Architecture End-to-End System for Real-World Autonomous Robot Navigation

cs.RO · 2024-10-08 · unverdicted · none · ref 69

Presents an open ROS2-based end-to-end navigation system for quadruped robots achieving over 88% success in zero-shot real-world indoor navigation tasks using semantic scene graphs and LLM planning.
