3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding

Huang, H · 2025 · arXiv 2507.23478

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

CaMo: Camera Motion Grounded Evaluation and Training for Vision-Language Models

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

Proposes the Spatial Narrative Score (SNS) for evaluating VLMs' camera-motion understanding and introduces the CaMo model, which performs consistently on both SNS and direct QA.

LMM-Track4D: Eliciting 4D Dynamic Reasoning in LMMs via Trajectory-Grounded Dialogue

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

LMM-Track4D formulates a trajectory-grounded dialogue task, releases Track4D-Bench with 526 samples, and proposes RTGE encoding, a TRK state token, and an OSK-RA decoder to elicit better 4D spatiotemporal reasoning in LMMs.

Token Warping Helps MLLMs Look from Nearby Viewpoints

cs.CV · 2026-04-03 · unverdicted · novelty 7.0

Backward token warping in ViT-based MLLMs enables reliable reasoning from nearby viewpoints by preserving semantic coherence better than pixel-wise warping or fine-tuning baselines.

Distilling Neuro-Symbolic Programs into 3D Multi-modal LLMs

cs.CV · 2026-05-31 · unverdicted · novelty 6.0 · 2 refs

APEIRIA distills neuro-symbolic 3D reasoning programs into 3D MLLMs through a curriculum that transfers stepwise verification patterns to achieve transparent yet flexible spatial reasoning.

Last-Layer-Centric Feature Recombination: Unleashing 3D Geometric Knowledge in DINOv3 for Monocular Depth Estimation

cs.CV · 2026-04-29 · unverdicted · novelty 6.0

Layer analysis of DINOv3 shows non-uniform 3D geometric knowledge concentrated in deeper layers, enabling a last-layer-centric recombination module that improves monocular depth estimation accuracy to state-of-the-art levels.

GeoWorld: Geometric World Models

cs.CV · 2026-02-26 · unverdicted · novelty 6.0

GeoWorld applies hyperbolic geometry to JEPA world models and introduces geometric reinforcement learning, reporting modest success-rate gains of ~3% and ~2% on 3- and 4-step planning tasks versus V-JEPA 2.

Grounded 3D-Aware Spatial Vision-Language Modeling

cs.CV · 2026-05-28 · unverdicted · novelty 5.0

GR3D is a VLM that combines explicit 2D, implicit 2D, and monocular 3D grounding mechanisms to improve performance on spatial understanding benchmarks.

UniMesh: Unifying 3D Mesh Understanding and Generation

cs.CV · 2026-04-19 · unverdicted · novelty 5.0

UniMesh unifies 3D mesh generation and understanding in one model via a Mesh Head interface, Chain of Mesh iterative editing, and an Actor-Evaluator self-reflection loop.

XEmbodied: A Foundation Model with Enhanced Geometric and Physical Cues for Large-Scale Embodied Environments

cs.CV · 2026-04-20 · unverdicted · novelty 4.0

XEmbodied is a foundation model that integrates 3D geometric and physical signals into VLMs using a 3D Adapter and Efficient Image-Embodied Adapter, plus progressive curriculum and RL post-training, to improve spatial reasoning and embodied performance on 18 benchmarks.

A Survey of Reinforcement Learning for Large Reasoning Models

cs.CL · 2025-09-10 · accept · novelty 3.0

A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.

citing papers explorer

A Survey of Reinforcement Learning for Large Reasoning Models

cs.CL · 2025-09-10 · accept · none · ref 211

A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.
