Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Niessner

Dai, A · 2017 · DOI 10.1109/cvpr.2017.261

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Think While You Map: Asynchronous Vision-Language Agents for Incremental 3D Scene Graphs

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

An asynchronous architecture decouples incremental voxel-based mapping from VLM-based semantic enrichment to produce queryable open-vocabulary 3D scene graphs that match or exceed prior methods on segmentation and grounding benchmarks.

From Pixels to Concepts: Growing Rich 3D Semantic Scene Graph Forests utilizing Foundation Models

cs.RO · 2026-06-22 · unverdicted · novelty 6.0

Uses VLMs to detect instance concepts and LLMs to infer abstract relationships, assembling them into 3D scene graph forests that are evaluated on uHumans2 and ScanNet and tested in open-vocabulary retrieval on a Spot robot.

Language as a Sensor: Calibrated Spatial Belief Estimation in 3D Scenes from Natural Language

cs.RO · 2026-06-07 · unverdicted · novelty 6.0

Introduces LSM, which outputs calibrated multimodal spatial distributions from language plus a scene graph; these are fused via VL-Map to improve 3D target localization on the VLA-3D benchmark and on a real robot.

Planning with the Views

cs.AI · 2026-05-28 · unverdicted · novelty 6.0

Frontier VLMs show basic single-step view-action knowledge but fail at multi-turn composition in 3D; an iterative self-exploration and view-graph-distillation framework lifts Qwen2.5-VL-7B to 47.8% success, beating larger models.

Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models

cs.RO · 2026-06-09 · unverdicted · novelty 5.0

Embodied-R1.5 is an 8B EFM achieving SOTA on 16 of 24 embodied VLM benchmarks, fine-tunable to outperform leading VLAs, with claimed zero-shot real-robot generalization.

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

cs.RO · 2025-07-02 · unverdicted · novelty 5.0

The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.

citing papers explorer

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

cs.RO · 2025-07-02 · unverdicted · none · ref 250

The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.
