4DLangVGGT: 4D language-visual geometry grounded transformer

Wu, X · 2025 · arXiv 2512.05060

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers

cs.CV · 2026-05-22 · unverdicted · novelty 6.0

A two-stage diversity-plus-entropy token selection framework speeds up visual geometry transformers by over 85% on 500-image scenes while preserving baseline accuracy.

High-Fidelity 4D Hand-Object Capture via Multi-View Spatiotemporal Tracking and Physics-Aware Gaussians

cs.CV · 2026-06-14 · unverdicted · novelty 5.0

A multi-view feed-forward transformer provides initial poses and geometry from calibrated videos, followed by physics-aware Gaussian optimization with tetrahedral and collision constraints to produce robust 4D hand-object reconstructions.

citing papers explorer

Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers
cs.CV · 2026-05-22 · unverdicted · none · ref 93

High-Fidelity 4D Hand-Object Capture via Multi-View Spatiotemporal Tracking and Physics-Aware Gaussians
cs.CV · 2026-06-14 · unverdicted · none · ref 58
