PaceVGGT reduces VGGT inference latency by up to 5.1x on ScanNet-50 via pre-AA token pruning with a distilled Token Scorer, per-frame keep budgets, adaptive merge/prune, and feature-guided restoration, while preserving reconstruction quality on ScanNet-50 and 7-Scenes.
Litevggt: Boosting vanilla vggt via geometry-aware cached token merging
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 5years
2026 5verdicts
UNVERDICTED 5representative citing papers
ReRe boosts open-source MLLMs on spatial reasoning benchmarks VSI-Bench and STI-Bench to rival proprietary SOTA by using a two-phase Reason then Re-reason process with Geometry-to-Video novel view synthesis.
FGQ applies diagonal Fisher information to guide learnable affine transformations in PTQ for multi-task VGGT, yielding up to 39% relative gains over baselines at 4-bit quantization.
Spark3R achieves up to 28x speedup on 1000-frame 3D reconstruction inputs by asymmetrically reducing query and key-value tokens in Vision Transformers while keeping competitive quality.
The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware modeling.
citing papers explorer
-
PaceVGGT: Pre-Alternating-Attention Token Pruning for Visual Geometry Transformers
PaceVGGT reduces VGGT inference latency by up to 5.1x on ScanNet-50 via pre-AA token pruning with a distilled Token Scorer, per-frame keep budgets, adaptive merge/prune, and feature-guided restoration, while preserving reconstruction quality on ScanNet-50 and 7-Scenes.
-
Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning
ReRe boosts open-source MLLMs on spatial reasoning benchmarks VSI-Bench and STI-Bench to rival proprietary SOTA by using a two-phase Reason then Re-reason process with Geometry-to-Video novel view synthesis.
-
Not All Tasks Quantize Equally: Fisher-Guided Quantization for Visual Geometry Transformer
FGQ applies diagonal Fisher information to guide learnable affine transformations in PTQ for multi-task VGGT, yielding up to 39% relative gains over baselines at 4-bit quantization.
-
Spark3R: Asymmetric Token Reduction Makes Fast Feed-Forward 3D Reconstruction
Spark3R achieves up to 28x speedup on 1000-frame 3D reconstruction inputs by asymmetrically reducing query and key-value tokens in Vision Transformers while keeping competitive quality.
-
Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware modeling.