iLRM: An Iterative Large 3D Reconstruction Model

Abdelrahman Mohamed; Eunbyung Park; Gyeongjin Kang; Sameh Khamis; Seungkwon Yang; Seungtae Nam; Xiangyu Sun

arxiv: 2507.23277 · v3 · pith:FXAJJ7H4new · submitted 2025-07-31 · 💻 cs.CV

Gyeongjin Kang, Seungtae Nam, Seungkwon Yang, Xiangyu Sun, Sameh Khamis, Abdelrahman Mohamed, Eunbyung Park

classification 💻 cs.CV

keywords reconstruction, attention, ilrm, iterative, representations, computational, costs, feed-forward

Feed-forward 3D modeling has emerged as a promising approach for rapid and high-quality 3D reconstruction. In particular, directly generating explicit 3D representations, such as 3D Gaussian splatting, has attracted significant attention due to its fast and high-quality rendering. However, many state-of-the-art methods, primarily based on transformer architectures, suffer from severe scalability issues because they rely on full attention across image tokens from multiple input views, resulting in prohibitive computational costs as the number of views or image resolution increases. Toward a scalable and efficient feed-forward 3D reconstruction, we introduce an iterative Large 3D Reconstruction Model (iLRM) that generates 3D Gaussian representations through an iterative refinement mechanism, guided by three core principles: (1) decoupling the scene representation from input images to enable compact 3D representations; (2) decomposing global multi-view interactions into a two-stage attention scheme to reduce computational costs; and (3) injecting high-resolution information at every layer to achieve high-fidelity reconstruction. Experimental results on widely used datasets, such as RE10K and DL3DV, demonstrate that iLRM outperforms existing methods in both reconstruction quality and speed.
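The scalability argument in the abstract can be illustrated with a rough count of attention pairs. The sketch below is not the paper's architecture: the function names, the `scene_tokens_per_view` parameter, and the specific decomposition (per-view cross-attention followed by global self-attention over a compact token set) are assumptions chosen only to show why splitting global multi-view attention into two stages can reduce cost.

```python
def full_attention_pairs(num_views: int, tokens_per_view: int) -> int:
    """Query-key pairs when every image token attends to every image token
    across all input views (quadratic in views * tokens)."""
    total_tokens = num_views * tokens_per_view
    return total_tokens * total_tokens


def two_stage_pairs(num_views: int, tokens_per_view: int,
                    scene_tokens_per_view: int) -> int:
    """Query-key pairs for a HYPOTHETICAL two-stage scheme (an assumption,
    not taken from the paper):
      stage 1: each view's compact scene tokens cross-attend only to that
               view's image tokens;
      stage 2: all compact scene tokens self-attend globally."""
    stage1 = num_views * scene_tokens_per_view * tokens_per_view
    total_scene_tokens = num_views * scene_tokens_per_view
    stage2 = total_scene_tokens * total_scene_tokens
    return stage1 + stage2


if __name__ == "__main__":
    # Illustrative numbers only: 8 views, 4096 image tokens per view,
    # 256 compact scene tokens per view.
    full = full_attention_pairs(8, 4096)
    staged = two_stage_pairs(8, 4096, 256)
    print(f"full attention pairs: {full}")
    print(f"two-stage pairs:      {staged}")
    print(f"ratio: {full / staged:.1f}x")
```

Under these assumed sizes, the full-attention count grows with the square of views times image tokens, while the two-stage count is quadratic only in the much smaller compact token set.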

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis
cs.CV 2026-04 unverdicted novelty 8.0

DF3DV-1K supplies 1,048 scenes with clean and cluttered image pairs plus a challenging 41-scene subset to benchmark and improve distractor-free radiance field methods.

AdaptSplat: Adapting Vision Foundation Models for Feed-Forward 3D Gaussian Splatting
cs.CV 2026-05 unverdicted novelty 7.0

AdaptSplat adds a Frequency-Preserving Adapter to vision foundation models to boost high-frequency fidelity and cross-domain performance in feed-forward 3D Gaussian Splatting.

AnchorSplat: Feed-Forward 3D Gaussian Splatting with 3D Geometric Priors
cs.CV 2026-04 unverdicted novelty 7.0

AnchorSplat uses anchor-aligned 3D Gaussians guided by geometric priors for feed-forward scene reconstruction, achieving SOTA novel view synthesis on ScanNet++ with fewer primitives and better view consistency.

Resolving Representation Ambiguity in Feedforward Novel View Synthesis Transformer via Semantic-Spatial Decoupling
cs.CV 2026-05 unverdicted novelty 6.0

Decouples semantic and spatial tokens in NVS transformers to resolve representation ambiguity, yielding consistent gains with near-zero added latency.

Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
cs.CV 2026-04 unverdicted novelty 6.0

The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temp...

AdaptSplat: Adapting Vision Foundation Models for Feed-Forward 3D Gaussian Splatting
cs.CV 2026-05 unverdicted novelty 5.0

AdaptSplat adds a lightweight Frequency-Preserving Adapter to vision foundation models that extracts direction-aware high-frequency priors and integrates them via positional encodings and residual modulation to improv...