LaRI: Layered Ray Intersections for Single-view 3D Geometric Reasoning

Biao Zhang; Federico Tombari; Peter Wonka; Rui Li; Zhenyu Li

arxiv: 2504.18424 · v2 · pith:5RVRL4OGnew · submitted 2025-04-25 · 💻 cs.CV

Rui Li, Biao Zhang, Zhenyu Li, Federico Tombari, Peter Wonka

classification 💻 cs.CV

keywords: lari, layered, reasoning, geometric, intersections, method, object-level, reconstruction

We present Layered Ray Intersections (LaRI), a fully supervised method for occluded geometry reasoning from a single image. Unlike conventional depth estimation, which is limited to visible surfaces, LaRI predicts multiple surfaces intersected by the camera rays using layered point maps. Compared to existing approaches that leverage neural implicit representations or iterative refinement, LaRI achieves complete scene reconstruction in one feed-forward pass, enabling efficient and view-aligned geometric reasoning to underpin both object-level and scene-level tasks. We further propose to predict the ray stopping index, which identifies valid intersecting pixels and layers from LaRI's output. To better underpin and evaluate this task, we build an annotation pipeline using rendering engines and construct annotations for five public datasets, which include synthetic and real-world data covering 3D objects and scenes. As a generic method, LaRI's performance is validated in object-level and scene-level reconstruction tasks.
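
To make the two outputs named in the abstract more concrete, the following is a minimal sketch of how a layered point map and a ray stopping index could be combined to keep only valid intersections. It is not the authors' implementation. The nested-list layout, the function name `extract_valid_points`, and the convention that a stopping index of `k` marks the first `k` layers as valid (with `0` meaning the ray hits nothing) are all assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
# layered_points[v][u][l] is the 3D point where the ray through pixel (u, v)
# meets its l-th surface (hypothetical layout, not the paper's tensor format).
LayeredPointMap = List[List[List[Point]]]


def extract_valid_points(
    layered_points: LayeredPointMap,
    stop_index: List[List[int]],
) -> List[Tuple[int, int, int, Point]]:
    """Collect (v, u, layer, point) for every layer the ray actually reaches.

    Assumption: stop_index[v][u] = k means layers 0..k-1 are valid for that
    pixel, and k = 0 means the ray intersects nothing.
    """
    valid = []
    for v, row in enumerate(layered_points):
        for u, layers in enumerate(row):
            # Clamp so a stopping index beyond the layer count cannot overrun.
            k = min(stop_index[v][u], len(layers))
            for layer in range(k):
                valid.append((v, u, layer, layers[layer]))
    return valid


# Toy 1x2 image with up to 3 layers per ray.
toy_points = [[
    [(0.0, 0.0, 1.0), (0.0, 0.0, 2.0), (0.0, 0.0, 3.0)],
    [(0.5, 0.0, 1.5), (0.5, 0.0, 2.5), (0.5, 0.0, 3.5)],
]]
# First ray crosses two surfaces; second ray crosses none.
toy_stop = [[2, 0]]

points = extract_valid_points(toy_points, toy_stop)
```

In this toy case only the first pixel contributes points, and only its first two layers; the third layer and the whole second pixel are discarded as invalid.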

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible
cs.CV 2026-06 unverdicted novelty 7.0

World Tracing introduces a multi-layer pixel-aligned 3D point representation instantiated via a diffusion transformer (WT-DiT) trained with pixel-space flow matching to jointly reconstruct visible surfaces and generat...

Pixal3D: Pixel-Aligned 3D Generation from Images
cs.CV 2026-05 unverdicted novelty 6.0

Pixal3D performs pixel-aligned 3D generation from images via back-projected multi-scale feature volumes, achieving fidelity close to reconstruction while supporting multi-view and scene synthesis.

VolFill: Single-View Amodal 3D Scene Reconstruction with Volumetric Flow Matching
cs.CV 2026-05 unverdicted novelty 5.0

VolFill uses a hybrid 3D VAE to compress sparse truncated unsigned distance function grids into latent space and a latent Diffusion Transformer to denoise complete scenes, conditioned on geometry foundation models, ou...