AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model
Pith reviewed 2026-05-10 02:00 UTC · model grok-4.3
The pith
AnyRecon reconstructs 3D scenes from arbitrary unordered sparse views by coupling video diffusion with explicit geometric memory.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AnyRecon constructs a persistent global scene memory via a prepended capture view cache and removes temporal compression to maintain frame-level correspondence. It couples generation and reconstruction through an explicit 3D geometric memory and geometry-driven capture-view retrieval, combined with 4-step diffusion distillation and context-window sparse attention for efficiency. This enables robust reconstruction from irregular inputs, large viewpoint gaps, and long trajectories.
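The abstract does not spell out how retrieval consults the geometric memory. As a minimal sketch only, the following assumes the memory stores, per cached capture view, the 3D points observed from it, and ranks views by how much of that geometry projects into the target camera's frustum; every name and interface here is hypothetical rather than AnyRecon's.

```python
# Hypothetical sketch of geometry-driven capture-view retrieval; the
# memory layout and scoring rule are assumptions, not AnyRecon's API.
import numpy as np

class CaptureViewMemory:
    def __init__(self):
        self.view_ids, self.points = [], []

    def add_view(self, view_id, points_world):
        self.view_ids.append(view_id)
        self.points.append(points_world)  # (N, 3) points observed from this view

    def retrieve(self, target_pose_w2c, intrinsics, image_hw, k=4):
        """Rank cached views by the fraction of their 3D points that
        project inside the target camera's image (a frustum-overlap proxy)."""
        h, w = image_hw
        scores = []
        for pts in self.points:
            pts_h = np.concatenate([pts, np.ones((len(pts), 1))], axis=1)
            cam = (target_pose_w2c @ pts_h.T).T[:, :3]   # world -> target camera
            cam = cam[cam[:, 2] > 1e-6]                  # keep points in front
            uv = (intrinsics @ cam.T).T
            uv = uv[:, :2] / uv[:, 2:3]                  # perspective divide
            inside = ((uv[:, 0] >= 0) & (uv[:, 0] < w) &
                      (uv[:, 1] >= 0) & (uv[:, 1] < h))
            scores.append(inside.sum() / max(len(pts), 1))
        top = np.argsort(scores)[::-1][:k]               # best-overlap views first
        return [self.view_ids[i] for i in top]
```

Conditioning the diffusion model on the returned views is a separate step; the sketch only illustrates that selection is driven by projected geometry rather than capture order.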
What carries the argument
The geometry-aware conditioning strategy that uses an explicit 3D geometric memory and geometry-driven capture-view retrieval to couple generation and reconstruction.
If this is right
- Reconstruction remains consistent even with large viewpoint gaps and long trajectories.
- The system scales to diverse and large-scale 3D scenes beyond what single or dual frame conditioning allows.
- Flexible conditioning on varying numbers of input frames comes without loss of geometric control.
- Processing stays efficient through distilled diffusion steps and sparse attention (a mask sketch follows this list).
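The efficiency items above lean on two mechanisms the abstract names: 4-step diffusion distillation and context-window sparse attention. The exact attention pattern is not given, so the following is only a minimal sketch under one plausible assumption: prepended capture-view tokens stay globally visible while generated frames attend within a local temporal window. All names are illustrative.

```python
# Illustrative context-window sparse attention mask, not the paper's
# exact pattern. Assumption: prepended capture-view tokens are global,
# while generated-frame tokens attend only within a local window.
import numpy as np

def context_window_mask(num_capture, num_frames, window):
    """Boolean (T, T) mask over frame-level tokens, True = may attend."""
    total = num_capture + num_frames
    mask = np.zeros((total, total), dtype=bool)
    mask[:, :num_capture] = True            # everyone reads the capture cache
    mask[:num_capture, :] = True            # cache tokens stay globally connected
    for i in range(num_frames):             # local window among generated frames
        lo = max(0, i - window)
        hi = min(num_frames, i + window + 1)
        mask[num_capture + i, num_capture + lo:num_capture + hi] = True
    return mask

# Dense attention over T frames costs O(T^2); with a width-w window it
# drops to roughly O(T * (w + num_capture)).
m = context_window_mask(num_capture=8, num_frames=64, window=6)
print(m.shape, m.sum(), "of", m.size, "entries kept")
```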
Where Pith is reading between the lines
- This approach could extend traditional reconstruction techniques by incorporating generative capabilities for filling in missing views in sparse captures.
- Applications in robotics or AR might benefit from handling casual, unordered video inputs without requiring structured capture paths.
- Further improvements in memory management could allow even longer sequences or real-time updates.
Load-bearing premise
That the interplay between generation and reconstruction enforced by the explicit 3D geometric memory will maintain consistency across large viewpoint changes without introducing new inconsistencies.
What would settle it
A stress-test reconstruction on a scene with extremely large viewpoint gaps or very long trajectories: visible geometric distortions or inconsistencies relative to ground truth would falsify the consistency claim, while their absence would support it.
Original abstract
Sparse-view 3D reconstruction is essential for modeling scenes from casual captures, but remain challenging for non-generative reconstruction. Existing diffusion-based approaches mitigates this issues by synthesizing novel views, but they often condition on only one or two capture frames, which restricts geometric consistency and limits scalability to large or diverse scenes. We propose AnyRecon, a scalable framework for reconstruction from arbitrary and unordered sparse inputs that preserves explicit geometric control while supporting flexible conditioning cardinality. To support long-range conditioning, our method constructs a persistent global scene memory via a prepended capture view cache, and removes temporal compression to maintain frame-level correspondence under large viewpoint changes. Beyond better generative model, we also find that the interplay between generation and reconstruction is crucial for large-scale 3D scenes. Thus, we introduce a geometry-aware conditioning strategy that couples generation and reconstruction through an explicit 3D geometric memory and geometry-driven capture-view retrieval. To ensure efficiency, we combine 4-step diffusion distillation with context-window sparse attention to reduce quadratic complexity. Extensive experiments demonstrate robust and scalable reconstruction across irregular inputs, large viewpoint gaps, and long trajectories.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes AnyRecon, a scalable framework for arbitrary-view 3D reconstruction from unordered sparse inputs using a video diffusion model. It constructs a persistent global scene memory via a prepended capture-view cache, removes temporal compression to preserve frame-level correspondence, and introduces geometry-aware conditioning that couples generation and reconstruction through an explicit 3D geometric memory plus geometry-driven capture-view retrieval. Efficiency is achieved via 4-step diffusion distillation and context-window sparse attention. The abstract states that extensive experiments demonstrate robust reconstruction across irregular inputs, large viewpoint gaps, and long trajectories.
Significance. If the consistency and scalability claims hold, the work could meaningfully advance generative 3D reconstruction by supporting flexible conditioning cardinalities and explicit geometric control beyond the one- or two-frame limits of prior diffusion approaches. The explicit memory and retrieval mechanism, combined with distillation for efficiency, represents a potentially useful direction for handling casual captures with large viewpoint variations.
major comments (1)
- [geometry-aware conditioning strategy] The geometry-driven capture-view retrieval (described in the abstract as coupling generation and reconstruction via explicit 3D geometric memory) presupposes sufficiently accurate initial 3D geometry to select relevant views, yet obtaining that geometry is the core reconstruction task from arbitrary sparse inputs. This creates a potential circularity risk for large viewpoint gaps, and the manuscript must clarify the bootstrapping procedure and provide targeted ablations demonstrating that retrieval improves rather than introduces inconsistencies.
minor comments (2)
- [Abstract] Abstract contains grammatical issues: 'mitigates this issues' should read 'mitigate this issue'; 'Beyond better generative model' should read 'Beyond a better generative model'.
- [Abstract] The abstract claims 'extensive experiments demonstrate robust and scalable reconstruction' but does not preview specific quantitative metrics (e.g., PSNR/SSIM on novel-view synthesis, or reconstruction error) or baseline comparisons; these should be summarized so readers can gauge the magnitude of improvement (a minimal metric sketch follows this list).
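For concreteness, the quantitative preview this comment asks for would typically rest on standard image metrics. A minimal sketch using scikit-image's reference implementations follows; nothing in it comes from AnyRecon, and the random images stand in for real rendered/ground-truth pairs.

```python
# Minimal sketch of standard novel-view metrics, using scikit-image's
# reference implementations; unrelated to AnyRecon's own evaluation code.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def novel_view_metrics(rendered, ground_truth):
    """Both images are float arrays in [0, 1] with shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(ground_truth, rendered, data_range=1.0)
    ssim = structural_similarity(ground_truth, rendered,
                                 channel_axis=-1, data_range=1.0)
    return psnr, ssim

# Toy stand-ins for a rendered/ground-truth pair.
rng = np.random.default_rng(0)
gt = rng.random((64, 64, 3))
noisy = np.clip(gt + 0.05 * rng.standard_normal(gt.shape), 0, 1)
print(novel_view_metrics(noisy, gt))
```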
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for identifying a key point that requires clarification in our manuscript. We address the major comment below and commit to revisions that strengthen the presentation of the geometry-aware conditioning strategy.
Point-by-point responses
Referee: [geometry-aware conditioning strategy] The geometry-driven capture-view retrieval (described in the abstract as coupling generation and reconstruction via explicit 3D geometric memory) presupposes sufficiently accurate initial 3D geometry to select relevant views, yet obtaining that geometry is the core reconstruction task from arbitrary sparse inputs. This creates a potential circularity risk for large viewpoint gaps, and the manuscript must clarify the bootstrapping procedure and provide targeted ablations demonstrating that retrieval improves rather than introduces inconsistencies.
Authors: We appreciate this observation on potential circularity. In AnyRecon the explicit 3D geometric memory is initialized from the sparse unordered input views themselves by first running a lightweight coarse reconstruction step (pose estimation and initial point cloud via a standard SfM pipeline such as COLMAP applied directly to the provided frames). This coarse geometry, while not final, is sufficient to drive the initial retrieval of relevant cached capture views for conditioning the diffusion model. As novel views are generated, the geometric memory is updated with the new 3D information, enabling progressive refinement and retrieval in subsequent steps. The process is therefore iterative rather than presupposing a complete, accurate geometry upfront, which directly addresses large viewpoint gaps by bootstrapping from whatever inputs are available. We acknowledge that the current manuscript description of this bootstrapping procedure is insufficiently explicit and will revise the method section to provide a clear step-by-step account, including pseudocode for the initialization and update loop. In addition, we will add targeted ablation experiments that isolate the retrieval component (with vs. without geometry-driven selection) on sequences exhibiting large viewpoint changes, reporting metrics for geometric consistency and artifact reduction to demonstrate that retrieval improves rather than harms reconstruction quality.
Revision: yes
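Pending the promised pseudocode, the loop described above can be sketched schematically. Every class and function below is a stub standing in for unspecified machinery (the SfM initializer, the distilled diffusion model, the memory update rule); none of these names are AnyRecon's actual interface.

```python
# Schematic of the rebuttal's bootstrap-and-refine loop. All components
# are hypothetical stubs, not AnyRecon's API.

def coarse_sfm(views):
    # Stand-in for a standard SfM pipeline (e.g., COLMAP) run on the
    # sparse inputs alone: rough camera poses plus a sparse point cloud.
    return [f"pose_{i}" for i in range(len(views))], "sparse_point_cloud"

class SceneMemory:
    def __init__(self, views, poses, points):
        self.views, self.poses, self.points = list(views), list(poses), points

    def retrieve(self, target_pose, k=4):
        # Stand-in for geometry-driven retrieval: choose cached capture
        # views whose observed geometry overlaps the target camera.
        return self.views[:k]

    def update(self, frame, pose):
        # Fold the generated frame back into the memory so later
        # retrieval sees progressively refined geometry (the step that
        # breaks the apparent circularity).
        self.views.append(frame)
        self.poses.append(pose)

def diffusion_generate(cond_views, target_pose, steps=4):
    # Stand-in for the 4-step distilled video diffusion model.
    return f"frame_at_{target_pose}"

def reconstruct(sparse_views, trajectory):
    poses, points = coarse_sfm(sparse_views)            # 1. coarse bootstrap
    memory = SceneMemory(sparse_views, poses, points)
    for target_pose in trajectory:
        cond = memory.retrieve(target_pose)             # 2. retrieval
        frame = diffusion_generate(cond, target_pose)   # 3. generation
        memory.update(frame, target_pose)               # 4. memory refresh
    return memory

print(len(reconstruct(["v0", "v1", "v2"], ["p0", "p1"]).views))
```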
Circularity Check
No significant circularity: the framework is explicitly constructed without self-referential reductions.
Full rationale
The paper proposes AnyRecon as a new scalable framework for arbitrary-view 3D reconstruction, describing explicit components such as a prepended capture view cache for persistent global scene memory, removal of temporal compression to maintain frame-level correspondence, and a geometry-aware conditioning strategy that couples generation and reconstruction through an explicit 3D geometric memory plus geometry-driven retrieval. No equations, derivations, fitted parameters presented as predictions, or self-citations that serve as load-bearing justifications appear in the provided text. The central claims rest on the construction of these architectural choices to address limitations in prior diffusion-based methods, without reducing to self-definitional loops, renamed empirical patterns, or ansatzes smuggled via prior self-work. The derivation chain is therefore self-contained as a proposed engineering solution.
Reference graph
Works this paper leans on
- [1] Bai, J., Xia, M., Fu, X., Wang, X., Mu, L., Cao, J., Liu, Z., Hu, H., Bai, X., Wan, P., et al.: ReCamMaster: Camera-controlled generative rendering from a single video. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 14834–14844 (2025)
- [2] Cao, C., Zhou, J., Li, S., Liang, J., Yu, C., Wang, F., Xue, X., Fu, Y.: Uni3C: Unifying precisely 3D-enhanced camera and human motion controls for video generation. In: Proceedings of the SIGGRAPH Asia 2025 Conference Papers, pp. 1–12 (2025)
- [3] Chen, S., Wei, C., Sun, S., Nie, P., Zhou, K., Zhang, G., Yang, M.H., Chen, W.: Context forcing: Consistent autoregressive video generation with long context. arXiv preprint arXiv:2602.06028 (2026)
- [4] Chen, W., Bi, J., Huang, Y., Zheng, W., Duan, Y.: SceneCompleter: Dense 3D scene completion for generative novel view synthesis. arXiv preprint arXiv:2506.10981 (2025)
- [5] Deng, K., Liu, A., Zhu, J.Y., Ramanan, D.: Depth-supervised NeRF: Fewer views and faster training for free. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
- [6] Gao, R., Holynski, A., Henzler, P., Brussee, A., Martin-Brualla, R., Srinivasan, P.P., Barron, J.T., Poole, B.: CAT3D: Create anything in 3D with multi-view diffusion models. Advances in Neural Information Processing Systems (2024)
- [7] Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W.: LoRA: Low-rank adaptation of large language models. In: International Conference on Learning Representations (2022), https://openreview.net/forum?id=nZeVKeeFYf9
- [8] Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3D Gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics 42(4), Article 139 (2023)
- [9] Knapitsch, A., Park, J., Zhou, Q.Y., Koltun, V.: Tanks and Temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics 36(4) (2017)
- [10] Li, R., Torr, P., Vedaldi, A., Jakab, T.: VMem: Consistent interactive video scene generation with surfel-indexed view memory. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 25690–25699 (2025)
- [11] Ling, L., Sheng, Y., Tu, Z., Zhao, W., Xin, C., Wan, K., Yu, L., Guo, Q., Yu, Z., Lu, Y., et al.: DL3DV-10K: A large-scale scene dataset for deep learning-based 3D vision. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 22160–22169 (2024)
- [12] Liu, J., Li, J., Deng, J., Li, G., Zhou, S., Fang, Z., Lao, S., Deng, Z., Zhu, J., Ma, T., et al.: Dreamontage: Arbitrary frame-guided one-shot video generation. arXiv preprint arXiv:2512.21252 (2025)
- [13] Liu, X., Chen, J., Kao, S.H., Tai, Y.W., Tang, C.K.: Deceptive-NeRF: Enhancing NeRF reconstruction using pseudo-observations from diffusion models (2023)
- [14] Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM 65(1), 99–106 (2021)
- [15] Niemeyer, M., Barron, J.T., Mildenhall, B., Sajjadi, M.S., Geiger, A., Radwan, N.: RegNeRF: Regularizing neural radiance fields for view synthesis from sparse inputs. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5480–5490 (2022)
- [16] Ren, X., Shen, T., Huang, J., Ling, H., Lu, Y., Nimier-David, M., Müller, T., Keller, A., Fidler, S., Gao, J.: GEN3C: 3D-informed world-consistent video generation with precise camera control. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6121–6132 (2025)
- [17] Truong, P., Rakotosaona, M.J., Manhardt, F., Tombari, F.: SPARF: Neural radiance fields from sparse and noisy poses. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023)
- [18] Wan, T., Wang, A., Ai, B., Wen, B., Mao, C., Xie, C.W., Chen, D., Yu, F., Zhao, H., Yang, J., et al.: Wan: Open and advanced large-scale video generative models. arXiv preprint (2025)
- [19] Wang, J., Chen, M., Karaev, N., Vedaldi, A., Rupprecht, C., Novotny, D.: VGGT: Visual geometry grounded transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5294–5306 (2025)
- [20] Wang, Q., Zhang, Y., Holynski, A., Efros, A.A., Kanazawa, A.: Continuous 3D perception model with persistent state. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10510–10522 (2025)
- [21] Wang, S., Leroy, V., Cabon, Y., Chidlovskii, B., Revaud, J.: DUSt3R: Geometric 3D vision made easy. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20697–20709 (2024)
- [22] Wang, Y., Zhou, J., Zhu, H., Chang, W., Zhou, Y., Li, Z., Chen, J., Pang, J., Shen, C., He, T.: π³: Permutation-equivariant visual geometry learning. arXiv preprint arXiv:2507.13347 (2025)
- [23] Wu, J.Z., Zhang, Y., Turki, H., Ren, X., Gao, J., Shou, M.Z., Fidler, S., Gojcic, Z., Ling, H.: Difix3D+: Improving 3D reconstructions with single-step diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 26024–26035 (2025)
- [24] Wu, R., Mildenhall, B., Henzler, P., Park, K., Gao, R., Watson, D., Srinivasan, P.P., Verbin, D., Barron, J.T., Poole, B., et al.: ReconFusion: 3D reconstruction with diffusion priors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 21551–21561 (2024)
- [25] Yang, J., Pavone, M., Wang, Y.: FreeNeRF: Improving few-shot neural rendering with free frequency regularization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8254–8263 (2023)
- [26] Yin, T., Gharbi, M., Park, T., Zhang, R., Shechtman, E., Durand, F., Freeman, W.T.: Improved distribution matching distillation for fast image synthesis. In: NeurIPS (2024)
- [27] Yin, T., Gharbi, M., Zhang, R., Shechtman, E., Durand, F., Freeman, W.T., Park, T.: One-step diffusion with distribution matching distillation. In: CVPR (2024)
- [28] Yu, J., Bai, J., Qin, Y., Liu, Q., Wang, X., Wan, P., Zhang, D., Liu, X.: Context as memory: Scene-consistent interactive long video generation with memory retrieval. In: Proceedings of the SIGGRAPH Asia 2025 Conference Papers, pp. 1–11 (2025)
- [29] Yu, W., Xing, J., Yuan, L., Hu, W., Li, X., Huang, Z., Gao, X., Wong, T.T., Shan, Y., Tian, Y.: ViewCrafter: Taming video diffusion models for high-fidelity novel view synthesis. arXiv preprint arXiv:2409.02048 (2024)
- [30] Yu, Z., Peng, S., Niemeyer, M., Sattler, T., Geiger, A.: MonoSDF: Exploring monocular geometric cues for neural implicit surface reconstruction. Advances in Neural Information Processing Systems 35, 25018–25032 (2022)