pith. machine review for the scientific record.

arxiv: 2604.20784 · v1 · submitted 2026-04-22 · 💻 cs.CV

Recognition: unknown

GeoRect4D: Geometry-Compatible Generative Rectification for Dynamic Sparse-View 3D Reconstruction

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 00:50 UTC · model grok-4.3

classification 💻 cs.CV
keywords dynamic 3D reconstruction · sparse view · generative models · diffusion rectifier · 3D Gaussian splatting · spatiotemporal attention

The pith

GeoRect4D uses structural locking in a diffusion rectifier to couple generative detail with explicit 3D geometry for consistent dynamic sparse-view reconstruction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that dynamic 3D scenes can be reconstructed from sparse multi-view videos by feeding degradation information back into an anchor-based 3D Gaussian splatting model, which a single-step diffusion process then refines. This matters because naive generative approaches introduce drift and inconsistency when filling in missing views, while purely geometric methods cannot hallucinate plausible detail. The framework closes the loop with progressive purification and distillation steps. A sympathetic reader would care because it promises better 4D modeling for applications like AR and robotics, where camera coverage is limited.

Core claim

GeoRect4D is a unified framework that couples explicit 3D consistency with generative refinement through closed-loop optimization. A robust anchor-based dynamic 3DGS substrate is paired with a single-step diffusion rectifier whose structural locking and spatiotemporal coordinated attention preserve plausibility while restoring missing content; stochastic geometric purification and generative distillation then eliminate artifacts and infuse texture detail.
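Read mechanically, the loop alternates explicit rendering with single-step rectification and then distills the rectified views back into the Gaussians. A minimal sketch under assumed interfaces — `gaussians`, `rectifier`, and `sample_view` are hypothetical stand-ins, not the authors' code:

```python
import torch
import torch.nn.functional as F

def closed_loop_refinement(gaussians, rectifier, sample_view, n_iters=1000):
    """Closed-loop sketch: gaussians is an nn.Module-like dynamic 3DGS model
    with a differentiable .render(cam, t); rectifier is a frozen single-step
    diffusion model mapping a coarse render to a rectified view; sample_view
    yields (camera, time) pairs. All three are assumed interfaces."""
    opt = torch.optim.Adam(gaussians.parameters(), lr=1e-3)
    for _ in range(n_iters):
        cam, t = sample_view()
        coarse = gaussians.render(cam, t)      # explicit geometry pass
        with torch.no_grad():                  # the prior stays frozen;
            rectified = rectifier(coarse)      # only the 3D model is trained
        # Generative distillation: the rectified view acts as pseudo-
        # supervision pulling the explicit representation toward it.
        loss = F.l1_loss(coarse, rectified)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return gaussians
```

The load-bearing choice in any such loop is that gradients flow only into the explicit model, so generative detail can never detach the scene from its geometry.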

What carries the argument

The structural locking mechanism and spatiotemporal coordinated attention in the single-step diffusion rectifier, which anchors the generative process to the explicit 3D geometry to prevent drift while allowing detail hallucination.
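The page never defines the locking operation, so the following is only one plausible reading, not the paper's mechanism: hold the low-frequency structure of the explicit render fixed and let the generative model contribute high-frequency detail on top of it.

```python
import torch.nn.functional as F

def structurally_locked_rectify(denoiser, coarse, lock_kernel=9):
    """Speculative sketch of 'structural locking'. denoiser is a hypothetical
    single-step model taking and returning (N, C, H, W) images; lock_kernel
    sets how much low-frequency structure is locked to the coarse render."""
    raw = denoiser(coarse)                      # unconstrained generation
    pad = lock_kernel // 2
    low_coarse = F.avg_pool2d(coarse, lock_kernel, stride=1, padding=pad)
    low_raw = F.avg_pool2d(raw, lock_kernel, stride=1, padding=pad)
    # Keep the generator's high frequencies, swap in the locked structure.
    return raw - low_raw + low_coarse
```

Whatever the actual mechanism, it faces the same trade-off this sketch makes explicit: the tighter the lock, the less the prior can hallucinate.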

If this is right

  • Eliminates geometric collapse, trajectory drift, and floating artifacts in sparse dynamic reconstructions.
  • Achieves higher reconstruction fidelity and perceptual quality than prior methods.
  • Maintains spatiotemporal consistency across multiple datasets.
  • Enables effective use of generative priors without mismatch to 3D geometry.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This locking strategy might apply to other domains requiring constrained generation, such as video prediction or 3D object completion.
  • Testing on real captured sparse videos rather than synthetic ones could reveal practical limits.
  • The single-step nature suggests potential for faster inference compared to multi-step diffusion methods in similar tasks.

Load-bearing premise

The structural locking mechanism and spatiotemporal coordinated attention can hallucinate missing content while keeping physical plausibility and avoiding structural drift or temporal inconsistency.

What would settle it

Run the reconstruction on a dataset with extremely sparse views and compare the output against ground truth: if the 3D models still exhibit structural drift or temporal inconsistency, the claim is falsified.
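The temporal half of that test has a standard score: the tOF error of Chu et al. [9], which the paper's own ablations report (1.412 rising to 1.462 when modules are removed). A minimal sketch, assuming any dense optical-flow estimator, such as a RAFT [55] wrapper, for the hypothetical `flow_fn`:

```python
import numpy as np

def tof_error(pred_frames, gt_frames, flow_fn):
    """tOF: mean absolute difference between the optical flow of consecutive
    predicted frames and the flow of the matching ground-truth frames.
    flow_fn(a, b) -> (H, W, 2) dense flow is an assumed interface."""
    errs = []
    for t in range(len(gt_frames) - 1):
        f_gt = flow_fn(gt_frames[t], gt_frames[t + 1])
        f_pred = flow_fn(pred_frames[t], pred_frames[t + 1])
        errs.append(np.mean(np.abs(f_gt - f_pred)))
    return float(np.mean(errs))
```

A persistent gap on extremely sparse inputs, whether here or in a drift measure, is what would settle it.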

Figures

Figures reproduced from arXiv: 2604.20784 by Hua Yang, Qiang Hu, Qianhe Wang, Wenjun Zhang, Xiaoyun Zhang, Xuanxuan Wang, Zhenlong Wu, Zihan Zheng.

Figure 1
Figure 1: Left: Sparse dynamic reconstruction suffers from geometric blur, while naive priors induce structural artifacts. Middle: Our closed-loop framework couples explicit dynamic modeling with a generative prior for progressive rectification. Right: GeoRect4D achieves superior visual fidelity and perceptual quality over other methods.
Figure 2
Figure 2: Overview of GeoRect4D. Left: Sparse inputs are adaptively decomposed into a static base and an anchor-controlled dynamic field to construct the base dynamic 3DGS model. Top Right: A degradation-aware, single-step diffusion prior synthesizes high-fidelity rectified views from coarse renderings. Bottom Right: A two-stage progressive optimization framework first applies geometric purification to stabilize the…
Figure 3
Figure 3: Illustration of our two-stage progressive optimization framework. Stage 1 executes geometric purification via stochastic pruning, opacity annealing, and ROI-based dynamic constraints to establish a robust substrate. Stage 2 performs generative distillation, transferring details from the generative prior into the 3D scene through a hybrid objective combining physical and pseudo-supervision. (A minimal sketch of the Stage 1 purification follows the figure list.)
Figure 4
Figure 4: Qualitative comparison of GeoRect4D with STGS [32], Swift4D [63], and Ex4DGS [27] across the N3DV [31], MeetRoom [30], and MPEG datasets. STGS suffers from geometric collapse, while Swift4D and Ex4DGS exhibit pervasive blurring and ghosting. Compared to these baselines, our GeoRect4D more effectively preserves finer structural boundaries and high-frequency details. Specifically, our method suc…
Figure 5
Figure 5: Qualitative results of GeoRect4D and its variants on the MPEG dataset. Excluding any module leads to severe blurring and structural artifacts.
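As noted under Figure 3, here is a minimal sketch of what the Stage 1 "stochastic pruning plus opacity annealing" could look like; the probability, floor, and schedule are assumptions, since the page exposes none of them:

```python
import torch

def purification_step(opacities, step, total_steps,
                      prune_prob=0.05, max_floor=0.01):
    """Speculative Stage 1 sketch. opacities is a 1-D tensor with one value
    per Gaussian; returns a boolean mask of the Gaussians that survive."""
    # Opacity annealing: the pruning floor rises over training, so weak
    # floaters are culled progressively rather than all at once (assumed
    # linear schedule).
    floor = max_floor * (step / total_steps)
    weak = opacities < floor
    # Stochastic pruning: drop each weak Gaussian only with some probability,
    # which avoids deterministically erasing thin but real structures.
    drop = weak & (torch.rand_like(opacities) < prune_prob)
    return ~drop
```

The ROI-based dynamic constraints the caption mentions would presumably sit on top of such a mask, restricting where pruning may touch moving content.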
read the original abstract

Reconstructing dynamic 3D scenes from sparse multi-view videos is highly ill-posed, often leading to geometric collapse, trajectory drift, and floating artifacts. Recent attempts introduce generative priors to hallucinate missing content, yet naive integration frequently causes structural drift and temporal inconsistency due to the mismatch between stochastic 2D generation and deterministic 3D geometry. In this paper, we propose GeoRect4D, a novel unified framework for sparse-view dynamic reconstruction that couples explicit 3D consistency with generative refinement via a closed-loop optimization process. Specifically, GeoRect4D introduces a degradation-aware feedback mechanism that incorporates a robust anchor-based dynamic 3DGS substrate with a single-step diffusion rectifier to hallucinate high-fidelity details. This rectifier utilizes a structural locking mechanism and spatiotemporal coordinated attention, effectively preserving physical plausibility while restoring missing content. Furthermore, we present a progressive optimization strategy that employs stochastic geometric purification to eliminate floaters and generative distillation to infuse texture details into the explicit representation. Extensive experiments demonstrate that GeoRect4D achieves state-of-the-art performance in reconstruction fidelity, perceptual quality, and spatiotemporal consistency across multiple datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript proposes GeoRect4D, a unified framework for reconstructing dynamic 3D scenes from sparse multi-view videos. It couples an explicit anchor-based dynamic 3D Gaussian Splatting (3DGS) substrate with a single-step diffusion rectifier that uses a structural locking mechanism and spatiotemporal coordinated attention to hallucinate missing content while preserving physical plausibility. A progressive optimization strategy combines stochastic geometric purification to remove floaters with generative distillation to infuse texture details into the explicit representation. The central claim is that this closed-loop coupling resolves the mismatch between stochastic 2D generation and deterministic 3D geometry, yielding state-of-the-art reconstruction fidelity, perceptual quality, and spatiotemporal consistency across multiple datasets.

Significance. If the empirical claims hold, the work would represent a meaningful advance in dynamic sparse-view reconstruction by demonstrating a practical geometry-compatible integration of generative priors. The closed-loop design with explicit structural locking and coordinated attention offers a concrete mechanism for avoiding drift and inconsistency, which could influence subsequent hybrid explicit-implicit pipelines. The progressive optimization strategy also provides a reusable template for balancing geometric fidelity with generative detail infusion.

minor comments (2)
  1. The abstract asserts SOTA performance but does not preview any quantitative metrics, baseline comparisons, or dataset details; adding a single sentence with key numbers (e.g., PSNR, LPIPS, or temporal consistency scores) would improve immediate readability.
  2. The terms 'structural locking mechanism' and 'spatiotemporal coordinated attention' are introduced without a forward reference to their precise definitions or algorithmic pseudocode; a brief parenthetical pointer to the relevant subsection in §3 would clarify the architecture for readers.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive evaluation and recommendation of minor revision. The referee's summary accurately reflects the technical contributions of GeoRect4D, particularly the closed-loop integration of the anchor-based 3DGS substrate with the single-step diffusion rectifier via structural locking and spatiotemporal coordinated attention, as well as the progressive optimization combining stochastic geometric purification and generative distillation.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper describes a new architectural framework (anchor-based 3DGS substrate + single-step diffusion rectifier with structural locking and spatiotemporal attention + progressive optimization) without presenting equations, derivations, or parameter-fitting steps that reduce any claimed result to its own inputs by construction. No self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations appear in the abstract or described method. The central claims are empirical performance assertions to be tested on datasets, not mathematical identities derived from prior outputs of the same system.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The approach rests on standard assumptions from 3D Gaussian splatting and diffusion-based generation without providing independent validation for the new mechanisms introduced.

axioms (2)
  • domain assumption 3D Gaussian Splatting can serve as a robust explicit substrate for dynamic scenes
    Invoked as the base representation in the feedback mechanism.
  • domain assumption Generative diffusion models can be guided to produce geometrically consistent details
    Core premise of the single-step rectifier.
invented entities (2)
  • structural locking mechanism no independent evidence
    purpose: Preserve physical plausibility during generative hallucination
    Introduced as part of the rectifier to prevent drift.
  • spatiotemporal coordinated attention no independent evidence
    purpose: Maintain consistency across space and time
    New attention component in the rectifier.

pith-pipeline@v0.9.0 · 5523 in / 1372 out tokens · 38042 ms · 2026-05-10T00:50:57.108615+00:00 · methodology


Reference graph

Works this paper leans on

78 extracted references · 6 canonical work pages

  1. Attal, B., Huang, J.B., Richardt, C., Zollhoefer, M., Kopf, J., O’Toole, M., Kim, C.: Hyperreel: High-fidelity 6-dof video with ray-conditioned sampling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 16610–16620 (2023)
  2. Barron, J.T., Mildenhall, B., Tancik, M., Hedman, P., Martin-Brualla, R., Srinivasan, P.P.: Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 5855–5864 (2021)
  3. Barron, J.T., Mildenhall, B., Verbin, D., Srinivasan, P.P., Hedman, P.: Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5470–5479 (2022)
  4. Barron, J.T., Mildenhall, B., Verbin, D., Srinivasan, P.P., Hedman, P.: Zip-nerf: Anti-aliased grid-based neural radiance fields. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 19697–19705 (2023)
  5. Cao, A., Johnson, J.: Hexplane: A fast representation for dynamic scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 130–141 (2023)
  6. Charatan, D., Li, S.L., Tagliasacchi, A., Sitzmann, V.: Pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 19457–19467 (2024)
  7. Chen, Y., Lee, G.H.: Dbarf: Deep bundle-adjusting generalizable neural radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 24–34 (2023)
  8. Chen, Y., Xu, H., Zheng, C., Zhuang, B., Pollefeys, M., Geiger, A., Cham, T.J., Cai, J.: Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images. In: European Conference on Computer Vision. pp. 370–386. Springer (2024)
  9. Chu, M., Xie, Y., Mayer, J., Leal-Taixé, L., Thuerey, N.: Learning temporal coherence via self-supervision for gan-based video generation. ACM Transactions on Graphics (TOG) 39(4), 75–1 (2020)
  10. Chung, J., Oh, J., Lee, K.M.: Depth-regularized optimization for 3d gaussian splatting in few-shot images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 811–820 (2024)
  11. Deng, K., Liu, A., Zhu, J.Y., Ramanan, D.: Depth-supervised nerf: Fewer views and faster training for free. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12882–12891 (2022)
  12. Du, Y., Zhang, Y., Yu, H.X., Tenenbaum, J.B., Wu, J.: Neural radiance flow for 4d view synthesis and video processing. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV). pp. 14304–14314. IEEE Computer Society (2021)
  13. Feng, G., Chen, S., Fu, R., Liao, Z., Wang, Y., Liu, T., Hu, B., Xu, L., Pei, Z., Li, H., et al.: Flashgs: Efficient 3d gaussian splatting for large-scale and high-resolution rendering. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 26652–26662 (2025)
  14. Fridovich-Keil, S., Meanti, G., Warburg, F.R., Recht, B., Kanazawa, A.: K-planes: Explicit radiance fields in space, time, and appearance. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12479–12488 (2023)
  15. Gao, Q., Meng, J., Wen, C., Chen, J., Zhang, J.: Hicom: Hierarchical coherent motion for dynamic streamable scenes with 3d gaussian splatting. Advances in Neural Information Processing Systems 37, 80609–80633 (2024)
  16. Gao, R., Holynski, A., Henzler, P., Brussee, A., Martin-Brualla, R., Srinivasan, P., Barron, J.T., Poole, B.: Cat3d: Create anything in 3d with multi-view diffusion models. arXiv preprint arXiv:2405.10314 (2024)
  17. Guo, Z., Zhou, W., Li, L., Wang, M., Li, H.: Motion-aware 3d gaussian splatting for efficient dynamic scene reconstruction. IEEE Transactions on Circuits and Systems for Video Technology (2024)
  18. Han, L., Zhou, J., Liu, Y.S., Han, Z.: Binocular-guided 3d gaussian splatting with view consistency for sparse view synthesis. Advances in Neural Information Processing Systems 37, 68595–68621 (2024)
  19. Höllein, L., Božič, A., Zollhöfer, M., Nießner, M.: 3dgs-lm: Faster gaussian-splatting optimization with levenberg-marquardt. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 26740–26750 (2025)
  20. Hu, D., Zhou, Y., Huang, X., Yin, H., Li, Z.: Sparse4dgs: Flow-geometry assisted 4d gaussian splatting for dynamic sparse view synthesis. In: Proceedings of the 33rd ACM International Conference on Multimedia. pp. 10642–10651 (2025)
  21. Hu, Q., Zheng, Z., Zhong, H., Fu, S., Song, L., Zhang, X., Zhai, G., Wang, Y.: 4dgc: Rate-aware 4d gaussian compression for efficient streamable free-viewpoint video. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 875–885 (2025)
  22. Huang, Y.H., Sun, Y.T., Yang, Z., Lyu, X., Cao, Y.P., Qi, X.: Sc-gs: Sparse-controlled gaussian splatting for editable dynamic scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4220–4230 (2024)
  23. Işık, M., Rünz, M., Georgopoulos, M., Khakhulin, T., Starck, J., Agapito, L., Nießner, M.: Humanrf: High-fidelity neural radiance fields for humans in motion. ACM Transactions on Graphics (TOG) 42(4), 1–12 (2023)
  24. Jain, A., Tancik, M., Abbeel, P.: Putting nerf on a diet: Semantically consistent few-shot view synthesis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 5885–5894 (2021)
  25. Ke, J., Wang, Q., Wang, Y., Milanfar, P., Yang, F.: Musiq: Multi-scale image quality transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 5148–5157 (2021)
  26. Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 42(4), 139–1 (2023)
  27. Lee, J., Won, C., Jung, H., Bae, I., Jeon, H.G.: Fully explicit dynamic gaussian splatting. Advances in Neural Information Processing Systems 37, 5384–5409 (2024)
  28. Li, H., Li, S., Gao, X., Batuer, A., Yu, L., Liao, Y.: Gifstream: 4d gaussian-based immersive video with feature stream. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 21761–21770 (2025)
  29. Li, J., Zhang, J., Bai, X., Zheng, J., Ning, X., Zhou, J., Gu, L.: Dngaussian: Optimizing sparse-view 3d gaussian radiance fields with global-local depth normalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20775–20785 (2024)
  30. Li, L., Shen, Z., Wang, Z., Shen, L., Tan, P.: Streaming radiance fields for 3d video synthesis. Advances in Neural Information Processing Systems 35, 13485–13498 (2022)
  31. Li, T., Slavcheva, M., Zollhoefer, M., Green, S., Lassner, C., Kim, C., Schmidt, T., Lovegrove, S., Goesele, M., Newcombe, R., et al.: Neural 3d video synthesis from multi-view video. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5521–5531 (2022)
  32. Li, Z., Chen, Z., Li, Z., Xu, Y.: Spacetime gaussian feature splatting for real-time dynamic view synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 8508–8520 (2024)
  33. Liu, R., Wu, R., Van Hoorick, B., Tokmakov, P., Zakharov, S., Vondrick, C.: Zero-1-to-3: Zero-shot one image to 3d object. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9298–9309 (2023)
  34. Liu, X., Zhou, C., Huang, S.: 3dgs-enhancer: Enhancing unbounded 3d gaussian splatting with view-consistent 2d diffusion priors. Advances in Neural Information Processing Systems 37, 133305–133327 (2024)
  35. Liu, X., Chen, J., Kao, S.h., Tai, Y.W., Tang, C.K.: Deceptive-nerf/3dgs: Diffusion-generated pseudo-observations for high-quality sparse-view reconstruction. arXiv preprint arXiv:2305.15171 (2023)
  36. Lu, Z., Guo, X., Hui, L., Chen, T., Yang, M., Tang, X., Zhu, F., Dai, Y.: 3d geometry-aware deformable gaussian splatting for dynamic view synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 8900–8910 (2024)
  37. Luiten, J., Kopanas, G., Leibe, B., Ramanan, D.: Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis. In: 2024 International Conference on 3D Vision (3DV). pp. 800–809. IEEE (2024)
  38. Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM 65(1), 99–106 (2021)
  39. Niedermayr, S., Stumpfegger, J., Westermann, R.: Compressed 3d gaussian splatting for accelerated novel view synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10349–10358 (2024)
  40. Niemeyer, M., Barron, J.T., Mildenhall, B., Sajjadi, M.S., Geiger, A., Radwan, N.: Regnerf: Regularizing neural radiance fields for view synthesis from sparse inputs. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5480–5490 (2022)
  41. Park, H., Ryu, G., Kim, W.: Dropgaussian: Structural regularization for sparse-view gaussian splatting. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 21600–21609 (2025)
  42. Park, K., Henzler, P., Mildenhall, B., Barron, J.T., Martin-Brualla, R.: Camp: Camera preconditioning for neural radiance fields. ACM Transactions on Graphics (TOG) 42(6), 1–11 (2023)
  43. Park, K., Sinha, U., Barron, J.T., Bouaziz, S., Goldman, D.B., Seitz, S.M., Martin-Brualla, R.: Nerfies: Deformable neural radiance fields. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 5865–5874 (2021)
  44. Park, S., Son, M., Jang, S., Ahn, Y.C., Kim, J.Y., Kang, N.: Temporal interpolation is all you need for dynamic neural radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4212–4221 (2023)
  45. Pumarola, A., Corona, E., Pons-Moll, G., Moreno-Noguer, F.: D-nerf: Neural radiance fields for dynamic scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10318–10327 (2021)
  46. Qi, L., Kuen, J., Gu, J., Lin, Z., Wang, Y., Chen, Y., Li, Y., Jia, J.: Multi-scale aligned distillation for low-resolution detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 14443–14453 (2021)
  47. Qingming, L., Liu, Y., Wang, J., Lyu, X., Wang, P., Wang, W., Hou, J.: Modgs: Dynamic gaussian splatting from casually-captured monocular videos with depth priors. In: The Thirteenth International Conference on Learning Representations (2025)
  48. Radl, L., Steiner, M., Parger, M., Weinrauch, A., Kerbl, B., Steinberger, M.: Stopthepop: Sorted gaussian splatting for view-consistent real-time rendering. ACM Transactions on Graphics (TOG) 43(4), 1–17 (2024)
  49. Reiser, C., Szeliski, R., Verbin, D., Srinivasan, P., Mildenhall, B., Geiger, A., Barron, J., Hedman, P.: Merf: Memory-efficient radiance fields for real-time view synthesis in unbounded scenes. ACM Transactions on Graphics (TOG) 42(4), 1–12 (2023)
  50. Shao, R., Zheng, Z., Tu, H., Liu, B., Zhang, H., Liu, Y.: Tensor4d: Efficient neural 4d decomposition for high-fidelity dynamic reconstruction and rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 16632–16642 (2023)
  51. Shi, C., Yang, C., Hu, X., Chen, M., Pan, W., Yang, Y., Ding, J., Yu, Z., Yu, J.: Sparse4dgs: 4d gaussian splatting for sparse-frame dynamic scene reconstruction. arXiv preprint arXiv:2511.07122 (2025)
  52. Song, J., Park, S., An, H., Cho, S., Kwak, M.S., Cho, S., Kim, S.: Därf: Boosting radiance fields from sparse input views with monocular depth adaptation. Advances in Neural Information Processing Systems 36, 68458–68470 (2023)
  53. Song, L., Chen, A., Li, Z., Chen, Z., Chen, L., Yuan, J., Xu, Y., Geiger, A.: Nerfplayer: A streamable dynamic scene representation with decomposed neural radiance fields. IEEE Transactions on Visualization and Computer Graphics 29(5), 2732–2742 (2023)
  54. Sun, J., Jiao, H., Li, G., Zhang, Z., Zhao, L., Xing, W.: 3dgstream: On-the-fly training of 3d gaussians for efficient streaming of photo-realistic free-viewpoint videos. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20675–20685 (2024)
  55. Teed, Z., Deng, J.: Raft: Recurrent all-pairs field transforms for optical flow. In: European Conference on Computer Vision. pp. 402–419. Springer (2020)
  56. Truong, P., Rakotosaona, M.J., Manhardt, F., Tombari, F.: Sparf: Neural radiance fields from sparse and noisy poses. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4190–4200 (2023)
  57. Wang, F., Tan, S., Li, X., Tian, Z., Song, Y., Liu, H.: Mixed neural voxels for fast multi-view video synthesis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 19706–19716 (2023)
  58. Wang, G., Chen, Z., Loy, C.C., Liu, Z.: Sparsenerf: Distilling depth ranking for few-shot novel view synthesis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9065–9076 (2023)
  59. Wang, L., Zhang, J., Liu, X., Zhao, F., Zhang, Y., Zhang, Y., Wu, M., Yu, J., Xu, L.: Fourier plenoctrees for dynamic radiance field rendering in real-time. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 13524–13534 (2022)
  60. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13(4), 600–612 (2004)
  61. Wu, G., Yi, T., Fang, J., Xie, L., Zhang, X., Wei, W., Liu, W., Tian, Q., Wang, X.: 4d gaussian splatting for real-time dynamic scene rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20310–20320 (2024)
  62. Wu, J.Z., Zhang, Y., Turki, H., Ren, X., Gao, J., Shou, M.Z., Fidler, S., Gojcic, Z., Ling, H.: Difix3d+: Improving 3d reconstructions with single-step diffusion models. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 26024–26035 (2025)
  63. Wu, J., Peng, R., Wang, Z., Xiao, L., Tang, L., Yan, J., Xiong, K., Wang, R.: Swift4d: Adaptive divide-and-conquer gaussian splatting for compact and efficient reconstruction of dynamic scene. arXiv preprint arXiv:2503.12307 (2025)
  64. Wu, R., Mildenhall, B., Henzler, P., Park, K., Gao, R., Watson, D., Srinivasan, P.P., Verbin, D., Barron, J.T., Poole, B., et al.: Reconfusion: 3d reconstruction with diffusion priors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 21551–21561 (2024)
  65. Xu, H., Peng, S., Wang, F., Blum, H., Barath, D., Geiger, A., Pollefeys, M.: Depthsplat: Connecting gaussian splatting and depth. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 16453–16463 (2025)
  66. Xu, Y., Wang, L., Chen, M., Ao, S., Li, L., Guo, Y.: Dropoutgs: Dropping out gaussians for better sparse-view rendering. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 701–710 (2025)
  67. Yang, J., Pavone, M., Wang, Y.: Freenerf: Improving few-shot neural rendering with free frequency regularization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 8254–8263 (2023)
  68. Yang, S., Wu, T., Shi, S., Lao, S., Gong, Y., Cao, M., Wang, J., Yang, Y.: Maniqa: Multi-dimension attention network for no-reference image quality assessment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1191–1200 (2022)
  69. Yang, Z., Yang, H., Pan, Z., Zhang, L.: Real-time photorealistic dynamic scene representation and rendering with 4d gaussian splatting. arXiv preprint arXiv:2310.10642 (2023)
  70. Yang, Z., Gao, X., Zhou, W., Jiao, S., Zhang, Y., Jin, X.: Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20331–20341 (2024)
  71. Yu, Z., Chen, A., Huang, B., Sattler, T., Geiger, A.: Mip-splatting: Alias-free 3d gaussian splatting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 19447–19456 (2024)
  72. Yu, Z., Peng, S., Niemeyer, M., Sattler, T., Geiger, A.: Monosdf: Exploring monocular geometric cues for neural implicit surface reconstruction. Advances in Neural Information Processing Systems 35, 25018–25032 (2022)
  73. Zhang, J., Li, J., Yu, X., Huang, L., Gu, L., Zheng, J., Bai, X.: Cor-gs: Sparse-view 3d gaussian splatting via co-regularization. In: European Conference on Computer Vision. pp. 335–352. Springer (2024)
  74. Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 586–595 (2018)
  75. Zheng, Z., Wu, Z., Zhong, H., Tian, Y., Cao, N., Xu, L., Yao, J., Zhang, X., Hu, Q., Zhang, W.: 4dgcpro: Efficient hierarchical 4d gaussian compression for progressive volumetric video streaming. arXiv preprint arXiv:2509.17513 (2025)
  76. Zheng, Z., Zhong, H., Hu, Q., Zhang, X., Song, L., Zhang, Y., Wang, Y.: Hpc: Hierarchical progressive coding framework for volumetric video. In: Proceedings of the 32nd ACM International Conference on Multimedia. pp. 7937–7946 (2024)
  77. Zheng, Z., Zhong, H., Hu, Q., Zhang, X., Song, L., Zhang, Y., Wang, Y.: Jointrf: End-to-end joint optimization for dynamic neural radiance field representation and compression. In: 2024 IEEE International Conference on Image Processing (ICIP). pp. 3292–3298. IEEE (2024)
  78. Zhu, Z., Fan, Z., Jiang, Y., Wang, Z.: Fsgs: Real-time few-shot view synthesis using gaussian splatting. In: European Conference on Computer Vision. pp. 145–