Recognition: 2 theorem links · Lean Theorem
MoCam: Unified Novel View Synthesis via Structured Denoising Dynamics
Pith reviewed 2026-05-14 21:53 UTC · model grok-4.3
The pith
MoCam unifies novel view synthesis by anchoring on geometric priors early in diffusion denoising, then switching to appearance priors in later steps to correct geometric errors and refine detail.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MoCam employs structured denoising dynamics to orchestrate a coordinated progression from geometry to appearance within the diffusion process: it first leverages geometric priors in early stages to anchor coarse structures and tolerate their incompleteness, then switches to appearance priors in later stages to actively correct geometric errors and refine details. This design naturally unifies static and dynamic view synthesis by temporally decoupling geometric alignment and appearance refinement within the diffusion process.
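The staged schedule can be pictured as a single denoising loop whose conditioning signal changes partway through. The sketch below is a minimal illustration of that idea, not the paper's implementation: the denoiser signature, the conditioning tensors, and the 40% switch point are all assumptions, since the abstract does not specify them.

```python
import torch

def denoise_with_staged_priors(
    model,            # assumed denoiser: (x_t, t, cond) -> predicted noise
    x_T,              # initial Gaussian latent, e.g. torch.randn(1, 4, 64, 64)
    geom_cond,        # geometric prior, e.g. features rendered from a point cloud
    app_cond,         # appearance prior, e.g. reference-image features
    scheduler,        # diffusers-style scheduler exposing .timesteps and .step()
    switch_frac=0.4,  # fraction of steps spent on geometry; not specified by the paper
):
    """Condition early steps on geometry to anchor coarse structure, then on
    appearance to refine detail and correct residual geometric errors."""
    timesteps = scheduler.timesteps
    switch_at = int(len(timesteps) * switch_frac)
    x = x_T
    for i, t in enumerate(timesteps):
        cond = geom_cond if i < switch_at else app_cond
        eps = model(x, t, cond)
        x = scheduler.step(eps, t, x).prev_sample
    return x
```

A hard switch is the simplest reading of the abstract; a soft blend of the two conditions around the switch point is an equally plausible interpretation of "coordinated progression".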
What carries the argument
Structured denoising dynamics that temporally decouple geometric alignment from appearance refinement inside the diffusion process.
If this is right
- MoCam significantly outperforms prior methods, particularly when point clouds contain severe holes or distortions.
- It achieves robust geometry-appearance disentanglement.
- The method tolerates incompleteness in geometric priors by using them only for initial anchoring.
- It provides a natural unification for both static and dynamic novel view synthesis tasks.
Where Pith is reading between the lines
- The staged prior switch could be tested on other generative tasks that combine sparse 3D structure with dense image cues.
- This suggests potential gains in real-world capture pipelines where input geometry is often incomplete.
- Further experiments on video sequences could check whether the temporal decoupling extends to motion without introducing temporal artifacts.
Load-bearing premise
That switching from geometric priors in early diffusion stages to appearance priors in later stages will actively correct geometric errors without introducing new inconsistencies or artifacts.
What would settle it
Apply the method to point clouds with deliberately added large holes and compare the synthesized novel views against ground-truth geometry and appearance metrics to check whether errors are corrected rather than propagated.
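A minimal sketch of such a stress test, assuming a numpy point cloud; `synthesize_view` and `psnr` are hypothetical stand-ins for the pipeline and metric, not the paper's API:

```python
import numpy as np

def punch_holes(points, n_holes=5, radius=0.2, rng=None):
    """Remove all points inside randomly placed spheres (radius in scene units)
    to simulate severe holes in the input geometry."""
    rng = rng or np.random.default_rng(0)
    keep = np.ones(len(points), dtype=bool)
    for _ in range(n_holes):
        center = points[rng.integers(len(points))]  # hole centered on the surface
        keep &= np.linalg.norm(points - center, axis=1) > radius
    return points[keep]

# Hypothetical protocol: degrade inputs, synthesize, score against ground truth.
# degraded = punch_holes(point_cloud, n_holes=10, radius=0.3)
# pred = synthesize_view(degraded, target_camera)   # stand-in for MoCam
# score = psnr(pred, ground_truth_image)            # plus SSIM / LPIPS and depth error
```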
Original abstract
Generative novel view synthesis faces a fundamental dilemma: geometric priors provide spatial alignment but become sparse and inaccurate under view changes, while appearance priors offer visual fidelity but lack geometric correspondence. Existing methods either propagate geometric errors throughout generation or suffer from signal conflicts when fusing both statically. We introduce MoCam, which employs structured denoising dynamics to orchestrate a coordinated progression from geometry to appearance within the diffusion process. MoCam first leverages geometric priors in early stages to anchor coarse structures and tolerate their incompleteness, then switches to appearance priors in later stages to actively correct geometric errors and refine details. This design naturally unifies static and dynamic view synthesis by temporally decoupling geometric alignment and appearance refinement within the diffusion process. Experiments demonstrate that MoCam significantly outperforms prior methods, particularly when point clouds contain severe holes or distortions, achieving robust geometry-appearance disentanglement.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes MoCam, a diffusion-based framework for novel view synthesis that employs structured denoising dynamics to temporally decouple geometric and appearance priors: geometric priors anchor coarse structure in early denoising stages while appearance priors are introduced later to refine details and correct errors arising from incomplete or distorted input point clouds. The approach is presented as unifying static and dynamic view synthesis through this staged orchestration within the diffusion process, with claims of significant outperformance over prior methods particularly under severe geometric degradation.
Significance. If the staged prior-switching mechanism can be shown to enable genuine error correction rather than simple texture overlay, the work would offer a practical advance for novel view synthesis on real-world data with noisy or incomplete geometry, reducing reliance on perfect point-cloud inputs and providing a unified treatment of static and dynamic scenes.
Major comments (2)
- [Abstract / Experiments] The central claim of significant outperformance and robust geometry-appearance disentanglement is unsupported: no quantitative metrics, baselines, ablation studies, or experimental details are supplied, leaving the reader unable to assess whether the reported gains are real or attributable to the proposed dynamics.
- [Method] The claim that appearance priors in later stages actively correct geometric errors (rather than merely masking them) is load-bearing for the contribution, yet no timestep-specific analysis, error maps, or ablation on switch timing is provided to isolate correction from propagation of artifacts or new inconsistencies (a sketch of such an ablation follows this list).
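The switch-timing ablation asked for above could be run as a simple sweep over the switch point, reusing the staged loop sketched earlier; `run_mocam` and `eval_metric` are hypothetical stand-ins, not part of any published MoCam API:

```python
def ablate_switch_timing(run_mocam, eval_metric,
                         fracs=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)):
    """Sweep the geometry-to-appearance switch point and record quality.
    frac=0.0 is appearance-only conditioning; frac=1.0 is geometry-only."""
    results = {}
    for f in fracs:
        pred = run_mocam(switch_frac=f)  # full synthesis at this switch point
        results[f] = eval_metric(pred)   # e.g. LPIPS against ground truth
    return results
```

If later-stage appearance priors genuinely correct errors, quality should degrade at both extremes and peak at some intermediate switch point; a monotone curve would instead suggest one prior is doing all the work.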
Simulated Author's Rebuttal
We thank the referee for their thorough and constructive review. The feedback highlights important areas where additional evidence and analysis will strengthen the manuscript. We address each major comment point by point below.
Point-by-point responses

Referee [Abstract / Experiments]: the central claim of significant outperformance and robust geometry-appearance disentanglement is unsupported because no quantitative metrics, baselines, ablation studies, or experimental details are supplied, leaving the reader unable to assess whether the reported gains are real or attributable to the proposed dynamics.
Authors: We agree that the current manuscript version does not provide sufficient quantitative details to fully support the claims in the abstract. In the revision we will expand the Experiments section with quantitative metrics (PSNR, SSIM, LPIPS), comparisons to relevant baselines on standard benchmarks, and ablation studies on the staged denoising components. These additions will allow readers to evaluate the reported gains, especially under severe geometric degradation. revision: yes
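For concreteness, the promised metric suite could be computed as in the sketch below, assuming HxWx3 float images in [0, 1]; this is a generic evaluation harness using standard libraries, not code from the manuscript:

```python
import torch
import lpips  # pip install lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

_lpips = lpips.LPIPS(net="alex")  # downloads pretrained weights on first use

def evaluate_pair(pred, gt):
    """pred, gt: HxWx3 numpy float arrays in [0, 1]."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=1.0)
    # LPIPS expects NCHW tensors scaled to [-1, 1]
    to_t = lambda im: torch.from_numpy(im).permute(2, 0, 1)[None].float() * 2 - 1
    with torch.no_grad():
        lp = _lpips(to_t(pred), to_t(gt)).item()
    return {"PSNR": psnr, "SSIM": ssim, "LPIPS": lp}
```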
Referee [Method]: the claim that appearance priors in later stages actively correct geometric errors (rather than merely masking them) is load-bearing for the contribution, yet no timestep-specific analysis, error maps, or ablation on switch timing is provided to isolate correction from propagation of artifacts or new inconsistencies.
Authors: We acknowledge that the active correction claim requires stronger empirical isolation. In the revised manuscript we will include timestep-specific visualizations, geometric error maps across denoising stages, and an ablation on the geometry-to-appearance switch timing. These elements will demonstrate that later-stage appearance priors reduce errors from incomplete point clouds rather than merely overlaying details or introducing new inconsistencies. revision: yes
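The promised timestep-specific error maps could take roughly this form: decode the intermediate x0 prediction at each denoising step, estimate its geometry, and compare against ground truth. `estimate_depth` stands in for any monocular depth model; nothing here is from the manuscript:

```python
import numpy as np

def geometric_error_trace(intermediate_images, gt_depth, estimate_depth):
    """intermediate_images: decoded x0 predictions, one per denoising step.
    Returns a per-step list of per-pixel geometric error maps."""
    errors = []
    for img in intermediate_images:
        depth = estimate_depth(img)              # hypothetical depth estimator
        errors.append(np.abs(depth - gt_depth))  # geometric error map
    return errors  # correction predicts shrinking error after the appearance switch
```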
Circularity Check
No significant circularity in MoCam's derivation chain
Full rationale
The paper introduces MoCam via a design choice of temporally decoupling geometric priors (early diffusion stages) from appearance priors (later stages) within structured denoising dynamics. This orchestration is presented as an independent mechanism to unify static and dynamic novel view synthesis and tolerate point-cloud holes, without any equations or self-citations that reduce the central claim to fitted inputs, self-definitions, or prior author results by construction. The abstract and described method contain no load-bearing steps where a 'prediction' collapses to a renamed fit or where uniqueness is imported from overlapping citations. The derivation remains self-contained as a proposed scheduling strategy whose validity is asserted through experimental comparison rather than tautological reduction.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: diffusion models allow effective conditioning on different priors at successive denoising stages.
Invented entities (1)
- structured denoising dynamics · no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · tagged unclear
  The relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "MoCam first leverages geometric priors in early stages to anchor coarse structures ... then switches to appearance priors in later stages to actively correct geometric errors"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1]
- [2] Bai, J., Xia, M., Fu, X., Wang, X., Mu, L., Cao, J., Liu, Z., Hu, H., Bai, X., Wan, P., et al.: Recammaster: Camera-controlled generative rendering from a single video. arXiv preprint arXiv:2503.11647 (2025)
- [3] Blattmann, A., Dockhorn, T., Kulal, S., Mendelevitch, D., Kilian, M., Lorenz, D., Levi, Y., English, Z., Voleti, V., Letts, A., et al.: Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv preprint arXiv:2311.15127 (2023)
- [4] Cao, A., Johnson, J.: Hexplane: A fast representation for dynamic scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 130–141 (2023)
- [5] Chen, K., Khurana, T., Ramanan, D.: Reconstruct, inpaint, finetune: Dynamic novel-view synthesis from monocular videos. In: Advances in Neural Information Processing Systems (NeurIPS) (2025)
- [6] Chung, J., Lee, S., Nam, H., Lee, J., Lee, K.M.: Luciddreamer: Domain-free generation of 3d gaussian splatting scenes. IEEE Transactions on Visualization and Computer Graphics (2025)
- [7] Duan, Y., Wei, F., Dai, Q., He, Y., Chen, W., Chen, B.: 4d-rotor gaussian splatting: Towards efficient novel view synthesis for dynamic scenes. In: ACM SIGGRAPH 2024 Conference Papers. pp. 1–11 (2024)
- [8] Fan, C.D., Chang, C.W., Liu, Y.R., Lee, J.Y., Huang, J.L., Tseng, Y.C., Liu, Y.L.: Spectromotion: Dynamic 3d reconstruction of specular scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 21328–21338 (June 2025)
- [9] Fridovich-Keil, S., Meanti, G., Warburg, F.R., Recht, B., Kanazawa, A.: K-planes: Explicit radiance fields in space, time, and appearance. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12479–12488 (2023)
- [10] Gao, C., Saraf, A., Kopf, J., Huang, J.B.: Dynamic view synthesis from dynamic monocular video. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 5712–5721 (2021)
- [11] Gao, H., Li, R., Tulsiani, S., Russell, B., Kanazawa, A.: Dynamic novel-view synthesis: A reality check. In: NeurIPS (2022)
- [12] Ham, S., Woo, S., Kim, J.Y., Go, H., Park, B., Kim, C.: Diffusion model patching via mixture-of-prompts. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 17023–17031 (2025)
- [13] He, H., Xu, Y., Guo, Y., Wetzstein, G., Dai, B., Li, H., Yang, C.: Cameractrl: Enabling camera control for text-to-video generation. arXiv preprint arXiv:2404.02101 (2024)
- [14] Huang, J., Zhou, Q., Rabeti, H., Korovko, A., Ling, H., Ren, X., Shen, T., Gao, J., Slepichev, D., Lin, C.H., et al.: Vipe: Video pose engine for 3d geometric perception. arXiv preprint arXiv:2508.10934 (2025)
- [15] Huang, Z., He, Y., Yu, J., Zhang, F., Si, C., Jiang, Y., Zhang, Y., Wu, T., Jin, Q., Chanpaisit, N., Wang, Y., Chen, X., Wang, L., Lin, D., Qiao, Y., Liu, Z.: VBench: Comprehensive benchmark suite for video generative models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2024)
- [16] Jeong, H., Lee, S., Ye, J.C.: Reangle-a-video: 4d video generation as video-to-video translation. arXiv preprint arXiv:2503.09151 (2025)
- [17]
- [18] Kong, W., Tian, Q., Zhang, Z., Min, R., Dai, Z., Zhou, J., Xiong, J., Li, X., Wu, B., Zhang, J., et al.: Hunyuanvideo: A systematic framework for large video generative models. arXiv preprint arXiv:2412.03603 (2024)
- [19] Lei, G., Wang, C., Wang, Y., Li, H., Song, Y., Xu, W.: Motionflow: Learning implicit motion flow for complex camera trajectory control in video generation. arXiv preprint arXiv:2509.21119 (2025)
- [20] Li, T., Slavcheva, M., Zollhoefer, M., Green, S., Lassner, C., Kim, C., Schmidt, T., Lovegrove, S., Goesele, M., Newcombe, R., et al.: Neural 3d video synthesis from multi-view video. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5521–5531 (2022)
- [21] Li, Z., Chen, Z., Li, Z., Xu, Y.: Spacetime gaussian feature splatting for real-time dynamic view synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 8508–8520 (2024)
- [22] Li, Z., Niklaus, S., Snavely, N., Wang, O.: Neural scene flow fields for space-time view synthesis of dynamic scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 6498–6508 (2021)
- [23] Lin, Y., Dai, Z., Zhu, S., Yao, Y.: Gaussian-flow: 4d reconstruction with dynamic 3d gaussian particle. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 21136–21145 (2024)
- [24] Liu, Q., Liu, Y., Wang, J., Lyu, X., Wang, P., Wang, W., Hou, J.: MoDGS: Dynamic gaussian splatting from casually-captured monocular videos with depth priors. In: The Thirteenth International Conference on Learning Representations (2025), https://openreview.net/forum?id=2prShxdLkX
- [25] Luo, Z., Ran, H., Lu, L.: Instant4d: 4d gaussian splatting in minutes. In: Advances in Neural Information Processing Systems (2025)
- [26] Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: Nerf: Representing scenes as neural radiance fields for view synthesis. In: European Conference on Computer Vision. pp. 405–421. Springer (2020)
- [27] Mou, C., Wang, X., Xie, L., Wu, Y., Zhang, J., Qi, Z., Shan, Y.: T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 38, pp. 4296–4304 (2024)
- [28] Nan, K., Xie, R., Zhou, P., Fan, T., Yang, Z., Chen, Z., Li, X., Yang, J., Tai, Y.: Openvid-1m: A large-scale high-quality dataset for text-to-video generation. arXiv preprint arXiv:2407.02371 (2024)
- [29]
- [30] Ren, X., Shen, T., Huang, J., Ling, H., Lu, Y., Nimier-David, M., Müller, T., Keller, A., Fidler, S., Gao, J.: Gen3c: 3d-informed world-consistent video generation with precise camera control. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 6121–6132 (2025)
- [31] Shriram, J., Trevithick, A., Liu, L., Ramamoorthi, R.: Realmdreamer: Text-driven 3d scene generation with inpainting and depth diffusion. In: 2025 International Conference on 3D Vision (3DV). pp. 882–892. IEEE (2025)
- [32] Song, R., Liang, C., Xia, Y., Zimmer, W., Cao, H., Caesar, H., Festag, A., Knoll, A.: Coda-4dgs: Dynamic gaussian splatting with context and deformation awareness for autonomous driving. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 28031–28041 (October 2025)
- [33] Van Hoorick, B., Wu, R., Ozguroglu, E., Sargent, K., Liu, R., Tokmakov, P., Dave, A., Zheng, C., Vondrick, C.: Generative camera dolly: Extreme monocular dynamic novel view synthesis. In: European Conference on Computer Vision. pp. 313–331. Springer (2024)
- [34] Wan, T., Wang, A., Ai, B., Wen, B., Mao, C., Xie, C.W., Chen, D., Yu, F., Zhao, H., Yang, J., Zeng, J., Wang, J., Zhang, J., Zhou, J., Wang, J., Chen, J., Zhu, K., Zhao, K., Yan, K., Huang, L., Feng, M., Zhang, N., Li, P., Wu, P., Chu, R., Feng, R., Zhang, S., Sun, S., Fang, T., Wang, T., Gui, T., Weng, T., Shen, T., Lin, W., Wang, W., Wang, W., Zhou, W., ...: Wan: Open and advanced large-scale video generative models. arXiv preprint (2025)
- [35] Wang, H., Liu, Y., Liu, Z., Wang, W., Dong, Z., Yang, B.: Vistadream: Sampling multiview consistent images for single-view scene reconstruction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 26772–26782 (2025)
- [36] Wu, G., Yi, T., Fang, J., Xie, L., Zhang, X., Wei, W., Liu, W., Tian, Q., Wang, X.: 4d gaussian splatting for real-time dynamic scene rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20310–20320 (2024)
- [37] Wu, P., Zhu, K., Liu, Y., Zhao, L., Zhai, W., Cao, Y., Zha, Z.J.: Improved video vae for latent video diffusion model. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 18124–18133 (2025)
- [38] Wu, R., Gao, R., Poole, B., Trevithick, A., Zheng, C., Barron, J.T., Holynski, A.: Cat4d: Create anything in 4d with multi-view video diffusion models. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 26057–26068 (2025)
- [39] Yang, Z., Yang, H., Pan, Z., Zhang, L.: Real-time photorealistic dynamic scene representation and rendering with 4d gaussian splatting. arXiv preprint arXiv:2310.10642 (2023)
- [40] Yang, Z., Teng, J., Zheng, W., Ding, M., Huang, S., Xu, J., Yang, Y., Hong, W., Zhang, X., Feng, G., et al.: Cogvideox: Text-to-video diffusion models with an expert transformer. arXiv preprint arXiv:2408.06072 (2024)
- [41] Yang, Z., Gao, X., Zhou, W., Jiao, S., Zhang, Y., Jin, X.: Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20331–20341 (2024)
- [42] Ye, H., Zhang, J., Liu, S., Han, X., Yang, W.: Ip-adapter: Text compatible image prompt adapter for text-to-image diffusion models. arXiv preprint arXiv:2308.06721 (2023)
- [43] You, M., Zhu, Z., Liu, H., Hou, J.: Nvs-solver: Video diffusion model as zero-shot novel view synthesizer. arXiv preprint arXiv:2405.15364 (2024)
- [44] Yu, M., Hu, W., Xing, J., Shan, Y.: Trajectorycrafter: Redirecting camera trajectory for monocular videos via diffusion models. arXiv preprint arXiv:2503.05638 (2025)
- [45] Yu, W., Xing, J., Yuan, L., Hu, W., Li, X., Huang, Z., Gao, X., Wong, T.T., Shan, Y., Tian, Y.: Viewcrafter: Taming video diffusion models for high-fidelity novel view synthesis. IEEE Transactions on Pattern Analysis and Machine Intelligence (2025)
- [46] Zhang, D.J., Paiss, R., Zada, S., Karnad, N., Jacobs, D.E., Pritch, Y., Mosseri, I., Shou, M.Z., Wadhwa, N., Ruiz, N.: Recapture: Generative video camera controls for user-provided videos using masked video fine-tuning. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 2050–2062 (2025)
- [47] Zhang, J., Li, X., Wan, Z., Wang, C., Liao, J.: Text2nerf: Text-driven 3d scene generation with neural radiance fields. IEEE Transactions on Visualization and Computer Graphics 30(12), 7749–7762 (2024)
- [48] Zhang, L., Rao, A., Agrawala, M.: Adding conditional control to text-to-image diffusion models (2023)
- [49] Zhang, S., Xu, H., Guo, S., Xie, Z., Bao, H., Xu, W., Zou, C.: Spatialcrafter: Unleashing the imagination of video diffusion models for scene reconstruction from limited observations. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 27794–27805 (2025)
- [50] Zhang, X., Liu, Z., Zhang, Y., Ge, X., He, D., Xu, T., Wang, Y., Lin, Z., Yan, S., Zhang, J.: Mega: Memory-efficient 4d gaussian splatting for dynamic scenes. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 27828–27838 (October 2025)
- [51] Zhu, J., Tang, H.: Dynamic scene reconstruction: Recent advance in real-time rendering and streaming. arXiv preprint arXiv:2503.08166 (2025)
- [52] Zhuang, S., Guo, Y., Ding, Y., Li, K., Chen, X., Wang, Y., Wang, F., Zhang, Y., Li, C., Wang, Y.: Timestep master: Asymmetrical mixture of timestep lora experts for versatile and efficient diffusion models in vision. arXiv preprint arXiv:2503.07416 (2025)