Recognition: 2 theorem links
MVDream: Multi-view Diffusion for 3D Generation
Pith reviewed 2026-05-15 08:31 UTC · model grok-4.3
The pith
A multi-view diffusion model trained on both 2D and 3D data acts as a generalizable 3D prior that improves consistency in text-to-3D generation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MVDream shows that a multi-view diffusion model learned from both 2D and 3D data is implicitly a generalizable 3D prior agnostic to 3D representations. Applied via Score Distillation Sampling, it markedly improves the consistency and stability of existing 2D-lifting approaches to 3D generation. It further enables few-shot concept learning from 2D examples for 3D output, analogous to DreamBooth but in the 3D setting.
What carries the argument
The multi-view diffusion model trained jointly on 2D and 3D data, which generates viewpoint-consistent images and thereby encodes an implicit 3D prior usable in score distillation sampling.
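For orientation, the mechanism this refers to can be stated compactly. Score Distillation Sampling (from DreamFusion [36]) optimizes the parameters θ of a 3D representation by pushing its renderings toward what a frozen diffusion model considers likely; MVDream's change, per the abstract, is to score several views jointly with a multi-view network. A standard form of the SDS gradient follows (this is the conventional notation, not a formula quoted from the paper):

```latex
% x = g(\theta): a differentiable rendering of the 3D representation;
% x_t: the rendering noised to timestep t; y: text (and camera) conditioning;
% \hat{\epsilon}_\phi: the frozen diffusion model's noise prediction; w(t): a weighting.
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
  = \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,
      \bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta} \,\right]
```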
If this is right
- Existing 2D-lifting pipelines for text-to-3D can be upgraded to higher consistency simply by swapping in the multi-view diffusion prior (a code sketch follows this list).
- Few-shot personalization of 3D objects becomes feasible from ordinary 2D photographs without explicit 3D data.
- The same prior can be used with any 3D representation because it does not depend on a specific geometry format.
- Training cost for new 3D generators decreases because the model already supplies multi-view consistency.
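To make the first point concrete, below is a minimal sketch of what "swapping in the multi-view prior" could look like inside an SDS optimization loop. All interfaces here (`scene.render`, `prior.add_noise`, `prior.predict_noise`) are hypothetical placeholders rather than MVDream's actual API; only the overall recipe follows the standard SDS formulation above.

```python
import torch

def sds_step(scene, prior, cameras, text_emb, optimizer,
             t_range=(0.02, 0.98), guidance_scale=50.0):
    """One SDS update using a (hypothetical) frozen multi-view prior.

    scene:  any differentiable 3D representation (NeRF, mesh, ...).
    prior:  frozen multi-view diffusion model conditioned on text + cameras.
    """
    # Render the same scene from several viewpoints at once; the multi-view
    # prior scores all views jointly, which is where the claimed consistency
    # gain would come from.
    images = scene.render(cameras)                    # (V, 3, H, W)

    t = torch.empty(1).uniform_(*t_range)             # one shared timestep
    noise = torch.randn_like(images)
    noisy = prior.add_noise(images, noise, t)

    with torch.no_grad():
        eps_c = prior.predict_noise(noisy, t, text_emb, cameras)
        eps_u = prior.predict_noise(noisy, t, None, cameras)
        eps = eps_u + guidance_scale * (eps_c - eps_u)  # classifier-free guidance

    # SDS treats (eps - noise) as a gradient on the rendered pixels and
    # backpropagates it into the scene parameters through the renderer.
    grad = (eps - noise).detach()
    loss = (grad * images).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The only structural change relative to a single-view, DreamFusion-style loop is that all views share one scene, one timestep, and one joint score evaluation.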
Where Pith is reading between the lines
- The approach could be extended to video or dynamic scenes by adding temporal consistency as another training signal.
- If the implicit prior holds across domains, similar joint 2D-3D training might improve consistency in other generative tasks such as novel-view synthesis.
- Downstream applications could combine this prior with faster inference methods to make real-time 3D content creation more practical.
Load-bearing premise
Joint training on 2D and 3D data yields a prior that generalizes to new text prompts and shapes without overfitting to the specific training renderings or degrading single-view image quality.
What would settle it
A direct comparison showing that score distillation sampling with MVDream produces no measurable gain in multi-view consistency or output stability over standard 2D diffusion baselines on a fixed set of text prompts would falsify the central claim.
Original abstract
We introduce MVDream, a diffusion model that is able to generate consistent multi-view images from a given text prompt. Learning from both 2D and 3D data, a multi-view diffusion model can achieve the generalizability of 2D diffusion models and the consistency of 3D renderings. We demonstrate that such a multi-view diffusion model is implicitly a generalizable 3D prior agnostic to 3D representations. It can be applied to 3D generation via Score Distillation Sampling, significantly enhancing the consistency and stability of existing 2D-lifting methods. It can also learn new concepts from a few 2D examples, akin to DreamBooth, but for 3D generation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces MVDream, a multi-view diffusion model trained jointly on 2D and 3D data to generate consistent multi-view images from text prompts. It claims this model functions as an implicit generalizable 3D prior agnostic to representations, which can be applied via Score Distillation Sampling (SDS) to enhance consistency and stability in existing 2D-lifting 3D generation methods, and extended to few-shot 3D concept learning akin to DreamBooth.
Significance. If validated, the work offers a practical bridge between the generalizability of 2D diffusion models and the multi-view consistency of 3D renderings, potentially improving text-to-3D pipelines without explicit 3D representations. The empirical demonstrations of SDS-based generation and few-shot adaptation provide concrete value for 3D content creation applications, though the strength depends on rigorous quantitative support for the transfer claims.
major comments (2)
- [§4.1] (3D Generation via SDS): The central claim that MVDream 'significantly enhanc[es] the consistency and stability of existing 2D-lifting methods' lacks load-bearing quantitative evidence: no metrics (e.g., multi-view consistency scores, CLIP similarity across views, or direct comparisons to DreamFusion baselines) and no ablation isolating the multi-view prior's contribution are reported, leaving the enhancement unverified.
- [§5] (Few-shot 3D Concept Learning): The generalizability of the 3D prior to novel text prompts and shapes rests on the untested assumption that joint 2D+3D training avoids overfitting to the specific 3D training renderings; no ablation on the 3D data contribution, no diversity statistics for the 3D corpus, and no out-of-distribution shape or prompt tests are provided to support the transfer claim.
minor comments (2)
- [§3.2] (Model Architecture): The 'multi-view conditioning strength' hyperparameter is introduced without explicit notation, a stated range, or a sensitivity analysis, which complicates reproducing the reported results (one hypothetical reading is sketched after this list).
- [Figure 3] (Qualitative Results): The caption does not specify the exact text prompts or camera poses used for the multi-view generations, reducing clarity for readers attempting to interpret the consistency improvements.
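Since this parameter is listed in the ledger below but never pinned down, one plausible reading, offered purely as a hypothetical sketch, is a guidance-style blending weight between a multi-view-conditioned noise prediction and a plain 2D one:

```python
import torch

def blend_multiview_strength(eps_mv: torch.Tensor,
                             eps_2d: torch.Tensor,
                             strength: float) -> torch.Tensor:
    """Hypothetical 'multi-view conditioning strength' in [0, 1]:
    strength = 0 recovers the 2D prediction, strength = 1 uses the
    multi-view prediction alone. Illustrative only; not from the paper."""
    assert 0.0 <= strength <= 1.0
    return (1.0 - strength) * eps_2d + strength * eps_mv
```

A sensitivity analysis of the kind the referee asks for would sweep this value and report how the consistency metrics respond.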
Simulated Author's Rebuttal
Thank you for the constructive feedback. We address the major comments point-by-point below and will revise the manuscript accordingly to strengthen the quantitative support for our claims.
Point-by-point responses
Referee: [§4.1] (3D Generation via SDS): The central claim that MVDream 'significantly enhanc[es] the consistency and stability of existing 2D-lifting methods' lacks load-bearing quantitative evidence: no metrics (e.g., multi-view consistency scores, CLIP similarity across views, or direct comparisons to DreamFusion baselines) and no ablation isolating the multi-view prior's contribution are reported, leaving the enhancement unverified.
Authors: We agree that the current manuscript relies primarily on qualitative results for this claim. In the revision we will add quantitative metrics, including multi-view consistency scores and average CLIP similarity across generated views, plus direct numerical comparisons against DreamFusion baselines. We will also include an ablation that isolates the multi-view prior's contribution by comparing against a 2D-only diffusion baseline under identical SDS settings (a sketch of one such metric follows these responses). revision: yes
Referee: [§5] (Few-shot 3D Concept Learning): The generalizability of the 3D prior to novel text prompts and shapes rests on the untested assumption that joint 2D+3D training avoids overfitting to the specific 3D training renderings; no ablation on the 3D data contribution, no diversity statistics for the 3D corpus, and no out-of-distribution shape or prompt tests are provided to support the transfer claim.
Authors: We acknowledge that additional controls are needed to substantiate the transfer claim. The revised version will report (1) an ablation measuring performance with and without the 3D training data, (2) basic diversity statistics (e.g., object category coverage and viewpoint distribution) for the 3D corpus, and (3) qualitative and quantitative results on out-of-distribution shapes and prompts not seen during training. revision: yes
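The first response promises multi-view consistency metrics. One simple candidate, sketched below with the public openai/clip-vit-base-patch32 checkpoint from Hugging Face transformers, is mean pairwise CLIP image-embedding similarity across rendered views of the same object; this illustrates the kind of measurement described, not the authors' actual evaluation protocol.

```python
import itertools
import torch
from transformers import CLIPModel, CLIPProcessor

def cross_view_clip_consistency(view_images) -> float:
    """Mean pairwise cosine similarity of CLIP image embeddings across
    renderings (PIL images) of one object; higher values suggest the
    views depict a more consistent object."""
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    inputs = processor(images=view_images, return_tensors="pt")
    with torch.no_grad():
        emb = model.get_image_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)     # unit-normalize embeddings

    pairs = itertools.combinations(range(emb.shape[0]), 2)
    sims = [(emb[i] @ emb[j]).item() for i, j in pairs]
    return sum(sims) / len(sims)
```

Comparing this score for MVDream-guided SDS against a 2D-only baseline on a fixed prompt set would directly address the falsification test stated under "What would settle it".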
Circularity Check
No significant circularity; the core claim is an empirical training result.
Full rationale
The paper claims that joint training on 2D and 3D data yields a multi-view diffusion model that acts as a generalizable 3D prior, demonstrated via application to SDS for 3D generation. This rests on external datasets and standard diffusion training rather than any self-definitional loop, fitted parameter renamed as prediction, or load-bearing self-citation chain. No equations or derivations reduce the claimed prior to its inputs by construction. Minor self-citations (e.g., to DreamBooth or SDS) are not central to the derivation and do not force the result. The derivation chain is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- multi-view conditioning strength
axioms (1)
- Domain assumption: Joint 2D-3D training yields a prior that is agnostic to explicit 3D representations.
Lean theorems connected to this paper
- IndisputableMonolith.Foundation.DAlembert.Inevitability.bilinear_family_forced (tag: unclear; the relation between the paper passage and the cited Recognition theorem is ambiguous)
We demonstrate that such a multi-view diffusion model is implicitly a generalizable 3D prior agnostic to 3D representations. It can be applied to 3D generation via Score Distillation Sampling, significantly enhancing the consistency and stability of existing 2D-lifting methods.
- IndisputableMonolith.Foundation.HierarchyEmergence.hierarchy_emergence_forces_phi (tag: unclear; the relation between the paper passage and the cited Recognition theorem is ambiguous)
Learning from both 2D and 3D data, a multi-view diffusion model can achieve the generalizability of 2D diffusion models and the consistency of 3D renderings.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 20 Pith papers
- Mind the Gap: Geometrically Accurate Generative Reconstruction from Disjoint Views
  GLADOS reconstructs 3D geometry from disjoint views by generating intermediate perspectives, performing robust coarse alignment that tolerates generative inconsistencies, and iteratively expanding context for consistency.
- R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow
  R-DMesh generates high-fidelity 4D meshes aligned to video by disentangling base mesh, motion, and a learned rectification jump offset inside a VAE, then using Triflow Attention and rectified-flow diffusion.
- Geometrically Consistent Multi-View Scene Generation from Freehand Sketches
  A framework generates consistent multi-view scenes from one freehand sketch via a ~9k-sample dataset, Parallel Camera-Aware Attention Adapters, and Sparse Correspondence Supervision Loss, outperforming baselines in re...
- SparseCam4D: Spatio-Temporally Consistent 4D Reconstruction from Sparse Cameras
  SparseCam4D achieves spatio-temporally consistent high-fidelity 4D reconstruction from sparse cameras via a Spatio-Temporal Distortion Field that corrects inconsistencies in generative observations.
- GeoQuery: Geometry-Query Diffusion for Sparse-View Reconstruction
  GeoQuery replaces corrupted rendering features with geometry-aligned proxy queries and restricts cross-view attention to local windows, enabling robust diffusion-based refinement under extreme view sparsity.
- Beyond Thinking: Imagining in 360° for Humanoid Visual Search
  Imagining in 360° decouples visual search into a single-step probabilistic semantic layout predictor and an actor, removing the need for multi-turn CoT reasoning and trajectory annotations while improving efficiency i...
- Velox: Learning Representations of 4D Geometry and Appearance
  Velox compresses dynamic point clouds into latent tokens that support geometry via 4D surface modeling and appearance via 3D Gaussians, showing strong results on video-to-4D generation, tracking, and image-to-4D cloth...
- Structured 3D Latents Are Surprisingly Powerful: Unleashing Generalizable Style with 2D Diffusion
  DiLAST optimizes 3D latents via guidance from a 2D diffusion model to enable generalizable style transfer for OOD styles in 3D asset generation.
- REVIVE 3D: Refinement via Encoded Voluminous Inflated prior for Volume Enhancement
  REVIVE 3D generates voluminous 3D assets from flat 2D images via an inflated prior construction followed by latent-space refinement, plus new metrics for volume and flatness validated by user study.
- Camera Control for Text-to-Image Generation via Learning Viewpoint Tokens
  Viewpoint tokens learned on a mixed 3D-rendered and photorealistic dataset enable precise camera control in text-to-image generation while factorizing geometry from appearance and transferring to unseen object categories.
- AnyLift: Scaling Motion Reconstruction from Internet Videos via 2D Diffusion
  A two-stage method synthesizes multi-view 2D motion data from internet video keypoints and trains a camera-conditioned diffusion model to recover globally consistent 3D human motion and HOI in world space.
- HandDreamer: Zero-Shot Text to 3D Hand Model Generation using Corrective Hand Shape Guidance
  HandDreamer is the first zero-shot text-to-3D method for hands that uses MANO initialization, skeleton-guided diffusion, and corrective shape guidance to produce view-consistent models.
- Stepper: Stepwise Immersive Scene Generation with Multiview Panoramas
  Stepper uses stepwise panoramic expansion with a multi-view 360-degree diffusion model and geometry reconstruction to produce high-fidelity, structurally consistent immersive 3D scenes from text.
- InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
  InstantMesh produces diverse, high-quality 3D meshes from single images in seconds by combining a multi-view diffusion model with a sparse-view large reconstruction model and optimizing directly on meshes.
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
  Stable Video Diffusion scales latent video diffusion models via text-to-image pretraining, video pretraining on curated data, and high-quality finetuning to produce competitive text-to-video and image-to-video results...
- R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow
  R-DMesh uses a VAE with a learned rectification jump offset and Triflow Attention inside a rectified-flow diffusion transformer to produce video-aligned 4D meshes despite initial pose misalignment.
- Pose-Aware Diffusion for 3D Generation
  PAD synthesizes 3D geometry in observation space via depth unprojection as anchor to eliminate pose ambiguity in image-to-3D generation.
- Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation
  Asset Harvester converts sparse in-the-wild object observations from AV driving logs into complete simulation-ready 3D assets via data curation, geometry-aware preprocessing, and a SparseViewDiT model that couples spa...
- AnimateAnyMesh++: A Flexible 4D Foundation Model for High-Fidelity Text-Driven Mesh Animation
  AnimateAnyMesh++ animates arbitrary 3D meshes from text using an expanded 300K-identity DyMesh-XL dataset, a power-law topology-aware DyMeshVAE-Flex, and a variable-length rectified-flow generator to produce semantica...
- Cosmos World Foundation Model Platform for Physical AI
  The Cosmos platform supplies open-source pre-trained world models and supporting tools for building fine-tunable digital world simulations to train Physical AI.
Reference graph
Works this paper leans on
- [1] stable-diffusion-xl-base-1.0. https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0. Accessed 2023-08-29.
- [2] Sketchfab. https://sketchfab.com/3d-models/popular. Accessed 2023-08-30.
- [3] DeepFloyd. https://huggingface.co/DeepFloyd. Accessed 2023-08-25.
- [4] Luma.ai. https://lumalabs.ai/dashboard/imagine. Accessed 2023-08-25.
- [5] Stable diffusion image variation. https://huggingface.co/spaces/lambdalabs/stable-diffusion-image-variations.
- [6] Stable diffusion 2.1 base. https://huggingface.co/stabilityai/stable-diffusion-2-1-base. Accessed 2023-07-14.
- [7] Threestudio project. https://github.com/threestudio-project/threestudio. Accessed 2023-08-25.
- [9] Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Mip-NeRF 360: Unbounded anti-aliased neural radiance fields. In CVPR, 2022.
- [10] Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, and Karsten Kreis. Align your latents: High-resolution video synthesis with latent diffusion models. In CVPR, 2023.
- [11] Eric R. Chan, Connor Z. Lin, Matthew A. Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas J. Guibas, Jonathan Tremblay, Sameh Khamis, et al. Efficient geometry-aware 3D generative adversarial networks. In CVPR, 2022.
- [12] Eric R. Chan, Koki Nagano, Matthew A. Chan, Alexander W. Bergman, Jeong Joon Park, Axel Levy, Miika Aittala, Shalini De Mello, Tero Karras, and Gordon Wetzstein. GeNVS: Generative novel view synthesis with 3D-aware diffusion models. arXiv, 2023.
- [14] Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Objaverse: A universe of annotated 3D objects. In CVPR, pp. 13142-13153, 2023.
- [15] Yu Deng, Jiaolong Yang, Jianfeng Xiang, and Xin Tong. GRAM: Generative radiance manifolds for 3D-aware image generation. In CVPR, pp. 10673-10683, 2022.
- [16] Jun Gao, Tianchang Shen, Zian Wang, Wenzheng Chen, Kangxue Yin, Daiqing Li, Or Litany, Zan Gojcic, and Sanja Fidler. GET3D: A generative model of high quality 3D textured shapes learned from images. NeurIPS, 2022.
- [17] Paul Henderson and Vittorio Ferrari. Learning single-image 3D reconstruction by generative modelling of shape, pose and shading. International Journal of Computer Vision, 2020.
- [18] Paul Henderson, Vagia Tsiminaki, and Christoph H. Lampert. Leveraging 2D data to learn textured 3D mesh generation. In CVPR, 2020.
- [19] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. NeurIPS, 2017.
- [23] Animesh Karnewar, Andrea Vedaldi, David Novotny, and Niloy J. Mitra. HoloDiffusion: Training a 3D diffusion model using 2D images. In CVPR, 2023.
- [24] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2014.
- [25] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. In ICLR, 2014.
- [26] Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. Magic3D: High-resolution text-to-3D content creation. In CVPR, 2023.
- [29] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.
- [30] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph., 2022.
- [31] Thu Nguyen-Phuoc, Chuan Li, Lucas Theis, Christian Richardt, and Yong-Liang Yang. HoloGAN: Unsupervised learning of 3D representations from natural images. In ICCV, 2019.
- [32] Thu H. Nguyen-Phuoc, Christian Richardt, Long Mai, Yongliang Yang, and Niloy Mitra. BlockGAN: Learning 3D object-aware scene representations from unlabelled images. NeurIPS, 2020.
- [34] Michael Niemeyer and Andreas Geiger. GIRAFFE: Representing scenes as compositional generative neural feature fields. In CVPR, 2021.
- [36] Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. DreamFusion: Text-to-3D using 2D diffusion. In ICLR, 2023.
- [37] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In ICML, 2021.
- [39] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In CVPR, 2022.
- [40] Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation. In CVPR, 2023.
- [41] Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L. Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding. NeurIPS, 2022.
- [42] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. NeurIPS, 2016.
- [43] Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, et al. LAION-5B: An open large-scale dataset for training next generation image-text models. NeurIPS, 2022.
- [44] Tianchang Shen, Jun Gao, Kangxue Yin, Ming-Yu Liu, and Sanja Fidler. Deep marching tetrahedra: A hybrid representation for high-resolution 3D shape synthesis. In NeurIPS, 2021.
- [45] J. Ryan Shue, Eric Ryan Chan, Ryan Po, Zachary Ankner, Jiajun Wu, and Gordon Wetzstein. 3D neural field generation using triplane diffusion. In CVPR, 2023.
- [47] Vincent Sitzmann, Michael Zollhöfer, and Gordon Wetzstein. Scene representation networks: Continuous 3D-structure-aware neural scene representations. NeurIPS, 2019.
- [49] Stanislaw Szymanowicz, Christian Rupprecht, and Andrea Vedaldi. Viewset Diffusion: (0-)image-conditioned 3D generative models from 2D data. 2023.
- [53] Haochen Wang, Xiaodan Du, Jiahao Li, Raymond A. Yeh, and Greg Shakhnarovich. Score Jacobian chaining: Lifting pretrained 2D diffusion models for 3D generation. In CVPR, 2023.
- [54] Tengfei Wang, Bo Zhang, Ting Zhang, Shuyang Gu, Jianmin Bao, Tadas Baltrusaitis, Jingjing Shen, Dong Chen, Fang Wen, Qifeng Chen, et al. Rodin: A generative model for sculpting 3D digital avatars using diffusion. In CVPR, 2023.
- [56] Daniel Watson, William Chan, Ricardo Martin-Brualla, Jonathan Ho, Andrea Tagliasacchi, and Mohammad Norouzi. Novel view synthesis with diffusion models. In ICLR, 2023.
- [57] Chao-Yuan Wu, Justin Johnson, Jitendra Malik, Christoph Feichtenhofer, and Georgia Gkioxari. Multiview compressive coding for 3D reconstruction. In CVPR, 2023.
- [59] Zhizhuo Zhou and Shubham Tulsiani. SparseFusion: Distilling view-conditioned diffusion for 3D reconstruction. In CVPR, 2023.
- [60] MetaHuman.
- [61] A 3D face model for pose and illumination invariant face recognition. In Proceedings of the 6th IEEE International Conference on Advanced Video and Signal based Surveillance (AVSS), 2009.
- [62] Learning single-image 3D reconstruction by generative modelling of shape, pose and shading. International Journal of Computer Vision, 2020.
- [63] Leveraging 2D data to learn textured 3D mesh generation. In CVPR, 2020.
- [64] Efficient geometry-aware 3D generative adversarial networks. In CVPR, 2022.
- [65] GET3D: A generative model of high quality 3D textured shapes learned from images. NeurIPS, 2022.
- [66] Rodin: A generative model for sculpting 3D digital avatars using diffusion. In CVPR, 2023.
- [67] HoloDiffusion: Training a 3D diffusion model using 2D images. In CVPR, 2023.
- [68]
- [69] Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. DreamFusion: Text-to-3D using 2D diffusion. In ICLR, 2023.
- [70]
- [71] NeRF: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.
- [72]
- [73] HoloGAN: Unsupervised learning of 3D representations from natural images. In ICCV, 2019.
- [74] BlockGAN: Learning 3D object-aware scene representations from unlabelled images. NeurIPS, 2020.
- [75] GIRAFFE: Representing scenes as compositional generative neural feature fields. In CVPR, 2021.
- [76]
- [77] GRAM: Generative radiance manifolds for 3D-aware image generation. In CVPR, 2022.
- [78]
- [79] Point-E: A system for generating 3D point clouds from complex prompts. arXiv:2212.08751, 2022.
- [80] Shap-E: Generating conditional 3D implicit functions. arXiv:2305.02463, 2023.
- [81] Score Jacobian chaining: Lifting pretrained 2D diffusion models for 3D generation. In CVPR, 2023.
- [82] Stable Diffusion 2.1 base. https://huggingface.co/stabilityai/stable-diffusion-2-1-base.
- [83] DreamTime: An improved optimization strategy for text-to-3D content creation. arXiv:2306.12422, 2023.
- [84] ProlificDreamer: High-fidelity and diverse text-to-3D generation with variational score distillation. arXiv:2305.16213, 2023.
- [85] Fantasia3D: Disentangling geometry and appearance for high-quality text-to-3D content creation. arXiv:2303.13873, 2023.
- [86] TextMesh: Generation of realistic 3D meshes from text prompts. arXiv:2304.12439, 2023.
- [87] DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation. In CVPR, 2023.
- [88] DreamBooth3D: Subject-driven text-to-3D generation. arXiv:2303.13508, 2023.
- [89] Zero-1-to-3: Zero-shot one image to 3D object. arXiv:2303.11328, 2023.
- [90] Mehdi S. M. Sajjadi, Henning Meyer, Etienne Pot, Urs Bergmann, Klaus Greff, Noha Radwan, Suhani Vora, Mario Lucic, Daniel Duckworth, Alexey Dosovitskiy, Jakob Uszkoreit, Thomas A. Funkhouser, and Andrea Tagliasacchi. Scene representation transformer: Geometry-free novel view synthesis through set-latent scene representations. In CVPR, 2022.
- [91] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. NeurIPS, 2020.
- [92] Jascha Sohl-Dickstein et al. Deep unsupervised learning using nonequilibrium thermodynamics. In ICML, 2015.
- [93] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. NeurIPS, 2019.
- [94] Prafulla Dhariwal and Alexander Quinn Nichol. Diffusion models beat GANs on image synthesis. NeurIPS, 2021.
- [95] Jonathan Ho, Chitwan Saharia, William Chan, David J. Fleet, Mohammad Norouzi, and Tim Salimans. Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res., 2022.
- [96] Andreas Lugmayr, Martin Danelljan, Andrés Romero, et al. RePaint: Inpainting using denoising diffusion probabilistic models. In CVPR, 2022.
- [97] Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J. Fleet, and Mohammad Norouzi. Image super-resolution via iterative refinement. IEEE TPAMI, 2022.