Shap-E: Generating Conditional 3D Implicit Functions
Pith reviewed 2026-05-16 15:27 UTC · model grok-4.3
The pith
Shap-E generates parameters of implicit functions for 3D assets directly from text prompts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Shap-E first trains an encoder that deterministically maps 3D assets into the parameters of an implicit function, then trains a conditional diffusion model on those encoded outputs, so that text conditioning yields new implicit-function parameters renderable as both textured meshes and neural radiance fields.
What carries the argument
A two-stage pipeline consisting of a deterministic encoder that converts 3D assets into implicit-function parameters, followed by a conditional diffusion model that generates new parameters in that space.
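The two-stage structure can be sketched in a few lines. This is a deliberately toy illustration, not Shap-E's architecture: the real encoder is a learned transformer and the diffusion model is far larger; the random linear map and the single noising step below are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: a deterministic "encoder" mapping a 3D asset (here a point cloud)
# to a fixed-length latent standing in for implicit-function parameters.
# A random linear projection is a placeholder for the learned network.
LATENT_DIM = 16
W_enc = rng.normal(size=(3, LATENT_DIM))

def encode(points: np.ndarray) -> np.ndarray:
    """Deterministically map an (N, 3) point cloud to latent parameters."""
    return points.mean(axis=0) @ W_enc  # pooled features -> latent

# Stage 2: diffusion in the latent space. The forward process adds Gaussian
# noise; a conditional denoiser (not shown) would learn to invert it.
def q_sample(z0: np.ndarray, t: float) -> np.ndarray:
    """Noise the latent at diffusion time t in [0, 1]."""
    alpha = 1.0 - t
    return np.sqrt(alpha) * z0 + np.sqrt(1 - alpha) * rng.normal(size=z0.shape)

asset = rng.normal(size=(1024, 3))   # stand-in 3D asset
z0 = encode(asset)                   # deterministic latent
zt = q_sample(z0, t=0.5)             # noised latent the diffusion model sees
assert np.allclose(encode(asset), z0)  # the encoder is deterministic
```

The key property the pipeline relies on is the last assertion: because the encoder is deterministic, the diffusion model is trained on a fixed, well-defined latent per asset rather than a stochastic posterior.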
If this is right
- A single trained model produces both textured meshes and neural radiance fields from the same implicit-function parameters.
- Generation completes in seconds once the model is trained on paired 3D and text data.
- Sample quality reaches or exceeds that of explicit point-cloud generators while operating in a higher-dimensional multi-representation space.
- The diffusion process runs entirely in the compressed parameter space produced by the encoder.
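The dual-rendering claim above hinges on one parameter set feeding two decoders. A minimal sketch, using a hand-written sphere SDF in place of Shap-E's learned implicit function: a mesh extractor thresholds the signed distance on a grid, while a radiance-field-style renderer accumulates color along a ray.

```python
import numpy as np

def implicit_fn(x: np.ndarray, params: dict) -> tuple[np.ndarray, np.ndarray]:
    """Toy implicit function: a sphere of radius params['r'] with a flat color.
    Illustrative only; Shap-E's decoder is a learned MLP."""
    sdf = np.linalg.norm(x, axis=-1) - params["r"]
    rgb = np.broadcast_to(params["color"], x.shape)
    return sdf, rgb

params = {"r": 0.5, "color": np.array([0.8, 0.2, 0.2])}

# "Mesh" path: sample a grid and keep cells near the zero level set
# (marching cubes would triangulate these; here we just locate the shell).
grid = np.stack(np.meshgrid(*[np.linspace(-1, 1, 32)] * 3, indexing="ij"), -1)
sdf, _ = implicit_fn(grid.reshape(-1, 3), params)
shell = np.abs(sdf) < (2 / 32)  # cells straddling the surface

# "NeRF" path: march one ray through the volume and average color where the
# SDF is negative, a crude stand-in for volume rendering.
ts = np.linspace(-1, 1, 64)
ray_pts = np.stack([ts, np.zeros_like(ts), np.zeros_like(ts)], -1)
sdf_ray, rgb_ray = implicit_fn(ray_pts, params)
inside = sdf_ray < 0
pixel = rgb_ray[inside].mean(axis=0) if inside.any() else np.zeros(3)
```

Both paths consume the same `params`, which is the point: nothing about the representation commits the consumer to one renderer.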
Where Pith is reading between the lines
- The same parameter space might support conditioning on images or sketches in addition to text without retraining the encoder.
- Because the output is an implicit function, downstream tasks such as collision detection or ray-tracing can operate directly on the generated representation.
- Scaling the encoder to larger or more diverse 3D datasets could reduce the information loss that currently limits the diffusion stage.
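The first speculation above rests on an architectural observation: the denoiser consumes a generic conditioning vector, so in principle any embedder producing a compatible vector (text, image, sketch) could drive it without retraining the 3D encoder. A toy sketch, with a hypothetical one-layer denoiser standing in for the real network:

```python
import numpy as np

rng = np.random.default_rng(3)
COND_DIM, LATENT_DIM = 64, 16
W = rng.normal(size=(COND_DIM + LATENT_DIM, LATENT_DIM)) * 0.1

def denoise_step(z_t: np.ndarray, cond: np.ndarray) -> np.ndarray:
    """One toy denoising step conditioned on an arbitrary embedding.
    The real model is a transformer; this single layer only illustrates
    that the conditioning pathway is modality-agnostic."""
    return np.tanh(np.concatenate([cond, z_t]) @ W)

z_t = rng.normal(size=LATENT_DIM)
text_cond = rng.normal(size=COND_DIM)    # stand-in text embedding
image_cond = rng.normal(size=COND_DIM)   # stand-in image embedding

out_text = denoise_step(z_t, text_cond)
out_image = denoise_step(z_t, image_cond)  # same network, different modality
```

Whether a pretrained denoiser actually generalizes to a new conditioning modality without fine-tuning is an open question; the sketch only shows the interface permits it.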
Load-bearing premise
The encoder can map arbitrary 3D assets into implicit-function parameters with negligible information loss so the diffusion model sees a faithful latent space.
What would settle it
Observe whether text-conditioned outputs consistently produce 3D assets whose rendered views match the prompt description and exhibit visual diversity across different samplings.
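That test could be operationalized with a CLIP-style embedding space: alignment as the mean cosine similarity between rendered-view embeddings and the prompt embedding, diversity as mean pairwise distance among samples. The embeddings below are random stand-ins, not a real CLIP model.

```python
import numpy as np

rng = np.random.default_rng(1)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

text_emb = rng.normal(size=128)
# Pretend each sample's rendered view embeds near the prompt, plus noise.
sample_embs = text_emb + 0.3 * rng.normal(size=(8, 128))

# Alignment: do rendered views match the prompt description?
alignment = np.mean([cosine(s, text_emb) for s in sample_embs])

# Diversity: do different samplings actually differ from each other?
dists = [np.linalg.norm(a - b) for i, a in enumerate(sample_embs)
         for b in sample_embs[i + 1:]]
diversity = float(np.mean(dists))
```

High alignment with near-zero diversity would indicate mode collapse; both metrics need to be read together.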
Original abstract
We present Shap-E, a conditional generative model for 3D assets. Unlike recent work on 3D generative models which produce a single output representation, Shap-E directly generates the parameters of implicit functions that can be rendered as both textured meshes and neural radiance fields. We train Shap-E in two stages: first, we train an encoder that deterministically maps 3D assets into the parameters of an implicit function; second, we train a conditional diffusion model on outputs of the encoder. When trained on a large dataset of paired 3D and text data, our resulting models are capable of generating complex and diverse 3D assets in a matter of seconds. When compared to Point-E, an explicit generative model over point clouds, Shap-E converges faster and reaches comparable or better sample quality despite modeling a higher-dimensional, multi-representation output space. We release model weights, inference code, and samples at https://github.com/openai/shap-e.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents Shap-E, a conditional generative model for 3D assets that directly outputs parameters of implicit functions renderable as both textured meshes and neural radiance fields. Training proceeds in two stages: an encoder deterministically maps 3D assets to implicit-function parameters, after which a conditional diffusion model is trained on the encoder outputs using paired 3D-text data. The resulting models generate complex, diverse 3D assets in seconds and are reported to converge faster while achieving comparable or better sample quality than the Point-E baseline despite operating in a higher-dimensional, multi-representation output space. Model weights, inference code, and samples are released.
Significance. If the empirical claims hold, the work provides a practical advance in 3D generative modeling by producing implicit representations that support multiple downstream renderings. The two-stage architecture with publicly released implementation enables direct empirical verification of reconstruction fidelity and generation quality, strengthening reproducibility.
Minor comments (3)
- [Abstract] The phrase 'a large dataset of paired 3D and text data' would be strengthened by reporting the approximate number of assets or total training tokens so readers can gauge scale.
- [Results] The comparison tables would benefit from an additional column or row reporting reconstruction PSNR or IoU for the encoder stage alone, to quantify information loss before the diffusion stage.
- [Method] The implicit-function parameterization (Eq. 1 or equivalent) could include an explicit statement of the number of output channels or basis functions used for texture, to clarify the dimensionality increase relative to Point-E.
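The reviewer's suggested encoder-only check can be sketched directly: render the ground-truth asset and its encode-then-decode reconstruction, and report PSNR between the views. The "renders" here are synthetic stand-in images; a real evaluation would render both assets from matched cameras.

```python
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images in [0, peak]."""
    mse = np.mean((ref - test) ** 2)
    return float("inf") if mse == 0 else float(10 * np.log10(peak**2 / mse))

rng = np.random.default_rng(2)
gt_view = rng.uniform(size=(64, 64, 3))  # stand-in ground-truth render
# Stand-in reconstruction: the ground truth perturbed by small noise.
recon_view = np.clip(gt_view + 0.01 * rng.normal(size=gt_view.shape), 0, 1)

encoder_psnr = psnr(gt_view, recon_view)
```

Reporting this number separately from generation metrics isolates how much fidelity is lost before the diffusion stage ever runs, which bears directly on the ledger's domain assumption below.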
Simulated Author's Rebuttal
We thank the referee for their accurate summary of Shap-E, for highlighting the significance of the two-stage training procedure and multi-representation output, and for recommending acceptance. We are pleased that the practical advantages and reproducibility aspects were noted.
Circularity Check
No significant circularity
Full rationale
The paper presents an empirical two-stage pipeline: an encoder that maps 3D assets to implicit-function parameters, followed by a conditional diffusion model trained on those encoder outputs using external paired 3D-text data. Performance claims rest on direct training and comparison against the independently published Point-E baseline rather than any internal equation that reduces a reported metric to a quantity defined by the authors' own fitted constants or self-citation chain. No load-bearing step matches the enumerated circularity patterns; the derivation is self-contained against external benchmarks and released code.
Axiom & Free-Parameter Ledger
Free parameters (1)
- diffusion model training hyperparameters
Axioms (1)
- Domain assumption: an encoder can map 3D assets into implicit-function parameters with negligible loss of geometric and textural information.
Forward citations
Cited by 20 Pith papers
- ReConText3D: Replay-based Continual Text-to-3D Generation. ReConText3D is the first replay-memory framework for continual text-to-3D generation that prevents catastrophic forgetting on new textual categories while preserving quality on previously seen classes.
- Co-generation of Layout and Shape from Text via Autoregressive 3D Diffusion. 3D-ARD+ unifies autoregressive token prediction with diffusion-based 3D latent generation to co-produce indoor scene layouts and object geometries that follow complex text-specified spatial and semantic constraints.
- VecSet-Edit: Unleashing Pre-trained LRM for Mesh Editing from Single Image. VecSet-Edit is the first method to perform high-fidelity mesh editing from a single image by analyzing and manipulating spatial token subsets in a pre-trained VecSet LRM.
- Affostruction: 3D Affordance Grounding with Generative Reconstruction. Affostruction reconstructs full 3D object geometry from partial RGBD views and grounds text-based affordances on both visible and unobserved surfaces, reporting large gains over prior methods.
- Structured 3D Latents for Scalable and Versatile 3D Generation. SLAT provides a unified 3D latent representation enabling versatile high-quality generation across multiple output formats from text or image inputs.
- LRM: Large Reconstruction Model for Single Image to 3D. LRM is a large transformer that predicts a NeRF directly from a single image after training on a million-object multi-view dataset.
- DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation. DreamGaussian creates high-quality textured 3D meshes from single-view images in 2 minutes via generative Gaussian Splatting with mesh extraction and UV refinement.
- REVIVE 3D: Refinement via Encoded Voluminous Inflated prior for Volume Enhancement. REVIVE 3D generates voluminous 3D assets from flat 2D images via an inflated prior construction followed by latent-space refinement, plus new metrics for volume and flatness validated by user study.
- Point-MF: One-step Point Cloud Generation from a Single Image via Mean Flows. Point-MF performs one-step point cloud reconstruction from single images by learning a mean velocity field in point space with a tailored Diffusion Transformer and a new auxiliary loss.
- Sculpt4D: Generating 4D Shapes via Sparse-Attention Diffusion Transformers. Sculpt4D generates temporally coherent 4D shapes by integrating a block sparse attention mechanism with time-decaying mask into a pretrained 3D diffusion transformer, achieving SOTA results with 56% less computation.
- SIC3D: Style Image Conditioned Text-to-3D Gaussian Splatting Generation. SIC3D generates text-to-3D objects with Gaussian splatting then stylizes them using Variational Stylized Score Distillation loss plus scaling regularization to improve style match and geometry fidelity.
- TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models. TripoSG generates high-fidelity 3D meshes from input images via a large-scale rectified flow transformer and hybrid-trained 3D VAE on a custom 2-million-sample dataset, claiming state-of-the-art fidelity and generalization.
- InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models. InstantMesh produces diverse, high-quality 3D meshes from single images in seconds by combining a multi-view diffusion model with a sparse-view large reconstruction model and optimizing directly on meshes.
- SyncDreamer: Generating Multiview-consistent Images from a Single-view Image. SyncDreamer produces multiview-consistent images from a single input image by jointly modeling their distribution and synchronizing intermediate diffusion states via 3D-aware attention.
- MVDream: Multi-view Diffusion for 3D Generation. MVDream is a multi-view diffusion model that functions as a generalizable 3D prior, enabling more consistent text-to-3D generation and few-shot 3D concept learning from 2D examples.
- SpatialPrompt: XR-Based Spatial Intent Expression as Executable Constraints for AI Generative 3D Design. SpatialPrompt turns spatial sketches and voice prompts into executable constraints for controllable AI 3D generation in XR, enabling iterative collaborative creation with color-coded contributions.
- From Visual Synthesis to Interactive Worlds: Toward Production-Ready 3D Asset Generation. The paper surveys 3D asset generation methods and organizes them around the full production pipeline to assess which outputs meet engine-level requirements for interactive applications.
- Unposed-to-3D: Learning Simulation-Ready Vehicles from Real-World Images. Unposed-to-3D learns simulation-ready 3D vehicle models from unposed real images by predicting camera parameters for photometric self-supervision, then adding scale prediction and harmonization.
- MOC-3D: Manifold-Order Consistency for Text-to-3D Generation. MOC-3D adds a semantic view-order constraint using CLIP monotonicity and a manifold-based feature continuity module on SPD Riemannian space to reduce macro-topological and micro-geometric inconsistencies in SDS-based ...
- From Visual Synthesis to Interactive Worlds: Toward Production-Ready 3D Asset Generation. The paper surveys 3D content generation literature using a taxonomy of asset types and production stages to evaluate progress toward engine-ready assets.