pith. machine review for the scientific record.

arxiv: 2305.02463 · v1 · submitted 2023-05-03 · 💻 cs.CV · cs.LG

Recognition: 2 theorem links

Shap-E: Generating Conditional 3D Implicit Functions

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 15:27 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords text-to-3D · implicit functions · diffusion models · neural radiance fields · 3D generation · conditional generative models · meshes

The pith

Shap-E generates parameters of implicit functions for 3D assets directly from text prompts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Shap-E as a conditional generative model that outputs the parameters of implicit functions rather than a single fixed representation. These parameters support rendering as both textured meshes and neural radiance fields. Training proceeds in two stages: an encoder first maps existing 3D assets deterministically into the implicit-function parameter space, after which a conditional diffusion model learns to generate new parameter sets from text or other conditions. The resulting system produces complex and diverse 3D assets in seconds when trained on large paired text-3D datasets and matches or exceeds the quality of prior explicit point-cloud generators while handling a higher-dimensional output space.
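To make the two-stage recipe concrete, here is a minimal sketch in PyTorch under simplifying assumptions: the asset is reduced to a colored point cloud, the "implicit-function parameters" to a single latent vector, and the denoiser to an MLP. The names `AssetEncoder`, `LatentDenoiser`, and `diffusion_loss` are ours for illustration and do not mirror the released shap-e codebase.

```python
# Hypothetical sketch of the two-stage pipeline, not the released shap-e API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AssetEncoder(nn.Module):
    # Stage 1: deterministic map from a 3D asset (simplified here to a
    # colored point cloud, xyz + rgb) to a latent standing in for the
    # implicit-function parameters.
    def __init__(self, latent_dim: int = 1024):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(6, 256), nn.GELU(),
                                 nn.Linear(256, latent_dim))

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 6) -> pooled latent: (B, latent_dim)
        return self.mlp(points).mean(dim=1)

class LatentDenoiser(nn.Module):
    # Stage 2: epsilon-predicting denoiser over encoder latents,
    # conditioned on a text embedding (e.g. from CLIP).
    def __init__(self, latent_dim: int = 1024, cond_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim + cond_dim + 1, 2048),
                                 nn.GELU(), nn.Linear(2048, latent_dim))

    def forward(self, z_t, t, cond):
        # z_t: (B, D); t: (B, 1) normalized timestep; cond: (B, cond_dim)
        return self.net(torch.cat([z_t, t, cond], dim=-1))

def diffusion_loss(denoiser, z0, cond, alpha_bar):
    # Standard DDPM objective run entirely in the latent space: noise a
    # clean latent z0 to z_t, then regress the injected noise.
    B, T = z0.shape[0], alpha_bar.shape[0]
    t = torch.randint(0, T, (B,))
    a = alpha_bar[t].unsqueeze(-1)                      # (B, 1)
    eps = torch.randn_like(z0)
    z_t = a.sqrt() * z0 + (1 - a).sqrt() * eps
    return F.mse_loss(denoiser(z_t, t.float().unsqueeze(-1) / T, cond), eps)

# Example wiring (untrained, random data; placeholder schedule):
enc, den = AssetEncoder(), LatentDenoiser()
alpha_bar = torch.linspace(0.9999, 0.02, 1000)
loss = diffusion_loss(den, enc(torch.rand(2, 512, 6)),
                      torch.rand(2, 512), alpha_bar)
```

The point the sketch makes is structural: stage 2 never touches geometry directly; it learns a distribution over the outputs of stage 1.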

Core claim

Shap-E trains an encoder that deterministically maps 3D assets into the parameters of an implicit function, then trains a conditional diffusion model on those encoded outputs so that text conditions can produce new implicit-function parameters that render as both textured meshes and neural radiance fields.

What carries the argument

A two-stage pipeline consisting of a deterministic encoder that converts 3D assets into implicit-function parameters, followed by a conditional diffusion model that generates new parameters in that space.

If this is right

  • A single trained model produces both textured meshes and neural radiance fields from the same implicit-function parameters (sketched after this list).
  • Generation completes in seconds once the model is trained on paired 3D and text data.
  • Sample quality reaches or exceeds that of explicit point-cloud generators while operating in a higher-dimensional multi-representation space.
  • The diffusion process runs entirely in the compressed parameter space produced by the encoder.
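A sketch of why the first and fourth bullets can hold at once: the generated parameters define one coordinate MLP whose heads serve both render paths, with meshes extracted from the signed-distance head by marching cubes. The architecture below is illustrative, not the paper's, and the NeRF volume-rendering integral is omitted.

```python
# Illustrative dual-headed field; assumes scikit-image for isosurfacing.
import torch
import torch.nn as nn
from skimage.measure import marching_cubes

class ImplicitField(nn.Module):
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(3, hidden), nn.GELU(),
                                   nn.Linear(hidden, hidden), nn.GELU())
        self.nerf_head = nn.Linear(hidden, 4)  # density + RGB for ray marching
        self.stf_head = nn.Linear(hidden, 4)   # signed distance + texture RGB

    def forward(self, xyz: torch.Tensor):
        h = self.trunk(xyz)
        return self.nerf_head(h), self.stf_head(h)

field = ImplicitField()

# NeRF path: query density/colour at points along camera rays
# (the volume-rendering integral itself is omitted here).
nerf_out, _ = field(torch.rand(4096, 3) * 2 - 1)

# Mesh path: evaluate the signed distance on a grid, then marching cubes.
n = 32
axes = [torch.linspace(-1, 1, n)] * 3
grid = torch.stack(torch.meshgrid(*axes, indexing="ij"), dim=-1).reshape(-1, 3)
with torch.no_grad():
    _, stf_out = field(grid)
sdf = stf_out[:, 0].reshape(n, n, n).numpy()
# A trained model would use level=0.0; the mean keeps this untrained
# sketch from requesting an iso-level outside the field's range.
verts, faces, _, _ = marching_cubes(sdf, level=float(sdf.mean()))
```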

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same parameter space might support conditioning on images or sketches in addition to text without retraining the encoder.
  • Because the output is an implicit function, downstream tasks such as collision detection or ray-tracing could operate directly on the generated representation.
  • Scaling the encoder to larger or more diverse 3D datasets could reduce the information loss that currently limits the diffusion stage.

Load-bearing premise

The encoder can map arbitrary 3D assets into implicit-function parameters with negligible information loss so the diffusion model sees a faithful latent space.

What would settle it

Observe whether text-conditioned outputs consistently produce 3D assets whose rendered views match the prompt description and exhibit visual diversity across repeated samples.
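One way to operationalize that check, under assumptions the paper does not pin down: render several views per generated asset (the rendering step is assumed, not shown), score prompt match with CLIP [48], and measure diversity as mean pairwise embedding distance across repeated samples from the same prompt.

```python
# Hypothetical evaluation harness; the render step is assumed elsewhere.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def prompt_match(views: list[Image.Image], prompt: str) -> float:
    # Mean CLIP image-text similarity over rendered views of one sample.
    inputs = processor(text=[prompt], images=views,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    return out.logits_per_image.mean().item()

def sample_diversity(view_sets: list[list[Image.Image]]) -> float:
    # Mean pairwise cosine distance between per-sample CLIP embeddings;
    # higher means more visual diversity across samplings (needs >= 2 sets).
    embs = []
    for views in view_sets:
        inputs = processor(images=views, return_tensors="pt")
        with torch.no_grad():
            e = model.get_image_features(**inputs).mean(dim=0)
        embs.append(e / e.norm())
    E = torch.stack(embs)                 # (S, D)
    cos = E @ E.T
    S = cos.shape[0]
    return (1 - cos[~torch.eye(S, dtype=torch.bool)]).mean().item()
```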

read the original abstract

We present Shap-E, a conditional generative model for 3D assets. Unlike recent work on 3D generative models which produce a single output representation, Shap-E directly generates the parameters of implicit functions that can be rendered as both textured meshes and neural radiance fields. We train Shap-E in two stages: first, we train an encoder that deterministically maps 3D assets into the parameters of an implicit function; second, we train a conditional diffusion model on outputs of the encoder. When trained on a large dataset of paired 3D and text data, our resulting models are capable of generating complex and diverse 3D assets in a matter of seconds. When compared to Point-E, an explicit generative model over point clouds, Shap-E converges faster and reaches comparable or better sample quality despite modeling a higher-dimensional, multi-representation output space. We release model weights, inference code, and samples at https://github.com/openai/shap-e.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper presents Shap-E, a conditional generative model for 3D assets that directly outputs parameters of implicit functions renderable as both textured meshes and neural radiance fields. Training proceeds in two stages: an encoder deterministically maps 3D assets to implicit-function parameters, after which a conditional diffusion model is trained on the encoder outputs using paired 3D-text data. The resulting models generate complex, diverse 3D assets in seconds and are reported to converge faster while achieving comparable or better sample quality than the Point-E baseline despite operating in a higher-dimensional, multi-representation output space. Model weights, inference code, and samples are released.

Significance. If the empirical claims hold, the work provides a practical advance in 3D generative modeling by producing implicit representations that support multiple downstream renderings. The two-stage architecture with publicly released implementation enables direct empirical verification of reconstruction fidelity and generation quality, strengthening reproducibility.

minor comments (3)
  1. [Abstract] The phrase 'a large dataset of paired 3D and text data' would be strengthened by reporting the approximate number of assets or total training tokens, to allow readers to gauge scale.
  2. [Results] The comparison tables would benefit from an additional column or row reporting reconstruction PSNR or IoU for the encoder stage alone, to quantify information loss before the diffusion stage; a minimal version of this check is sketched after this list.
  3. [Method] The implicit-function parameterization (Eq. 1 or equivalent) could include an explicit statement of the number of output channels or basis functions used for texture, to clarify the dimensionality increase relative to Point-E.
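A minimal version of the check requested in minor comment 2, assuming the ground-truth asset and its encoder-decoder reconstruction can be rendered to aligned views or voxelized into occupancy grids (those helpers are not shown):

```python
# Encoder-only fidelity metrics; rendering/voxelization assumed elsewhere.
import torch

def psnr(img_a: torch.Tensor, img_b: torch.Tensor, max_val: float = 1.0) -> float:
    # Peak signal-to-noise ratio between two aligned renders in [0, max_val].
    mse = torch.mean((img_a - img_b) ** 2)
    return float(10 * torch.log10(max_val ** 2 / mse))

def voxel_iou(occ_a: torch.Tensor, occ_b: torch.Tensor) -> float:
    # Intersection-over-union of boolean occupancy grids.
    inter = (occ_a & occ_b).sum()
    union = (occ_a | occ_b).sum()
    return float(inter) / float(union)
```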

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for their accurate summary of Shap-E, for highlighting the significance of the two-stage training procedure and multi-representation output, and for the constructive minor comments. We are pleased that the practical advantages and reproducibility aspects were noted.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper presents an empirical two-stage pipeline: an encoder that maps 3D assets to implicit-function parameters, followed by a conditional diffusion model trained on those encoder outputs using external paired 3D-text data. Performance claims rest on direct training and comparison against the independently published Point-E baseline rather than any internal equation that reduces a reported metric to a quantity defined by the authors' own fitted constants or self-citation chain. No load-bearing step matches the enumerated circularity patterns; the derivation is self-contained against external benchmarks and released code.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the standard assumption that implicit functions can faithfully represent the training assets and that diffusion models can learn the distribution of their parameters; no new physical entities or ad-hoc constants are introduced beyond ordinary neural-network hyperparameters.

free parameters (1)
  • diffusion model training hyperparameters
    Standard learning-rate, noise schedule, and architecture choices fitted during optimization on the encoder outputs.
axioms (1)
  • domain assumption: An encoder can map 3D assets into implicit-function parameters with negligible loss of geometric and textural information
    Invoked in the first training stage described in the abstract.
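For concreteness, one standard instance of the noise-schedule free parameter is the cosine ᾱ schedule of Nichol and Dhariwal [41]; whether Shap-E uses this exact schedule is not stated in the material above.

```python
import math

def cosine_alpha_bar(T: int = 1000, s: float = 0.008) -> list[float]:
    # Cosine schedule from Nichol & Dhariwal [41]: alpha_bar(t) is the
    # fraction of signal variance surviving at diffusion step t.
    f = lambda t: math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2
    return [f(t) / f(0) for t in range(T + 1)]
```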

pith-pipeline@v0.9.0 · 5457 in / 1233 out tokens · 28207 ms · 2026-05-16T15:27:46.007085+00:00 · methodology

discussion (0)


Forward citations

Cited by 20 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ReConText3D: Replay-based Continual Text-to-3D Generation

    cs.CV 2026-04 conditional novelty 8.0

    ReConText3D is the first replay-memory framework for continual text-to-3D generation that prevents catastrophic forgetting on new textual categories while preserving quality on previously seen classes.

  2. Co-generation of Layout and Shape from Text via Autoregressive 3D Diffusion

    cs.CV 2026-04 unverdicted novelty 7.0

    3D-ARD+ unifies autoregressive token prediction with diffusion-based 3D latent generation to co-produce indoor scene layouts and object geometries that follow complex text-specified spatial and semantic constraints.

  3. VecSet-Edit: Unleashing Pre-trained LRM for Mesh Editing from Single Image

    cs.CV 2026-02 unverdicted novelty 7.0

    VecSet-Edit is the first method to perform high-fidelity mesh editing from a single image by analyzing and manipulating spatial token subsets in a pre-trained VecSet LRM.

  4. Affostruction: 3D Affordance Grounding with Generative Reconstruction

    cs.CV 2026-01 unverdicted novelty 7.0

    Affostruction reconstructs full 3D object geometry from partial RGBD views and grounds text-based affordances on both visible and unobserved surfaces, reporting large gains over prior methods.

  5. Structured 3D Latents for Scalable and Versatile 3D Generation

    cs.CV 2024-12 unverdicted novelty 7.0

    SLAT provides a unified 3D latent representation enabling versatile high-quality generation across multiple output formats from text or image inputs.

  6. LRM: Large Reconstruction Model for Single Image to 3D

    cs.CV 2023-11 conditional novelty 7.0

    LRM is a large transformer that predicts a NeRF directly from a single image after training on a million-object multi-view dataset.

  7. DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

    cs.CV 2023-09 unverdicted novelty 7.0

    DreamGaussian creates high-quality textured 3D meshes from single-view images in 2 minutes via generative Gaussian Splatting with mesh extraction and UV refinement.

  8. REVIVE 3D: Refinement via Encoded Voluminous Inflated prior for Volume Enhancement

    cs.CV 2026-04 unverdicted novelty 6.0

    REVIVE 3D generates voluminous 3D assets from flat 2D images via an inflated prior construction followed by latent-space refinement, plus new metrics for volume and flatness validated by user study.

  9. Point-MF: One-step Point Cloud Generation from a Single Image via Mean Flows

    cs.CV 2026-04 unverdicted novelty 6.0

    Point-MF performs one-step point cloud reconstruction from single images by learning a mean velocity field in point space with a tailored Diffusion Transformer and a new auxiliary loss.

  10. Sculpt4D: Generating 4D Shapes via Sparse-Attention Diffusion Transformers

    cs.CV 2026-04 unverdicted novelty 6.0

    Sculpt4D generates temporally coherent 4D shapes by integrating a block sparse attention mechanism with time-decaying mask into a pretrained 3D diffusion transformer, achieving SOTA results with 56% less computation.

  11. SIC3D: Style Image Conditioned Text-to-3D Gaussian Splatting Generation

    cs.CV 2026-04 unverdicted novelty 6.0

    SIC3D generates text-to-3D objects with Gaussian splatting then stylizes them using Variational Stylized Score Distillation loss plus scaling regularization to improve style match and geometry fidelity.

  12. TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models

    cs.CV 2025-02 unverdicted novelty 6.0

    TripoSG generates high-fidelity 3D meshes from input images via a large-scale rectified flow transformer and hybrid-trained 3D VAE on a custom 2-million-sample dataset, claiming state-of-the-art fidelity and generalization.

  13. InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models

    cs.CV 2024-04 unverdicted novelty 6.0

    InstantMesh produces diverse, high-quality 3D meshes from single images in seconds by combining a multi-view diffusion model with a sparse-view large reconstruction model and optimizing directly on meshes.

  14. SyncDreamer: Generating Multiview-consistent Images from a Single-view Image

    cs.CV 2023-09 unverdicted novelty 6.0

    SyncDreamer produces multiview-consistent images from a single input image by jointly modeling their distribution and synchronizing intermediate diffusion states via 3D-aware attention.

  15. MVDream: Multi-view Diffusion for 3D Generation

    cs.CV 2023-08 conditional novelty 6.0

    MVDream is a multi-view diffusion model that functions as a generalizable 3D prior, enabling more consistent text-to-3D generation and few-shot 3D concept learning from 2D examples.

  16. SpatialPrompt: XR-Based Spatial Intent Expression as Executable Constraints for AI Generative 3D Design

    cs.HC 2026-05 unverdicted novelty 5.0

    SpatialPrompt turns spatial sketches and voice prompts into executable constraints for controllable AI 3D generation in XR, enabling iterative collaborative creation with color-coded contributions.

  17. From Visual Synthesis to Interactive Worlds: Toward Production-Ready 3D Asset Generation

    cs.GR 2026-04 unverdicted novelty 5.0

    The paper surveys 3D asset generation methods and organizes them around the full production pipeline to assess which outputs meet engine-level requirements for interactive applications.

  18. Unposed-to-3D: Learning Simulation-Ready Vehicles from Real-World Images

    cs.CV 2026-04 unverdicted novelty 5.0

    Unposed-to-3D learns simulation-ready 3D vehicle models from unposed real images by predicting camera parameters for photometric self-supervision, then adding scale prediction and harmonization.

  19. MOC-3D: Manifold-Order Consistency for Text-to-3D Generation

    cs.CV 2026-05 unverdicted novelty 4.0

    MOC-3D adds a semantic view-order constraint using CLIP monotonicity and a manifold-based feature continuity module on SPD Riemannian space to reduce macro-topological and micro-geometric inconsistencies in SDS-based ...

  20. From Visual Synthesis to Interactive Worlds: Toward Production-Ready 3D Asset Generation

    cs.GR 2026-04 unverdicted novelty 4.0

    The paper surveys 3D content generation literature using a taxonomy of asset types and production stages to evaluate progress toward engine-ready assets.

Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages · cited by 19 Pith papers · 32 internal anchors

  1. [1]

    Learning Representations and Generative Models for 3D Point Clouds

    Panos Achlioptas, Olga Diamanti, Ioannis Mitliagkas, and Leonidas Guibas. Learning representations and generative models for 3d point clouds. arXiv:1707.02392, 2017

  2. [2]

    MusicLM: Generating Music From Text

    Andrea Agostinelli, Timo I. Denk, Zalán Borsos, Jesse Engel, Mauro Verzetti, Antoine Caillon, Qingqing Huang, Aren Jansen, Adam Roberts, Marco Tagliasacchi, Matt Sharifi, Neil Zeghidour, and Christian Frank. Musiclm: Generating music from text, 2023. URL https://arxiv.org/abs/2301.11325

  3. [3]

    Sine: Semantic-driven image-based nerf editing with prior-guided editing field

    Chong Bao, Yinda Zhang, Bangbang Yang, Tianxing Fan, Zesong Yang, Hujun Bao, Guofeng Zhang, and Zhaopeng Cui. Sine: Semantic-driven image-based nerf editing with prior-guided editing field. arXiv:2303.13277, 2023

  4. [4]

    Gaudi: A neural architect for immersive 3d scene generation

    Miguel Angel Bautista, Pengsheng Guo, Samira Abnar, Walter Talbott, Alexander Toshev, Zhuoyuan Chen, Laurent Dinh, Shuangfei Zhai, Hanlin Goh, Daniel Ulbricht, Afshin Dehghan, and Josh Susskind. Gaudi: A neural architect for immersive 3d scene generation. arXiv:2207.13751, 2022

  5. [5]

    AudioLM: a language modeling approach to audio generation

    Zalán Borsos, Raphaël Marinier, Damien Vincent, Eugene Kharitonov, Olivier Pietquin, Matt Sharifi, Olivier Teboul, David Grangier, Marco Tagliasacchi, and Neil Zeghidour. AudioLM: a language modeling approach to audio generation, 2022. URL https://arxiv.org/abs/2209.03143

  6. [6]

    Transformers as meta-learners for implicit neural representations

    Yinbo Chen and Xiaolong Wang. Transformers as meta-learners for implicit neural representations, 2022. URL https://arxiv.org/abs/2208.02801

  7. [7]

    Perception prioritized training of diffusion models

    Jooyoung Choi, Jungbeom Lee, Chaehun Shin, Sungwon Kim, Hyunwoo Kim, and Sungroh Yoon. Perception prioritized training of diffusion models, 2022

  8. [8]

    Maximum likelihood from incomplete data via the EM algorithm

    A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the em algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1–38, 1977. ISSN 00359246. URL http://www.jstor.org/stable/2984875

  9. [9]

    Diffusion Models Beat GANs on Image Synthesis

    Prafulla Dhariwal and Alex Nichol. Diffusion models beat gans on image synthesis. arXiv:2105.05233, 2021

  10. [10]

    Jukebox: A Generative Model for Music

    Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, and Ilya Sutskever. Jukebox: A generative model for music. arXiv:2005.00341, 2020

  11. [11]

    An efficient method of triangulating equi-valued surfaces by using tetrahedral cells

    Akio Doi and Akio Koide. An efficient method of triangulating equi-valued surfaces by using tetrahedral cells. IEICE Transactions on Information and Systems, 74:214–224, 1991

  12. [12]

    From data to functa: Your data point is a function and you can treat it like one

    Emilien Dupont, Hyunjik Kim, S. M. Ali Eslami, Danilo Rezende, and Dan Rosenbaum. From data to functa: Your data point is a function and you can treat it like one. arXiv:2201.12204, 2022

  13. [13]

    Neural spline flows

    Conor Durkan, Artur Bekasov, Iain Murray, and George Papamakarios. Neural spline flows. arXiv:1906.04032, 2019

  14. [14]

    HyperDiffusion: Generating implicit neural fields with weight-space diffusion

    Ziya Erkoç, Fangchang Ma, Qi Shan, Matthias Nießner, and Angela Dai. Hyperdiffusion: Generating implicit neural fields with weight-space diffusion, 2023

  15. [15]

    Ernie-vilg 2.0: Improving text-to-image diffusion model with knowledge-enhanced mixture-of-denoising-experts

    Zhida Feng, Zhenyu Zhang, Xintong Yu, Yewei Fang, Lanxin Li, Xuyi Chen, Yuxiang Lu, Jiaxiang Liu, Weichong Yin, Shikun Feng, Yu Sun, Hao Tian, Hua Wu, and Haifeng Wang. Ernie-vilg 2.0: Improving text-to-image diffusion model with knowledge-enhanced mixture-of-denoising-experts. arXiv:2210.15257, 2022

  16. [16]

    Shapecrafter: A recursive text-conditioned 3d shape generation model

    Rao Fu, Xiao Zhan, Yiwen Chen, Daniel Ritchie, and Srinath Sridhar. Shapecrafter: A recursive text-conditioned 3d shape generation model. arXiv:2207.09446, 2022

  17. [17]

    Make-a-scene: Scene-based text-to-image generation with human priors

    Oran Gafni, Adam Polyak, Oron Ashual, Shelly Sheynin, Devi Parikh, and Yaniv Taigman. Make-a-scene: Scene-based text-to-image generation with human priors. arXiv:2203.13131, 2022

  18. [18]

    Get3d: A generative model of high quality 3d textured shapes learned from images

    Jun Gao, Tianchang Shen, Zian Wang, Wenzheng Chen, Kangxue Yin, Daiqing Li, Or Litany, Zan Gojcic, and Sanja Fidler. Get3d: A generative model of high quality 3d textured shapes learned from images. arXiv:2209.11163, 2022

  19. [19]

    Generative Adversarial Networks

    Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks. arXiv:1406.2661, 2014

  20. [20]

    Gaussian Error Linear Units (GELUs)

    Dan Hendrycks and Kevin Gimpel. Gaussian error linear units (gelus). arXiv:1606.08415, 2016

  21. [21]

    Classifier-free diffusion guidance

    Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. In NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications, 2021. URL https://openreview.net/forum?id=qw8AKxfYbI

  22. [22]

    Denoising Diffusion Probabilistic Models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. arXiv:2006.11239, 2020

  23. [23]

    Imagen Video: High Definition Video Generation with Diffusion Models

    Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P. Kingma, Ben Poole, Mohammad Norouzi, David J. Fleet, and Tim Salimans. Imagen video: High definition video generation with diffusion models. arXiv:2210.02303, 2022

  24. [24]

    Video Diffusion Models

    Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, and David J. Fleet. Video diffusion models. arXiv:2204.03458, 2022

  25. [25]

    Noise2Music: Text-conditioned music generation with diffusion models

    Qingqing Huang, Daniel S. Park, Tao Wang, Timo I. Denk, Andy Ly, Nanxin Chen, Zhengdong Zhang, Zhishuai Zhang, Jiahui Yu, Christian Frank, Jesse Engel, Quoc V. Le, William Chan, Zhifeng Chen, and Wei Han. Noise2music: Text-conditioned music generation with diffusion models, 2023. URL https://arxiv.org/abs/2302.03917

  26. [26]

    Zero-shot text-guided object generation with dream fields

    Ajay Jain, Ben Mildenhall, Jonathan T. Barron, Pieter Abbeel, and Ben Poole. Zero-shot text-guided object generation with dream fields. arXiv:2112.01455, 2021

  27. [27]

    Elucidating the Design Space of Diffusion-Based Generative Models

    Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. arXiv:2206.00364, 2022

  28. [28]

    Clip-mesh: Generating textured meshes from text using pretrained image-text models

    Nasir Mohammad Khalid, Tianhao Xie, Eugene Belilovsky, and Tiberiu Popa. Clip-mesh: Generating textured meshes from text using pretrained image-text models. arXiv:2203.13333, 2022

  29. [29]

    Adam: A Method for Stochastic Optimization

    Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014

  30. [30]

    NeRF-VAE: A geometry aware 3D scene generative model

    Adam R Kosiorek, Heiko Strathmann, Daniel Zoran, Pol Moreno, Rosalia Schneider, Soňa Mokrá, and Danilo J Rezende. NeRF-VAE: A geometry aware 3D scene generative model. arXiv:2104.00587, April 2021

  31. [31]

    AudioGen: Textually guided audio generation

    Felix Kreuk, Gabriel Synnaeve, Adam Polyak, Uriel Singer, Alexandre Défossez, Jade Copet, Devi Parikh, Yaniv Taigman, and Yossi Adi. AudioGen: Textually guided audio generation, 2022. URL https://arxiv.org/abs/2209.15352

  33. [33]

    Modular primitives for high-performance differentiable rendering

    Samuli Laine, Janne Hellsten, Tero Karras, Yeongho Seol, Jaakko Lehtinen, and Timo Aila. Modular primitives for high-performance differentiable rendering. arXiv:2011.03277, 2020

  34. [34]

    Magic3d: High-resolution text-to-3d content creation

    Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. Magic3d: High-resolution text-to-3d content creation. arXiv:2211.10440, 2022

  35. [35]

    Towards implicit text-guided 3d shape generation

    Zhengzhe Liu, Yi Wang, Xiaojuan Qi, and Chi-Wing Fu. Towards implicit text-guided 3d shape generation. arXiv:2203.14622, 2022

  36. [36]

    Marching cubes: A high resolution 3D surface construction algorithm

    William E. Lorensen and Harvey E. Cline. Marching cubes: A high resolution 3d surface construction algorithm. In Maureen C. Stone, editor, SIGGRAPH, pages 163–169. ACM, 1987. ISBN 0-89791-227-6. URL http://dblp.uni-trier.de/db/conf/siggraph/siggraph1987.html#LorensenC87

  38. [38]

    Diffusion probabilistic models for 3d point cloud generation

    Shitong Luo and Wei Hu. Diffusion probabilistic models for 3d point cloud generation. arXiv:2103.01458, 2021

  39. [39]

    Mixed Precision Training

    Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, and Hao Wu. Mixed precision training. arXiv:1710.03740, 2017

  40. [40]

    NeRF: Representing scenes as neural radiance fields for view synthesis

    Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. arXiv:2003.08934, 2020

  41. [41]

    Improved Denoising Diffusion Probabilistic Models

    Alex Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. arXiv:2102.09672, 2021

  42. [42]

    GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

    Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv:2112.10741, 2021

  43. [43]

    Point-E: A System for Generating 3D Point Clouds from Complex Prompts

    Alex Nichol, Heewoo Jun, Prafulla Dhariwal, Pamela Mishkin, and Mark Chen. Point-e: A system for generating 3d point clouds from complex prompts. arXiv:2212.08751, 2022

  44. [44]

    Benchmark for compositional text-to-image synthesis

    Dong Huk Park, Samaneh Azadi, Xihui Liu, Trevor Darrell, and Anna Rohrbach. Benchmark for compositional text-to-image synthesis. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1), 2021. URL https://openreview.net/forum?id=bKBhQhPeKaF

  45. [45]

    DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation

    Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. DeepSDF: Learning continuous signed distance functions for shape representation. arXiv:1901.05103, 2019

  46. [46]

    MuseNet

    Christine Payne. Musenet. OpenAI blog, 2019. URL https://openai.com/blog/musenet

  47. [47]

    DreamFusion: Text-to-3D using 2D Diffusion

    Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. arXiv:2209.14988, 2022

  48. [48]

    Learning Transferable Visual Models From Natural Language Supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. arXiv:2103.00020, 2021

  49. [49]

    Zero-Shot Text-to-Image Generation

    Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. Zero-shot text-to-image generation. arXiv:2102.12092, 2021

  50. [50]

    Hierarchical Text-Conditional Image Generation with CLIP Latents

    Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with clip latents. arXiv:2204.06125, 2022

  51. [51]

    Accelerating 3D Deep Learning with PyTorch3D

    Nikhila Ravi, Jeremy Reizenstein, David Novotny, Taylor Gordon, Wan-Yen Lo, Justin Johnson, and Georgia Gkioxari. Accelerating 3d deep learning with pytorch3d. arXiv:2007.08501, 2020

  52. [52]

    Generating Diverse High-Fidelity Images with VQ-VAE-2

    Ali Razavi, Aaron van den Oord, and Oriol Vinyals. Generating diverse high-fidelity images with VQ-VAE-2. arXiv:1906.00446, 2019

  53. [53]

    Variational Inference with Normalizing Flows

    Danilo Jimenez Rezende and Shakir Mohamed. Variational inference with normalizing flows. arXiv:1505.05770, 2015

  54. [54]

    High-Resolution Image Synthesis with Latent Diffusion Models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. arXiv:2112.10752, 2021

  55. [55]

    Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

    Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, and Mohammad Norouzi. Photorealistic text-to-image diffusion models with deep language understanding. arXiv:2205.11487, 2022

  56. [56]

    CLIP-Forge: Towards zero-shot text-to-shape generation

    Aditya Sanghi, Hang Chu, Joseph G. Lambourne, Ye Wang, Chin-Yi Cheng, Marco Fumero, and Kamal Rahimi Malekshan. Clip-forge: Towards zero-shot text-to-shape generation. arXiv:2110.02624, 2021

  57. [57]

    Textcraft: Zero-shot generation of high-fidelity and diverse shapes from text

    Aditya Sanghi, Rao Fu, Vivian Liu, Karl Willis, Hooman Shayani, Amir Hosein Khasahmadi, Srinath Sridhar, and Daniel Ritchie. Textcraft: Zero-shot generation of high-fidelity and diverse shapes from text. arXiv:2211.01427, 2022

  58. [58]

    Deep marching tetrahedra: a hybrid representation for high-resolution 3d shape synthesis

    Tianchang Shen, Jun Gao, Kangxue Yin, Ming-Yu Liu, and Sanja Fidler. Deep marching tetrahedra: a hybrid representation for high-resolution 3d shape synthesis. arXiv:2111.04276, 2021

  59. [59]

    Make-A-Video: Text-to-Video Generation without Text-Video Data

    Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, Devi Parikh, Sonal Gupta, and Yaniv Taigman. Make-a-video: Text-to-video generation without text-video data. arXiv:2209.14792, 2022

  60. [60]

    Deep Unsupervised Learning using Nonequilibrium Thermodynamics

    Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. arXiv:1503.03585, 2015

  61. [61]

    Denoising Diffusion Implicit Models

    Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv:2010.02502, 2020

  62. [62]

    Generative modeling by estimating gradients of the data distribution

    Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. arXiv:1907.05600, 2020

  63. [63]

    Score-Based Generative Modeling through Stochastic Differential Equations

    Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv:2011.13456, 2020

  64. [64]

    Marching cubes 33: Construction of topologically correct isosurfaces

    Evgueni Tcherniaev. Marching cubes 33: Construction of topologically correct isosurfaces. 1996

  65. [65]

    Neural Discrete Representation Learning

    Aaron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. Neural discrete representation learning. arXiv:1711.00937, 2017

  66. [66]

    Attention Is All You Need

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. arXiv:1706.03762, 2017

  67. [67]

    Score Jacobian Chaining: Lifting pretrained 2D diffusion models for 3D generation

    Haochen Wang, Xiaodan Du, Jiahao Li, Raymond A. Yeh, and Greg Shakhnarovich. Score jacobian chaining: Lifting pretrained 2d diffusion models for 3d generation. arXiv:2212.00774, 2022

  68. [68]

    Rodin: A generative model for sculpting 3D digital avatars using diffusion

    Tengfei Wang, Bo Zhang, Ting Zhang, Shuyang Gu, Jianmin Bao, Tadas Baltrusaitis, Jingjing Shen, Dong Chen, Fang Wen, Qifeng Chen, and Baining Guo. Rodin: A generative model for sculpting 3d digital avatars using diffusion, 2022. URL https://arxiv.org/abs/2212.06135

  69. [69]

    Novel view synthesis with diffusion models

    Daniel Watson, William Chan, Ricardo Martin-Brualla, Jonathan Ho, Andrea Tagliasacchi, and Mohammad Norouzi. Novel view synthesis with diffusion models. arXiv:2210.04628, 2022

  70. [70]

    The emergence of deepfake technology: A review

    Mika Westerlund. The emergence of deepfake technology: A review. Technology Innovation Management Review, 9:40–53, November 2019. ISSN 1927-0321. doi: 10.22215/timreview/1282. URL timreview.ca/article/1282

  71. [71]

    Pointflow: 3d point cloud generation with continuous normalizing flows

    Guandao Yang, Xun Huang, Zekun Hao, Ming-Yu Liu, Serge Belongie, and Bharath Hariharan. Pointflow: 3d point cloud generation with continuous normalizing flows. arXiv:1906.12320, 2019

  72. [72]

    Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

    Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, Zirui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, and Yonghui Wu. Scaling autoregressive models for content-rich text-to-image generation. arXiv:2206.10789, 2022

  73. [73]

    Lion: Latent point diffusion models for 3d shape generation

    Xiaohui Zeng, Arash Vahdat, Francis Williams, Zan Gojcic, Or Litany, Sanja Fidler, and Karsten Kreis. Lion: Latent point diffusion models for 3d shape generation. arXiv:2210.06978, 2022

  74. [74]

    ARF: Artistic radiance fields

    Kai Zhang, Nick Kolkin, Sai Bi, Fujun Luan, Zexiang Xu, Eli Shechtman, and Noah Snavely. Arf: Artistic radiance fields. arXiv:2206.06360, 2022