Introduction to Shap-E: Text-to-3D

Ng Wai Foong
5 min read · May 5, 2023

Generate 3D objects conditioned on text or images

Image by the author

By reading this article, you will learn to use Shap-E for 3D object generation. On 5 May 2023, OpenAI officially released Shap-E, which is

a system to generate 3D objects conditioned on text or images

Similar to its predecessor Point-E, Shap-E is capable of generating coherent 3D objects conditioned on either a rendering from a single viewpoint (image) or a text prompt.

Shap-E contains the following models:

  • encoder — converts 3D assets into the parameters of small neural networks which represent the 3D shape and texture as an implicit function. The resulting implicit function can be rendered from arbitrary viewpoints or imported into downstream applications as a mesh.
  • latent diffusion — generates novel implicit functions conditioned on either images or text descriptions. It produces latents that must be linearly projected to obtain the final implicit function parameters.

Both models are trained on the same datasets as Point-E with the following improvements:

  • when computing point clouds, each model is rendered from 60 views instead of the 20 views used by Point-E, whose final output was prone to small cracks
  • the final output contains 16K points in each point cloud instead of the 4K points used in Point-E
  • the lighting and material setup only includes diffuse materials
  • the datasets are enlarged with around a million more 3D assets and 120K captions from human annotators

Note that the generated 3D objects are typically lower fidelity than professional 3D assets and geared towards cartoonish assets.

Let’s proceed to the next section for setup and installation.


It is recommended to create a new virtual environment before you continue with the installation.
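For example, assuming Python 3 with the built-in venv module (the environment name shap-e-env is arbitrary):

```shell
# Create a new virtual environment (the name is up to you)
python3 -m venv shap-e-env

# Activate it (on Windows, use: shap-e-env\Scripts\activate)
source shap-e-env/bin/activate
```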

Activate it and clone the whole repository as follows:
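Assuming the official OpenAI repository on GitHub, the clone command would be:

```shell
# Clone the official Shap-E repository
git clone https://github.com/openai/shap-e.git
```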

Then, run the following command to install Shap-E:

cd shap-e
pip install -e .
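Once installed, a minimal text-to-3D run can be sketched as follows, based on the repository's text-to-3D example notebook. The model names (text300M, transmitter) and the sample_latents parameters are taken from the repository at the time of writing and may change between versions; a CUDA GPU is assumed for reasonable generation speed.

```python
import torch

from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import decode_latent_mesh

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the decoder (transmitter) and the text-conditional diffusion model
xm = load_model('transmitter', device=device)
model = load_model('text300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))

batch_size = 1
prompt = "a shark"

# Sample implicit-function latents conditioned on the text prompt
latents = sample_latents(
    batch_size=batch_size,
    model=model,
    diffusion=diffusion,
    guidance_scale=15.0,
    model_kwargs=dict(texts=[prompt] * batch_size),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

# Decode each latent into a mesh and export it as a PLY file,
# which can be opened in tools such as Blender
for i, latent in enumerate(latents):
    tri_mesh = decode_latent_mesh(xm, latent).tri_mesh()
    with open(f'mesh_{i}.ply', 'wb') as f:
        tri_mesh.write_ply(f)
```

The first run downloads the pretrained model weights, so expect some delay before sampling begins.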