From text to the third dimension, OpenAI launches Shap-E for 3D modelling

Three-dimensional models in seconds, starting with a simple text input: this is the promise of Shap-E, the new technology launched by OpenAI, which allows the three-dimensional world to be explored in a completely new way.

June 1, 2023

Elizabeth Smith

OpenAI, the artificial intelligence company led by Sam Altman, has once again surprised the technology world with the launch of its latest generative AI model, Shap-E.

This new AI is able to generate realistic and diverse 3D objects from simple textual input, paving the way for a wide range of innovative applications. Let’s see how it works.

Shap-E, perfect 3D models in seconds: here’s how it works

One of the distinguishing features of Shap-E is its ability to create 3D assets in seconds from a simple text prompt typed by the user, in much the same way as one prompts ChatGPT.

Shap-E is trained in two distinct phases. In the first phase, a component called the ‘encoder’ is trained. The encoder’s task is to convert a 3D asset into the parameters of an implicit function.

In other words, the encoder takes an existing 3D object and transforms it into a set of parameters that describe the object in a specific, deterministic way. In the second stage, another component, a conditional diffusion model, is trained.

This model is trained on the outputs of the encoder. Once trained, it generates new sets of parameters, conditioned on an input such as a text prompt, and those parameters can then be rendered as a 3D object.
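To make the idea of ‘parameters of an implicit function’ more concrete, here is a deliberately tiny sketch. It is not Shap-E’s code: the real encoder is a neural network and its implicit function is far more expressive. The toy below assumes the simplest possible shape, a sphere, and the names encode_sphere and signed_distance are hypothetical, chosen only for illustration.

```python
import math


def encode_sphere(points):
    """Toy 'encoder': deterministically map surface points to the four
    parameters (cx, cy, cz, r) of a sphere. Illustrative stand-in only."""
    n = len(points)
    center = tuple(sum(p[i] for p in points) / n for i in range(3))
    radius = sum(math.dist(p, center) for p in points) / n
    return (*center, radius)


def signed_distance(params, query):
    """Toy implicit function: negative inside the shape, zero on its
    surface, positive outside."""
    center, radius = tuple(params[:3]), params[3]
    return math.dist(query, center) - radius


# Six points lying on a sphere of radius 2 centred at (1, 0, 0)
surface_points = [
    (3, 0, 0), (-1, 0, 0),
    (1, 2, 0), (1, -2, 0),
    (1, 0, 2), (1, 0, -2),
]

params = encode_sphere(surface_points)
inside = signed_distance(params, (1, 0, 0))
outside = signed_distance(params, (4, 0, 0))
print(params, inside, outside)
```

The point of the sketch is that the object is no longer stored as a list of points but as a handful of numbers that a function can turn back into a shape. A generative model in the second stage then only has to learn to produce such numbers.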

What sets Shap-E apart in 3D modelling is that its output can be rendered as a Neural Radiance Field (NeRF), a view synthesis technique, as well as a textured mesh.

This approach, which has already attracted much attention in the field of computer vision, enables Shap-E to synthesise realistic views of a scene and reconstruct detailed 3D models.

How Shap-E works

Using advanced deep learning algorithms, Shap-E is able to create realistic and detailed three-dimensional representations, capturing the appearance and illumination of scenes. NeRF technology allows Shap-E to overcome the limitations of traditional 3D rendering techniques, delivering stunning results that come ever closer to reality.

With the ability to translate textual input into complex 3D models, Shap-E revolutionises the way people can explore and create visual content, opening up new creative possibilities and offering powerful tools for artists, designers and developers. Here is how it compares with Point-E, OpenAI’s earlier 3D generator, directly from the OpenAI blog:

“An interesting feature of OpenAI’s Shap-E is that, although it models a multidimensional output space with advanced representations, Shap-E converges faster and produces samples of comparable or even better quality than Point-E.”

Although the 3D objects created may look pixelated and rough, they can be generated from a single text prompt. However, a current limitation is that Shap-E can only handle prompts that describe a single object with simple attributes, and it struggles to combine multiple attributes.

How to use Shap-E

OpenAI does not provide detailed instructions on how to use Shap-E, so users are encouraged to explore and experiment with the tool to fully understand how it works.

You can access Shap-E for free on GitHub and use it directly on your own computer, without the need for an OpenAI API. Once installed, it can be used immediately, provided you are familiar with its configuration.

To get started, Shap-E must be installed by running the command ‘pip install -e .’ from the root of the downloaded repository. A set of notebooks is available on GitHub for using Shap-E:

“text-to-3d” is a notebook that allows you to generate a three-dimensional model using a text prompt;

“image-to-3d” is a notebook that transforms a two-dimensional image into a three-dimensional object;

“encode_model” is a notebook that takes a pre-existing 3D model, uses Blender to render it, encodes it into Shap-E’s internal representation and then renders it back.
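For readers who prefer a script to a notebook, the sketch below follows the steps of the text-to-3D sample notebook as found in the GitHub repository at the time of writing. It assumes Shap-E and PyTorch are already installed, and that the module paths, model names (‘transmitter’, ‘text300M’) and sampling values still match the repository; these may change between versions. The helper names sampling_settings and text_to_meshes are hypothetical and not part of Shap-E.

```python
def sampling_settings(prompt, batch_size=4):
    """Keyword arguments for shap_e's sample_latents.
    Values mirror the sample notebook; treat them as a starting point."""
    return dict(
        batch_size=batch_size,
        guidance_scale=15.0,
        model_kwargs=dict(texts=[prompt] * batch_size),
        progress=True,
        clip_denoised=True,
        use_fp16=True,
        use_karras=True,
        karras_steps=64,
        sigma_min=1e-3,
        sigma_max=160,
        s_churn=0,
    )


def text_to_meshes(prompt, batch_size=4, out_prefix="example_mesh"):
    """Generate meshes from a text prompt and save them as .ply files."""
    # Heavy imports live here so this file loads even without shap_e installed.
    import torch
    from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
    from shap_e.diffusion.sample import sample_latents
    from shap_e.models.download import load_config, load_model
    from shap_e.util.notebooks import decode_latent_mesh

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    xm = load_model("transmitter", device=device)   # decodes latents into 3D
    model = load_model("text300M", device=device)   # text-conditional model
    diffusion = diffusion_from_config(load_config("diffusion"))

    latents = sample_latents(
        model=model,
        diffusion=diffusion,
        **sampling_settings(prompt, batch_size),
    )

    paths = []
    for i, latent in enumerate(latents):
        mesh = decode_latent_mesh(xm, latent).tri_mesh()
        path = f"{out_prefix}_{i}.ply"
        with open(path, "wb") as f:
            mesh.write_ply(f)
        paths.append(path)
    return paths
```

Calling text_to_meshes("a shark") would download the model weights on first use and write one .ply file per sample, which can then be opened in a 3D tool such as Blender. A GPU is strongly advisable, as sampling on a CPU is likely to be slow.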

The application areas of Shap-E

While ChatGPT offers a myriad of applications for development, Shap-E covers many other areas, such as robotics, cartography, virtual and augmented reality and many more.

The ability to generate realistic and diverse 3D models quickly and efficiently offers numerous opportunities for the gaming, film and virtual reality industries.

Shap-E represents a significant breakthrough in 3D modelling and rendering, allowing creative people to bring their ideas to life more quickly and expressively in the future.

Industry professionals are not the only ones who will benefit from this technology: enthusiasts and independent artists will also be able to harness the potential of Shap-E to create extraordinary virtual worlds.