shap-e-img2img

Image/Video · by OpenAI

OpenAI's image-to-3D model that generates 3D representations conditioned on an input image using the Shap-E diffusion pipeline.

Model Card

Shap-E is a diffusion-based model that generates 3D assets from a text prompt or, as with this checkpoint, from an input image. It was introduced in "Shap-E: Generating Conditional 3D Implicit Functions" by Heewoo Jun and Alex Nichol of OpenAI.

The original Shap-E repository is available at https://github.com/openai/shap-e.

The authors of Shap-E didn't author this model card. They provide a separate model card here.

Introduction

The abstract of the Shap-E paper:

We present Shap-E, a conditional generative model for 3D assets. Unlike recent work on 3D generative models which produce a single output representation, Shap-E directly generates the parameters of implicit functions that can be rendered as both textured meshes and neural radiance fields. We train Shap-E in two stages: first, we train an encoder that deterministically maps 3D assets into the parameters of an implicit function; second, we train a conditional diffusion model on outputs of the encoder. When trained on a large dataset of paired 3D and text data, our resulting models are capable of generating complex and diverse 3D assets in a matter of seconds. When compared to Point-E, an explicit generative model over point clouds, Shap-E converges faster and reaches comparable or better sample quality despite modeling a higher-dimensional, multi-representation output space. We release model weights, inference code, and samples at this https URL.

Released checkpoints

The authors released the following checkpoints:

Usage examples in 🧨 diffusers

First make sure you have installed all the dependencies:

pip install transformers accelerate -q
pip install git+https://github.com/huggingface/diffusers
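Before loading the pipeline, it can help to confirm that the packages are importable in the active environment. The following is a minimal sketch using only the standard library; `missing_packages` is an illustrative helper name, not part of diffusers.

```python
import importlib.util


def missing_packages(names):
    """Return the entries of `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Top-level packages used by the usage example in this card.
    required = ["torch", "diffusers", "transformers", "accelerate"]
    print("Missing packages:", missing_packages(required) or "none")
```

An empty result means every listed package was found; it does not verify versions or GPU availability.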

Once the dependencies are installed, use the code below:

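import torch
from diffusers import ShapEImg2ImgPipeline
from diffusers.utils import export_to_gif, load_image

# Load the image-conditioned Shap-E checkpoint onto the GPU.
ckpt_id = "openai/shap-e-img2img"
pipe = ShapEImg2ImgPipeline.from_pretrained(ckpt_id).to("cuda")

# Reference image that conditions the 3D generation.
img_url = "https://hf.co/datasets/diffusers/docs-images/resolve/main/shap-e/corgi.png"
image = load_image(img_url)

# Fix the seed so the sampled outputs are reproducible.
generator = torch.Generator(device="cuda").manual_seed(0)
batch_size = 4
guidance_scale = 3.0

# `frame_size` is the argument name in released diffusers versions.
images = pipe(
    image,
    num_images_per_prompt=batch_size,
    generator=generator,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
    output_type="pil",
).images

# `images` holds one list of rendered frames per sample; export the first sample.
gif_path = export_to_gif(images[0], "corgi_sampled_3d.gif")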
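The `guidance_scale` argument above controls how strongly sampling follows the input image. As a rough illustration, assuming the standard classifier-free guidance formulation, the conditional and unconditional noise predictions are blended as sketched below. `apply_guidance` is a hypothetical name used only here, and plain lists of floats stand in for tensors.

```python
def apply_guidance(uncond, cond, guidance_scale):
    """Blend unconditional and conditional predictions element-wise.

    Computes uncond + guidance_scale * (cond - uncond), the usual
    classifier-free guidance update (an assumption about the formulation,
    not code taken from the pipeline).
    """
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


# A scale of 1.0 reproduces the conditional prediction; larger values
# push the result further away from the unconditional one.
print(apply_guidance([0.0, 1.0], [1.0, 3.0], 1.0))
print(apply_guidance([0.0, 1.0], [1.0, 3.0], 3.0))
```

Higher scales generally trade sample diversity for closer adherence to the conditioning image.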

Results

Figure: reference corgi image in 2D; sampled image in 3D (one); sampled image in 3D (two).
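Besides rendered frames, a sample can also be saved as a mesh for use in other 3D tools. The sketch below assumes a recent diffusers release in which the pipeline accepts `output_type="mesh"` and `diffusers.utils` provides `export_to_ply`; the helper name `image_to_ply` and the default output file name are illustrative, not part of the original card.

```python
def image_to_ply(image_url, ply_path="corgi_3d.ply", seed=0):
    """Hypothetical helper: generate one mesh from an image and save it as PLY."""
    # Heavy dependencies are imported here so that defining the helper
    # does not itself require torch, diffusers, or a GPU.
    import torch
    from diffusers import ShapEImg2ImgPipeline
    from diffusers.utils import export_to_ply, load_image

    pipe = ShapEImg2ImgPipeline.from_pretrained("openai/shap-e-img2img").to("cuda")
    generator = torch.Generator(device="cuda").manual_seed(seed)

    # Assumption: output_type="mesh" returns mesh objects instead of frames.
    meshes = pipe(
        load_image(image_url),
        generator=generator,
        guidance_scale=3.0,
        num_inference_steps=64,
        frame_size=256,
        output_type="mesh",
    ).images

    # Write the first generated mesh and return the path of the saved file.
    return export_to_ply(meshes[0], ply_path)
```

Calling `image_to_ply` with the corgi image URL from the usage example would download the checkpoint and needs a CUDA device.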

Training details

Refer to the original paper.

Known limitations and potential biases

Refer to the original model card.

Citation

@misc{jun2023shape,
      title={Shap-E: Generating Conditional 3D Implicit Functions}, 
      author={Heewoo Jun and Alex Nichol},
      year={2023},
      eprint={2305.02463},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Author

OpenAI (openai)

Details

Downloads: 821
Likes: 58
Access: Open Source
Task: image-to-image
License: MIT
Library: diffusers
Created: Jul 4, 2023
Updated: Jul 20, 2023