A system for generating 3D point clouds from complex prompts

Although recent work on text-conditional 3D object generation has shown promising results, state-of-the-art methods typically require several GPU-hours to produce a single sample. This is in stark contrast to state-of-the-art generative image models, which produce samples in seconds or minutes. In this paper, we explore an alternative method for 3D object generation that produces 3D models in just 1-2 minutes on a single GPU. Our method first generates a single synthetic view using a text-to-image diffusion model, and then produces a 3D point cloud using a second diffusion model that conditions on the generated image. Although our method still falls short of the state of the art in terms of sample quality, it is one to two orders of magnitude faster to sample from, offering a practical trade-off for some use cases. We release our pre-trained point cloud diffusion models, as well as evaluation code and models, at this https URL.
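The two-stage structure described above (text to a single synthetic view, then view to point cloud) can be illustrated with a minimal Python sketch. This is not the released code: the function names `sample_image_from_text`, `sample_point_cloud_from_image`, and `text_to_point_cloud` are hypothetical, and both samplers are toy placeholders that use random numbers in place of trained diffusion models. Only the data flow between the two stages is meant to be representative.

```python
import random
from typing import List, Tuple

Image = List[List[float]]           # toy grayscale image, H x W (assumption)
Point = Tuple[float, float, float]  # (x, y, z)


def sample_image_from_text(prompt: str, size: int = 8, seed: int = 0) -> Image:
    """Stage 1 placeholder: stands in for a text-to-image diffusion sampler.

    Hypothetical helper; it returns a pseudo-random image that depends only on
    the prompt and the seed, with no learned model involved.
    """
    rng = random.Random(f"{seed}:{prompt}")
    return [[rng.random() for _ in range(size)] for _ in range(size)]


def sample_point_cloud_from_image(
    image: Image, num_points: int = 1024, steps: int = 16, seed: int = 0
) -> List[Point]:
    """Stage 2 placeholder: stands in for an image-conditioned point cloud sampler.

    Starts from Gaussian noise and iteratively pulls the points toward a target
    derived from the conditioning image. A real model would instead predict each
    denoising update with a trained network.
    """
    rng = random.Random(seed)
    pixels = [p for row in image for p in row]
    target = sum(pixels) / len(pixels)  # toy conditioning signal
    points = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(num_points)]
    for _ in range(steps):
        for p in points:
            for axis in range(3):
                # Toy "denoising" update: move 10% of the way toward the target.
                p[axis] += 0.1 * (target - p[axis])
    return [(p[0], p[1], p[2]) for p in points]


def text_to_point_cloud(prompt: str, num_points: int = 1024, seed: int = 0) -> List[Point]:
    """Chain the two stages: text -> synthetic view -> 3D point cloud."""
    view = sample_image_from_text(prompt, seed=seed)
    return sample_point_cloud_from_image(view, num_points=num_points, seed=seed)


if __name__ == "__main__":
    cloud = text_to_point_cloud("a red chair", num_points=64)
    print(len(cloud), "points; first point:", cloud[0])
```

The point of splitting the pipeline this way is that the second stage only ever sees an image, so the text-to-image stage can be swapped or improved independently of the point cloud model.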

