Diffusion models have shown great promise for image generation, surpassing GANs in sample diversity while achieving comparable image quality. However, their application to 3D shapes has been limited to point-cloud or voxel representations, which in practice cannot accurately represent a 3D surface. We propose a diffusion model for neural implicit representations of 3D shapes that operates in the latent space of an auto-decoder. This allows us to generate diverse, high-quality 3D surfaces. We additionally show that our model can be conditioned on images or text via CLIP embeddings, enabling image-to-3D and text-to-3D generation. Furthermore, adding noise to the latent codes of existing shapes allows us to explore shape variations.
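To make the described pipeline concrete, below is a minimal sketch of the core idea: training a denoising diffusion model directly on the latent codes of an auto-decoder (e.g. a DeepSDF-style shape auto-decoder). All names and hyperparameters here (`LatentDenoiser`, `latent_dim`, the beta schedule) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: DDPM-style training on auto-decoder latent codes.
# Assumes latents were previously obtained by fitting an auto-decoder
# (e.g. DeepSDF-style) to a shape collection; names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim = 256   # assumed size of the auto-decoder latent codes
num_steps = 1000   # assumed number of diffusion timesteps

# Linear beta schedule (a common DDPM choice; the paper may differ).
betas = torch.linspace(1e-4, 2e-2, num_steps)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

class LatentDenoiser(nn.Module):
    """Predicts the noise added to a latent code at timestep t."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, 512), nn.SiLU(),
            nn.Linear(512, 512), nn.SiLU(),
            nn.Linear(512, dim),
        )

    def forward(self, z_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Concatenate a normalized timestep as a simple conditioning signal;
        # conditioning on CLIP embeddings would enter the same way.
        t_feat = t.float().unsqueeze(-1) / num_steps
        return self.net(torch.cat([z_t, t_feat], dim=-1))

model = LatentDenoiser(latent_dim)
optim = torch.optim.Adam(model.parameters(), lr=1e-4)

# One training step; z0 stands in for a batch of fitted latent codes.
z0 = torch.randn(32, latent_dim)          # placeholder latents
t = torch.randint(0, num_steps, (32,))
noise = torch.randn_like(z0)
a_bar = alphas_bar[t].unsqueeze(-1)
z_t = a_bar.sqrt() * z0 + (1.0 - a_bar).sqrt() * noise  # forward noising
optim.zero_grad()
loss = F.mse_loss(model(z_t, t), noise)   # standard epsilon-prediction loss
loss.backward()
optim.step()
```

Under this framing, the shape-variation result in the abstract corresponds to noising an existing shape's latent code to an intermediate timestep and running the learned denoiser back to step 0, yielding a nearby but distinct latent that the auto-decoder turns into a variant surface.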