Diffusion models have recently become the de facto approach for generative modeling in the 2D domain. However, extending diffusion models to 3D is challenging because of the difficulty of acquiring 3D ground-truth data for training. On the other hand, 3D GANs, which integrate implicit 3D representations into GANs, have shown remarkable 3D-aware generation when trained on only single-view image datasets. However, 3D GANs do not provide straightforward ways to precisely control image synthesis. To address these challenges, we present Control3Diff, a 3D diffusion model that combines the strengths of diffusion models and 3D GANs for versatile, controllable 3D-aware image synthesis from single-view datasets. Control3Diff explicitly models the underlying latent distribution (optionally conditioned on external inputs), thus enabling direct control during the diffusion process. Moreover, our approach is general and applicable to any type of control input, allowing us to train it with the same diffusion objective without any auxiliary supervision. We validate the efficacy of Control3Diff on standard image generation benchmarks, including FFHQ, AFHQ, and ShapeNet, using various conditioning inputs such as images, sketches, and text prompts. Please see the project website (\url{https://jiataogu.me/control3diff}) for video comparisons.
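As a rough illustrative sketch (not taken from the paper itself), the conditional diffusion objective referenced above would take the standard denoising form below, where $\mathbf{z}_0$ denotes the latent being modeled, $c$ the optional conditioning input, $t$ the noise level, and $\epsilon_\theta$ the learned denoiser; the specific parameterization used by Control3Diff may differ:
\[
\mathcal{L} = \mathbb{E}_{\mathbf{z}_0,\, c,\, \epsilon \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),\, t}\Big[\big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\,\mathbf{z}_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\; t,\; c\big)\big\|_2^2\Big].
\]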