Latent diffusion models for image generation have crossed a quality threshold that has enabled their mass adoption. Recently, a series of works have made advances toward replicating this success in the 3D domain, introducing techniques such as point cloud VAEs, triplane representations, neural implicit surfaces, and differentiable-rendering-based training. We take another step in this direction, combining these developments in a two-step pipeline consisting of 1) a triplane VAE that learns latent representations of textured meshes and 2) a conditional diffusion model that generates the triplane features. For the first time, this architecture allows conditional and unconditional generation of high-quality textured or untextured 3D meshes across multiple diverse categories in a few seconds on a single GPU. It substantially outperforms previous work on image-conditioned and unconditional generation, in terms of both mesh quality and texture generation. Furthermore, we demonstrate the scalability of our model to large datasets for increased quality and diversity. We will release our code and trained models.
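To make the two-step pipeline concrete, below is a minimal PyTorch sketch under our own simplifying assumptions: the module names, layer sizes, pooling encoder, and broadcast timestep conditioning are illustrative stand-ins rather than the paper's architecture, and the image conditioning pathway, differentiable rendering, and mesh extraction are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriplaneVAE(nn.Module):
    """Stage 1 (sketch): encode colored surface samples of a textured mesh
    into three axis-aligned feature planes; decode queried 3D points back
    to an implicit surface value and an RGB color."""

    def __init__(self, latent_channels=8, plane_res=32):
        super().__init__()
        self.latent_channels = latent_channels
        self.plane_res = plane_res
        # Toy per-point encoder pooled into 3 planes (XY, XZ, YZ).
        self.encoder = nn.Sequential(
            nn.Linear(6, 128), nn.ReLU(),
            nn.Linear(128, 3 * latent_channels * plane_res * plane_res),
        )
        # Toy decoder: concatenated triplane features -> (sdf, r, g, b).
        self.decoder = nn.Sequential(
            nn.Linear(3 * latent_channels, 128), nn.ReLU(),
            nn.Linear(128, 4),
        )

    def encode(self, points):
        # points: (B, N, 6) surface samples, xyz in [-1, 1] plus rgb.
        feats = self.encoder(points).mean(dim=1)  # pool over samples
        return feats.view(-1, 3, self.latent_channels,
                          self.plane_res, self.plane_res)

    def decode(self, planes, queries):
        # queries: (B, N, 3) points in [-1, 1]^3; project onto each plane
        # and bilinearly sample its features.
        projections = (queries[..., [0, 1]],   # XY plane
                       queries[..., [0, 2]],   # XZ plane
                       queries[..., [1, 2]])   # YZ plane
        feats = []
        for plane, uv in zip(planes.unbind(dim=1), projections):
            grid = uv.unsqueeze(1)  # (B, 1, N, 2)
            s = F.grid_sample(plane, grid, align_corners=True)
            feats.append(s.squeeze(2).transpose(1, 2))  # (B, N, C)
        return self.decoder(torch.cat(feats, dim=-1))   # (B, N, 4)

class TriplaneDiffusion(nn.Module):
    """Stage 2 (sketch): a DDPM-style epsilon-prediction denoiser over the
    triplane latents, with the three planes stacked as channels of one 2D
    feature map. Conditioning is omitted for brevity."""

    def __init__(self, latent_channels=8, plane_res=32, timesteps=1000):
        super().__init__()
        self.timesteps = timesteps
        betas = torch.linspace(1e-4, 2e-2, timesteps)
        self.register_buffer("alpha_bar", torch.cumprod(1.0 - betas, dim=0))
        ch = 3 * latent_channels
        # Stand-in for a real denoising U-Net; the +1 input channel carries
        # a crude broadcast timestep embedding.
        self.denoiser = nn.Sequential(
            nn.Conv2d(ch + 1, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, ch, 3, padding=1),
        )

    def loss(self, planes):
        B, _, C, R, _ = planes.shape
        x0 = planes.reshape(B, 3 * C, R, R)
        t = torch.randint(0, self.timesteps, (B,), device=x0.device)
        ab = self.alpha_bar[t].view(B, 1, 1, 1)
        noise = torch.randn_like(x0)
        xt = ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise  # forward noising
        t_map = (t.float() / self.timesteps).view(B, 1, 1, 1).expand(B, 1, R, R)
        pred = self.denoiser(torch.cat([xt, t_map], dim=1))
        return F.mse_loss(pred, noise)

# Usage: encode fake surface samples, then one diffusion training step.
vae, diffusion = TriplaneVAE(), TriplaneDiffusion()
points = torch.rand(2, 1024, 6) * 2 - 1
planes = vae.encode(points)
print(diffusion.loss(planes))
```

The sketch illustrates the division of labor described above: the VAE maps textured meshes to and from a compact triplane latent space, and the diffusion model only ever operates on those latents, which is what makes few-second sampling on a single GPU feasible.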