Recently, significant advances have been made in 3D generative models; however, training these models across diverse domains is challenging and requires a huge amount of training data and knowledge of the pose distribution. Text-guided domain adaptation methods allow a generator to be adapted to target domains using text prompts, thereby obviating the need to assemble large datasets. Recently, DATID-3D has demonstrated impressive sample quality in text-guided domains, preserving diversity for a given text prompt by leveraging text-to-image diffusion. However, adapting 3D generators to domains with significant gaps from the source domain remains challenging due to the following issues in current text-to-image diffusion models: 1) the shape-pose trade-off in diffusion-based translation, 2) pose bias, and 3) instance bias in the target domain, which result in inferior 3D shapes, low text-image correspondence, and low intra-domain diversity in the generated samples. To address these issues, we propose a novel pipeline called PODIA-3D, which uses pose-preserved text-to-image diffusion-based domain adaptation for 3D generative models. We construct a pose-preserved text-to-image diffusion model that allows the use of extremely high levels of noise for significant domain changes. We also propose specialized-to-general sampling strategies to improve the details of the generated samples. Moreover, to overcome instance bias, we introduce a text-guided debiasing method that improves intra-domain diversity. As a result, our method successfully adapts 3D generators across significant domain gaps. Our qualitative results and a user study demonstrate that our approach outperforms existing text-guided 3D domain adaptation methods in terms of text-image correspondence, realism, diversity of rendered images, and sense of depth of the 3D shapes in the generated samples.
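To make the shape-pose trade-off in diffusion-based translation concrete, below is a minimal SDEdit-style sketch using the open-source diffusers library (not the authors' released code; the model ID, prompt, and file names are illustrative assumptions). In image-to-image diffusion, the strength parameter controls how much noise is injected into the source render: low values preserve the source pose but limit the domain change, while values near 1.0 permit large domain changes at the cost of pose fidelity, which is the trade-off a pose-preserved diffusion model aims to break.

```python
# Minimal SDEdit-style image-to-image sketch illustrating the
# shape-pose trade-off (assumes the open-source `diffusers` library;
# model ID, prompt, and file names are illustrative, not the authors' setup).
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A render from the source 3D generator (hypothetical file).
source = Image.open("source_render.png").convert("RGB").resize((512, 512))

# `strength` sets the fraction of the diffusion schedule that is re-noised:
# low strength  -> pose preserved, weak domain shift;
# high strength -> strong domain shift, pose largely lost.
for strength in (0.3, 0.6, 0.9):
    result = pipe(
        prompt="a 3D render of a face in Pixar style",
        image=source,
        strength=strength,
        guidance_scale=7.5,
    ).images[0]
    result.save(f"translated_strength_{strength}.png")
```

Comparing the three outputs against the source render makes the trade-off visible: only the high-strength samples fully adopt the target style, but their head pose drifts from the input, illustrating why plain diffusion-based translation struggles when the domain gap is large.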