Generating photos that satisfy multiple constraints finds broad utility in the content creation industry. A key hurdle to accomplishing this task is the need for paired data consisting of all modalities (i.e., constraints) and their corresponding output. Moreover, existing methods must be retrained on paired data across all modalities whenever a new condition is introduced. This paper proposes a solution to this problem based on denoising diffusion probabilistic models (DDPMs). Our motivation for choosing diffusion models over other generative models comes from their flexible internal structure. Since each sampling step in the DDPM follows a Gaussian distribution, we show that there exists a closed-form solution for generating an image given various constraints. Our method can unite multiple diffusion models trained on multiple sub-tasks and conquer the combined task through our proposed sampling strategy. We also introduce a novel reliability parameter that allows different off-the-shelf diffusion models, trained across various datasets, to be used at sampling time alone to guide generation toward the desired outcome satisfying multiple constraints. We perform experiments on various standard multimodal tasks to demonstrate the effectiveness of our approach. More details can be found at https://nithin-gk.github.io/projectpages/Multidiff/index.html
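To give some intuition for why Gaussian sampling steps admit a closed-form combination, the following is a minimal sketch and not the sampler proposed in the paper. It assumes each model i proposes an isotropic Gaussian step N(mu_i, sigma^2 I) with a shared sigma, and that a non-negative reliability weight r_i acts as an exponent on that model's density. Under those assumptions the weighted product is again Gaussian, with mean sum(r_i * mu_i) / sum(r_i) and standard deviation sigma / sqrt(sum(r_i)). The function names `fuse_gaussian_steps` and `sample_fused_step`, and the weighting scheme itself, are hypothetical and chosen only for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def fuse_gaussian_steps(
    means: Sequence[Sequence[float]],
    sigma: float,
    reliabilities: Sequence[float],
) -> Tuple[List[float], float]:
    """Fuse k isotropic Gaussians N(mu_i, sigma^2 I), each raised to power r_i.

    Illustrative assumption: all models share the same sigma at this step.
    Returns the mean and standard deviation of the normalized product.
    """
    if not means or len(means) != len(reliabilities):
        raise ValueError("need one reliability weight per model mean")
    if any(r < 0 for r in reliabilities):
        raise ValueError("reliability weights must be non-negative")
    total = sum(reliabilities)
    if total <= 0:
        raise ValueError("at least one reliability weight must be positive")

    dim = len(means[0])
    # Reliability-weighted average of the per-model means, per coordinate.
    fused_mean = [
        sum(r * mu[j] for r, mu in zip(reliabilities, means)) / total
        for j in range(dim)
    ]
    # Precisions add under a product of Gaussians: total / sigma^2.
    fused_sigma = sigma / math.sqrt(total)
    return fused_mean, fused_sigma


def sample_fused_step(
    means: Sequence[Sequence[float]],
    sigma: float,
    reliabilities: Sequence[float],
    rng: random.Random,
) -> List[float]:
    """Draw one sample from the fused Gaussian step."""
    fused_mean, fused_sigma = fuse_gaussian_steps(means, sigma, reliabilities)
    return [rng.gauss(m, fused_sigma) for m in fused_mean]


if __name__ == "__main__":
    # Two hypothetical models propose different means for a 2-pixel "image";
    # the second model is trusted three times as much as the first.
    mean, std = fuse_gaussian_steps([[0.0, 0.0], [2.0, 4.0]], 1.0, [1.0, 3.0])
    print(mean, std)
    print(sample_fused_step([[0.0, 0.0], [2.0, 4.0]], 1.0, [1.0, 3.0], random.Random(0)))
```

In this sketch, raising a model's weight pulls the fused mean toward that model's proposal, which is one simple way a per-model reliability value could steer sampling without any retraining.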