Generating photos that satisfy multiple constraints finds broad utility in the content creation industry. A key hurdle to this task is the need for paired data covering all modalities (i.e., constraints) and their corresponding outputs. Moreover, existing methods require retraining with paired data across all modalities whenever a new condition is introduced. This paper proposes a solution to this problem based on denoising diffusion probabilistic models (DDPMs). Our motivation for choosing diffusion models over other generative models stems from their flexible internal structure. Since each sampling step in a DDPM follows a Gaussian distribution, we show that there exists a closed-form solution for generating an image under multiple constraints. Our method can unite multiple diffusion models trained on separate sub-tasks and conquer the combined task through our proposed sampling strategy. We also introduce a novel reliability parameter that allows different off-the-shelf diffusion models, trained across various datasets, to be used at sampling time alone to guide the process toward an outcome satisfying multiple constraints. We perform experiments on various standard multimodal tasks to demonstrate the effectiveness of our approach. More details can be found at https://nithin-gk.github.io/projectpages/Multidiff/index.html
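The closed-form combination the abstract alludes to can be illustrated with a minimal sketch: because each model's reverse-step transition is Gaussian, a product of such Gaussians (with shared variance) is again Gaussian, and its mean is a reliability-weighted average of the individual means. The function below is a hypothetical illustration, not the paper's implementation; the function name, argument names, and the equal-variance assumption are ours.

```python
import numpy as np

def combined_ddpm_step(x_t, eps_preds, reliabilities, alpha_t, alpha_bar_t, sigma_t, rng):
    """One reverse DDPM step driven by several models' noise predictions.

    Sketch only: assumes all models share the same per-step variance, so the
    product of their Gaussian transitions has a mean equal to the
    reliability-weighted average of the individual means. Combining the
    predicted noises with the same weights yields that combined mean.
    """
    w = np.asarray(reliabilities, dtype=float)
    w = w / w.sum()                                   # normalize reliability weights
    eps = sum(wi * e for wi, e in zip(w, eps_preds))  # weighted noise estimate
    # Standard DDPM posterior mean with the combined noise estimate.
    mean = (x_t - (1 - alpha_t) / np.sqrt(1 - alpha_bar_t) * eps) / np.sqrt(alpha_t)
    return mean + sigma_t * rng.standard_normal(x_t.shape)
```

In this sketch, raising one model's reliability pulls the combined mean toward that model's denoising direction, which is the role the abstract assigns to the reliability parameter at sampling time.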