Image synthesis under multi-modal priors is a useful and challenging task that has received increasing attention in recent years. A major challenge in using generative models for this task is the lack of paired data containing all modalities (i.e., priors) and the corresponding outputs. In recent work, a variational auto-encoder (VAE) model was trained in a weakly supervised manner to address this challenge. Since the generative power of VAEs is usually limited, it is difficult for such methods to synthesize images belonging to complex distributions. To this end, we propose a solution based on denoising diffusion probabilistic models (DDPMs) to synthesize images under multi-modal priors. Exploiting the fact that the distribution at each time step of the diffusion process is Gaussian, we show that there exists a closed-form expression for generating the image corresponding to the given priors. The proposed solution does not require explicit retraining for every combination of modalities and can leverage the outputs of the individual modality-specific models to generate realistic images satisfying different constraints. We conduct studies on two real-world datasets to demonstrate the effectiveness of our approach.
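The closed-form claim follows from a standard property of Gaussians: the product of several Gaussian densities is again (up to normalization) a Gaussian, with precision equal to the sum of the precisions and mean equal to the precision-weighted average of the means. The sketch below is a minimal illustration of this fusion at a single reverse-diffusion step; it assumes, hypothetically, that each modality-conditioned denoiser exposes the mean and variance of its own reverse-step distribution (the paper's exact parameterization may differ, and the function names here are illustrative).

```python
import numpy as np

def fuse_gaussians(means, variances):
    """Product of K Gaussians, in closed form.

    N(mu_1, s_1^2) * ... * N(mu_K, s_K^2) is (up to normalization) a
    Gaussian whose precision is the sum of the precisions and whose
    mean is the precision-weighted average of the means.
    """
    precisions = [1.0 / v for v in variances]
    fused_var = 1.0 / sum(precisions)
    fused_mean = fused_var * sum(p * m for p, m in zip(precisions, means))
    return fused_mean, fused_var

def multi_prior_reverse_step(x_t, t, denoisers, rng):
    """One reverse-diffusion step conditioned on all available priors.

    Each entry of `denoisers` is a hypothetical per-modality model that
    returns the Gaussian parameters (mean, variance) of its own
    p_k(x_{t-1} | x_t, prior_k); fusing them requires no retraining.
    """
    means, variances = zip(*(d(x_t, t) for d in denoisers))
    mu, var = fuse_gaussians(means, variances)
    return mu + np.sqrt(var) * rng.standard_normal(x_t.shape)

# Toy usage: two stand-in "denoisers" with fixed per-step variances.
rng = np.random.default_rng(0)
denoisers = [
    lambda x, t: (0.95 * x, 0.01),  # e.g., a sketch-conditioned model
    lambda x, t: (0.90 * x, 0.02),  # e.g., a text-conditioned model
]
x = rng.standard_normal((8, 8))   # start from pure noise
for t in reversed(range(10)):     # run a few reverse steps
    x = multi_prior_reverse_step(x, t, denoisers, rng)
```

Note that the precision weighting means a modality whose model is more confident (lower variance) at a given step contributes more to the fused mean, which is what allows the individual models to be combined without joint retraining.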