Cross-modality data translation has attracted great interest in image computing. Deep generative models (\textit{e.g.}, GANs) have improved performance on these problems. Nevertheless, a fundamental challenge in image translation remains open: zero-shot-learning cross-modality data translation with fidelity. This paper proposes a new unsupervised zero-shot-learning method named Mutual Information guided Diffusion cross-modality data translation Model (MIDiffusion), which learns to translate unseen source data to the target domain. MIDiffusion leverages a score-matching-based generative model that learns the prior knowledge of the target domain. We propose a differentiable local-wise mutual information layer ($LMI$) for conditioning the iterative denoising sampling. The $LMI$ captures identical cross-modality features in the statistical domain to guide the diffusion process; thus, our method does not require retraining when the source domain changes, as it does not rely on any direct mapping between the source and target domains. This advantage is critical for applying cross-modality data translation methods in practice, because a sufficient amount of source-domain data is not always available for supervised training. We empirically demonstrate the advantages of MIDiffusion in comparison with an influential group of generative models, including adversarial-based and other score-matching-based models.
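To make the guidance idea concrete, the sketch below is a minimal, hypothetical illustration of mutual-information-guided reverse diffusion, not the authors' implementation: it substitutes a simple global soft-histogram MI estimate for the paper's local-wise $LMI$ layer and shifts a DDPM-style reverse step by the MI gradient, in the spirit of classifier guidance. All names (\texttt{soft\_histogram\_mi}, \texttt{guided\_reverse\_step}, \texttt{eps\_model}, \texttt{guidance\_scale}) and the schedule variables are assumptions for illustration.
\begin{verbatim}
import torch

def soft_histogram_mi(x, y, num_bins=32, sigma=0.05):
    # Differentiable mutual information between two images via soft-binned
    # (Gaussian-kernel) intensity histograms. Inputs are assumed scaled to
    # [0, 1]. This global estimate is only a stand-in for the paper's
    # local-wise LMI layer.
    x, y = x.flatten(), y.flatten()
    centers = torch.linspace(0.0, 1.0, num_bins, device=x.device)
    wx = torch.exp(-0.5 * ((x[:, None] - centers) / sigma) ** 2)
    wy = torch.exp(-0.5 * ((y[:, None] - centers) / sigma) ** 2)
    wx = wx / (wx.sum(dim=1, keepdim=True) + 1e-8)
    wy = wy / (wy.sum(dim=1, keepdim=True) + 1e-8)
    joint = (wx.t() @ wy) / x.numel()           # joint intensity distribution
    px = joint.sum(dim=1, keepdim=True)
    py = joint.sum(dim=0, keepdim=True)
    return (joint * (torch.log(joint + 1e-8)
                     - torch.log(px @ py + 1e-8))).sum()

@torch.no_grad()
def guided_reverse_step(eps_model, x_t, t, source_img,
                        alpha_t, alpha_bar_t, beta_t, guidance_scale=1.0):
    # One DDPM-style reverse step whose mean is shifted by the gradient of
    # the MI between the current sample and the (unseen) source image,
    # analogous to classifier guidance. `eps_model` predicts the noise at
    # integer step t; alpha_t, alpha_bar_t, beta_t are scalar tensors from
    # the noise schedule.
    eps = eps_model(x_t, t)
    with torch.enable_grad():                   # MI guidance needs a gradient
        x_in = x_t.detach().requires_grad_(True)
        mi = soft_histogram_mi(x_in.clamp(0, 1), source_img)
        mi_grad = torch.autograd.grad(mi, x_in)[0]
    mean = (x_t - (1 - alpha_t) / (1 - alpha_bar_t).sqrt() * eps) / alpha_t.sqrt()
    mean = mean + guidance_scale * beta_t * mi_grad
    return mean + beta_t.sqrt() * torch.randn_like(x_t) if t > 0 else mean
\end{verbatim}
Because the guidance term depends only on the target-domain score model and a statistic computed against the given source image at sampling time, no source-to-target mapping is learned, which is what allows the source domain to change without retraining.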