Semantic image synthesis (SIS) aims to produce photorealistic images that align with a given semantic layout, and it has improved significantly in recent years. Although image-level diversity has been discussed extensively, class-level mode collapse remains widespread in current algorithms. We therefore propose a new requirement for SIS to achieve more photorealistic images: variation-awareness, which consists of inter-class and intra-class variation. Inter-class variation is the diversity between different semantic classes, while intra-class variation is the diversity within a single class. Through analysis, we find that current algorithms capture inter-class variation elusively, while intra-class variation remains insufficient. Further, we introduce two simple methods, semantic noise and position code, to achieve variation-aware semantic image synthesis (VASIS) with higher intra-class variation. We combine our methods with several state-of-the-art algorithms, and the experimental results show that our models generate more natural images and achieve slightly better FIDs and/or mIoUs than their counterparts. Our code and models will be publicly available.