Semantic image synthesis, translating semantic layouts to photo-realistic images, is a one-to-many mapping problem. Though impressive progress has been made recently, diverse semantic synthesis that can efficiently produce semantic-level multimodal results still remains a challenge. In this paper, we propose a novel diverse semantic image synthesis framework from the perspective of semantic class distributions, which naturally supports diverse generation at the semantic or even instance level. We achieve this by modeling class-level conditional modulation parameters as continuous probability distributions instead of discrete values, and sampling per-instance modulation parameters through instance-adaptive stochastic sampling that is consistent across the network. Moreover, we propose prior noise remapping, through linear perturbation parameters encoded from paired references, to facilitate supervised training and exemplar-based instance style control at test time. Extensive experiments on multiple datasets show that our method can achieve superior diversity and comparable quality compared to state-of-the-art methods. Code will be available at \url{https://github.com/tzt101/INADE.git}.
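The core idea above can be illustrated with a minimal sketch: each semantic class carries a learned distribution over modulation parameters, one noise vector is drawn per instance and reused across all modulation layers (making the sampling consistent across the network), and an affine remapping of that noise stands in for the prior noise remapping encoded from a reference. All names, shapes, and the placeholder random "learned" parameters below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, channels = 3, 4

# Hypothetical learned per-class distribution parameters. In the method these
# would be trained jointly with the generator; here they are random stand-ins.
mu_gamma = rng.normal(size=(num_classes, channels))
sigma_gamma = np.abs(rng.normal(size=(num_classes, channels)))
mu_beta = rng.normal(size=(num_classes, channels))
sigma_beta = np.abs(rng.normal(size=(num_classes, channels)))

def sample_instance_modulation(class_id, noise=None, a=1.0, b=0.0):
    """Draw per-instance modulation parameters (gamma, beta) from the
    class-level distributions via the reparameterization mu + sigma * z.

    `noise` is sampled once per instance and reused in every modulation
    layer, so the stochastic sampling stays consistent across the network.
    (a, b) sketch the linear prior-noise remapping z' = a*z + b; in the
    paper they would be encoded from a paired reference image.
    """
    if noise is None:
        noise = rng.standard_normal(channels)
    z = a * noise + b  # remapped prior noise
    gamma = mu_gamma[class_id] + sigma_gamma[class_id] * z
    beta = mu_beta[class_id] + sigma_beta[class_id] * z
    return gamma, beta

# Two independent draws for the same class give different modulation
# parameters -> instance-level diversity within one semantic class.
g1, b1 = sample_instance_modulation(0)
g2, b2 = sample_instance_modulation(0)
```

Fixing `noise` while varying `(a, b)` instead gives controlled, exemplar-driven style shifts for a single instance, which is the test-time use of the remapping.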