Generative Adversarial Networks (GANs) produce impressive results on unconditional image generation when powered with large-scale image datasets. Yet the generated images are still easy to spot, especially on datasets with high variance (e.g. bedroom, church). In this paper, we propose various improvements to further push the boundaries in image generation. Specifically, we propose a novel dual contrastive loss and show that, with this loss, the discriminator learns more generalized and distinguishable representations to incentivize generation. In addition, we revisit attention and extensively experiment with different attention blocks in the generator. We find that attention remains an important module for successful image generation, even though it was not used in recent state-of-the-art models. Lastly, we study different attention architectures in the discriminator and propose a reference attention mechanism. By combining the strengths of these remedies, we improve the compelling state-of-the-art Fr\'{e}chet Inception Distance (FID) by at least 17.5% on several benchmark datasets. We obtain even more significant improvements on compositional synthetic scenes (up to 47.5% in FID).
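The abstract does not spell out the form of the dual contrastive loss, so the following is only a minimal PyTorch-style sketch of one plausible instantiation: an InfoNCE-style softmax cross-entropy over discriminator logits, applied in both directions (each real contrasted against the batch of fakes, and each fake against the batch of reals). The function name and tensor shapes are assumptions for illustration, not the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def dual_contrastive_loss(real_logits: torch.Tensor, fake_logits: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of a dual contrastive discriminator loss.

    real_logits, fake_logits: shape (N,), raw discriminator outputs on a
    batch of real and generated images, respectively.
    """
    device = real_logits.device
    n = real_logits.shape[0]
    target = torch.zeros(n, dtype=torch.long, device=device)  # correct "class" is index 0

    # Direction 1: each real logit should dominate all fake logits.
    # Rows are [real_i, fake_1, ..., fake_N]; softmax cross-entropy picks index 0.
    real_vs_fakes = torch.cat(
        [real_logits.unsqueeze(1), fake_logits.unsqueeze(0).expand(n, -1)], dim=1)
    loss_real = F.cross_entropy(real_vs_fakes, target)

    # Direction 2 (the "dual" term): each negated fake logit should dominate
    # all negated real logits, pushing fake scores below real scores.
    fake_vs_reals = torch.cat(
        [-fake_logits.unsqueeze(1), -real_logits.unsqueeze(0).expand(n, -1)], dim=1)
    loss_fake = F.cross_entropy(fake_vs_reals, target)

    return loss_real + loss_fake
```

Under this reading, the loss couples every sample to the whole opposing batch rather than scoring samples independently, which is one way the discriminator could be pushed toward more generalized, distinguishable representations.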