In this paper, we propose the Self-Attention Generative Adversarial Network (SAGAN), which allows attention-driven, long-range dependency modeling for image generation tasks. Traditional convolutional GANs generate high-resolution details as a function of only spatially local points in lower-resolution feature maps. In SAGAN, details can be generated using cues from all feature locations. Moreover, the discriminator can check that highly detailed features in distant portions of the image are consistent with each other. Furthermore, recent work has shown that generator conditioning affects GAN performance. Leveraging this insight, we apply spectral normalization to the GAN generator and find that this improves training dynamics. The proposed SAGAN achieves state-of-the-art results, boosting the best published Inception score from 36.8 to 52.52 and reducing the Fréchet Inception distance from 27.62 to 18.65 on the challenging ImageNet dataset. Visualization of the attention layers shows that the generator leverages neighborhoods that correspond to object shapes rather than local regions of fixed shape.
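To make the attention mechanism concrete, below is a minimal sketch of a SAGAN-style self-attention layer in which every output location aggregates cues from all spatial feature locations, exactly the long-range behavior the abstract describes. PyTorch, the 1x1-convolution projections, and the channel reduction factor of 8 are assumptions drawn from common SAGAN implementations, not details stated in this abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Self-attention over spatial feature locations (SAGAN-style sketch)."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions project features into query/key/value spaces;
        # the channels // 8 reduction is a common choice in SAGAN code.
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # gamma scales the attention output; initializing it to 0 makes the
        # layer start as an identity map and learn to attend gradually.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = h * w  # number of spatial locations
        q = self.query(x).view(b, -1, n)   # (b, c//8, n)
        k = self.key(x).view(b, -1, n)     # (b, c//8, n)
        v = self.value(x).view(b, c, n)    # (b, c, n)
        # attn[b, j, i]: how much location j attends to location i,
        # normalized over all locations i via softmax.
        attn = F.softmax(torch.bmm(q.transpose(1, 2), k), dim=-1)  # (b, n, n)
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x  # residual connection back to the input
```

Because `gamma` starts at zero, inserting this layer into an existing convolutional generator or discriminator does not perturb its behavior at initialization; the network can then learn to weight non-local evidence as training progresses.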
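The spectral normalization mentioned in the abstract can be applied to generator layers through PyTorch's built-in `torch.nn.utils.spectral_norm` wrapper, which constrains each weight matrix's largest singular value to approximately 1 via power iteration. The sketch below is illustrative only: the layer sizes and the tiny generator are assumptions, not the paper's architecture.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

def sn_deconv(in_ch: int, out_ch: int) -> nn.Module:
    # Wrapping a layer with spectral_norm rescales its weight by its
    # spectral norm at every forward pass, stabilizing GAN training.
    return spectral_norm(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)
    )

# Hypothetical toy generator with spectrally normalized layers.
generator = nn.Sequential(
    sn_deconv(128, 64),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    sn_deconv(64, 3),
    nn.Tanh(),
)
```

Note that SAGAN applies this normalization to the generator (in addition to the discriminator, where prior work had used it), which is the conditioning insight the abstract credits for improved training dynamics.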