Image reconstruction and synthesis have witnessed remarkable progress thanks to the development of generative models. Nonetheless, gaps may still exist between real and generated images, especially in the frequency domain. In this study, we show that narrowing these frequency-domain gaps can further improve image reconstruction and synthesis quality. We propose a novel focal frequency loss, which allows a model to adaptively focus on frequency components that are hard to synthesize by down-weighting the easy ones. This objective is complementary to existing spatial losses and offers strong resistance to the loss of important frequency information caused by the inherent bias of neural networks. We demonstrate the versatility and effectiveness of focal frequency loss in improving popular models, such as VAE, pix2pix, and SPADE, in both perceptual quality and quantitative performance. We further show its potential on StyleGAN2.
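The core idea described above — weighting each frequency component by how poorly it is currently synthesized — can be sketched as follows. This is a minimal NumPy illustration on single-channel images, not the authors' implementation: the function name, the `alpha` exponent, and the normalization detail are assumptions for illustration.

```python
import numpy as np

def focal_frequency_loss(real, fake, alpha=1.0):
    """Hypothetical sketch of a focal frequency loss on 2D grayscale arrays."""
    # Map both images to the frequency domain with a 2D FFT.
    freq_real = np.fft.fft2(real)
    freq_fake = np.fft.fft2(fake)
    # Per-frequency squared distance between the complex spectra.
    dist = np.abs(freq_real - freq_fake) ** 2
    # Focal weight: frequencies with larger error ("hard" components)
    # receive larger weights; easy components are down-weighted.
    weight = np.sqrt(dist) ** alpha
    weight = weight / (weight.max() + 1e-12)  # normalize weights to [0, 1]
    # In a training framework the weight matrix would be treated as a
    # constant (no gradient flows through it).
    return float(np.mean(weight * dist))
```

When `fake` matches `real` exactly, every frequency distance is zero and the loss vanishes; as the spectra diverge, the hardest frequencies dominate the average, which is the adaptive focusing behavior the abstract describes.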