Image-to-image translation has been revolutionized by GAN-based methods. However, existing methods lack the ability to preserve the identity of the source domain. As a result, synthesized images often over-adapt to the reference domain, losing important structural characteristics and suffering from suboptimal visual quality. To address these challenges, we propose a novel frequency domain image translation (FDIT) framework that exploits frequency information to enhance the image generation process. Our key idea is to decompose the image into low-frequency and high-frequency components, where the high-frequency feature captures object structure akin to the identity. Our training objective facilitates the preservation of frequency information in both pixel space and Fourier spectral space. We broadly evaluate FDIT across five large-scale datasets and multiple tasks, including image translation and GAN inversion. Extensive experiments and ablations show that FDIT effectively preserves the identity of the source image and produces photo-realistic images. FDIT establishes state-of-the-art performance, reducing the average FID score by 5.6% compared to the previous best method.
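
The low-/high-frequency decomposition can be illustrated with a minimal sketch. It assumes a grayscale image and a Gaussian low-pass mask applied in the Fourier domain; the abstract does not specify the filter, so the function name `split_frequencies` and the cutoff `sigma` are hypothetical choices made for illustration, not the FDIT implementation.

```python
import numpy as np

def split_frequencies(image, sigma=0.1):
    """Split a 2-D grayscale image into low- and high-frequency parts.

    `sigma` is an assumed cutoff in cycles per pixel (not from the source).
    """
    h, w = image.shape
    # Frequency coordinates of each FFT bin, in cycles per pixel.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    # Gaussian low-pass mask: 1 at zero frequency, decaying with radius.
    mask = np.exp(-(fx**2 + fy**2) / (2.0 * sigma**2))
    spectrum = np.fft.fft2(image)
    # The mask is real and symmetric, so the inverse transform is real
    # up to rounding error; np.real discards that residue.
    low = np.real(np.fft.ifft2(spectrum * mask))
    # The high-frequency part is whatever the low-pass filter removed.
    high = image - low
    return low, high

rng = np.random.default_rng(0)
image = rng.random((64, 64))  # stand-in for a real grayscale image
low, high = split_frequencies(image)
```

By construction the two components sum back to the input, and the high-frequency part carries edges and fine structure while the low-frequency part carries smooth color and illumination. A smaller `sigma` moves more of the image content into the high-frequency component.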