Recent GAN inversion models focus on preserving image-specific details through various methods, e.g., generator tuning or feature mixing. While these methods preserve details better than a naive low-rate latent inversion, they still fail to maintain high-frequency features precisely. In this paper, we point out that existing GAN inversion models have inherent limitations in both their structure and their training, which preclude the delicate reconstruction of high-frequency features. In particular, we prove that the loss term widely used in GAN inversion, i.e., L2, is biased toward reconstructing low-frequency features. To overcome this problem, we propose a novel GAN inversion model, coined WaGI, which handles high-frequency features explicitly by using a novel wavelet-based loss term and a newly proposed wavelet fusion scheme. To the best of our knowledge, WaGI is the first attempt to interpret GAN inversion in the frequency domain. We demonstrate that WaGI shows outstanding results on both inversion and editing compared to existing state-of-the-art GAN inversion models. Notably, WaGI robustly preserves the high-frequency features of images even in the editing scenario. We will release our code with the pre-trained model after the review.
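To make the idea of a wavelet-based loss concrete, the following is a minimal sketch, not WaGI's actual loss, which the abstract does not specify. It assumes a one-level orthonormal Haar transform on a single-channel image, and the names `haar_dwt2`, `l2_loss`, and `high_freq_wavelet_loss` are hypothetical helpers introduced only for illustration. The toy example builds a target from a smooth ramp plus a faint pixel-level checkerboard, then compares it with a reconstruction that lost the checkerboard.

```python
import numpy as np


def haar_dwt2(img):
    """One-level orthonormal 2D Haar transform of an (H, W) array.

    H and W are assumed to be even. Returns the approximation band and
    three detail bands, each of shape (H/2, W/2).
    """
    a = img[0::2, 0::2]  # top-left pixel of every 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # block average (low frequency)
    lh = (a + b - c - d) / 2.0  # difference between the two rows
    hl = (a - b + c - d) / 2.0  # difference between the two columns
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def l2_loss(x, y):
    """Plain pixel-space mean squared error."""
    return float(np.mean((x - y) ** 2))


def high_freq_wavelet_loss(x, y):
    """Mean absolute error averaged over the three detail subbands only.

    Illustrative stand-in for a wavelet-based loss, not the paper's term.
    """
    _, *hx = haar_dwt2(x)
    _, *hy = haar_dwt2(y)
    return float(np.mean([np.mean(np.abs(p - q)) for p, q in zip(hx, hy)]))


# Toy data (assumed for illustration): smooth ramp plus fine texture.
h = w = 8
rows, cols = np.indices((h, w))
smooth = (rows + cols) / (h + w - 2)       # low-frequency content in [0, 1]
texture = 0.05 * (-1.0) ** (rows + cols)   # pixel-level checkerboard
target = smooth + texture
blurry = smooth                            # reconstruction missing the texture

print("pixel L2:", l2_loss(target, blurry))
print("detail-band loss:", high_freq_wavelet_loss(target, blurry))
```

In this toy case the pixel-space L2 is about 0.0025, since the missing texture has small amplitude, while the whole error falls into the diagonal detail band, where a loss restricted to the detail subbands penalizes it directly. The two numbers are in different units, so the comparison is qualitative only.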