Existing GAN inversion methods are stuck in a paradox: the inverted codes can either achieve high-fidelity reconstruction or retain editing capability, but not both. Having only one of the two is clearly not enough for real image editing. In this paper, we resolve this paradox by introducing consecutive images (e.g., video frames or the same person in different poses) into the inversion process. The rationale behind our solution is that the continuity of consecutive images leads to inherent editable directions. This inherent property is used for two distinct purposes: 1) regularizing the joint inversion process, such that each inverted code is semantically accessible from one of the others and anchored in an editable domain; 2) enforcing inter-image coherence, such that the fidelity of each inverted code can be maximized with the complement of the other images. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art methods in terms of reconstruction fidelity and editability on both real-image and synthetic datasets. Furthermore, our method provides the first support for video-based GAN inversion, and an interesting application of unsupervised semantic transfer from consecutive images. Source code can be found at: \url{https://github.com/cnnlstm/InvertingGANs_with_ConsecutiveImgs}.