StyleGAN is arguably one of the most intriguing and well-studied generative models, demonstrating impressive performance in image generation, inversion, and manipulation. In this work, we explore the recent StyleGAN3 architecture, compare it to its predecessor, and investigate its unique advantages as well as its drawbacks. In particular, we demonstrate that while StyleGAN3 can be trained on unaligned data, one can still use aligned data for training without hindering the ability to generate unaligned imagery. Next, our analysis of the disentanglement of the different latent spaces of StyleGAN3 indicates that the commonly used W/W+ spaces are more entangled than their StyleGAN2 counterparts, underscoring the benefits of using the StyleSpace for fine-grained editing. Considering image inversion, we observe that existing encoder-based techniques struggle when trained on unaligned data. We therefore propose an encoding scheme that is trained solely on aligned data yet can still invert unaligned images. Finally, we introduce a novel video inversion and editing workflow that leverages the capabilities of a fine-tuned StyleGAN3 generator to reduce texture sticking and expand the field of view of the edited video.