Image stylization aims to apply a reference style to arbitrary input images. A common scenario is one-shot stylization, where only one example is available for each reference style. Recent approaches for one-shot stylization, such as JoJoGAN, fine-tune a pre-trained StyleGAN2 generator on a single style reference image. However, such methods cannot generate multiple stylizations without fine-tuning a new model for each style separately. In this work, we present a MultiStyleGAN method that is capable of producing multiple different stylizations at once by fine-tuning a single generator. The key component of our method is a learnable transformation module called the Style Transformation Network. It takes latent codes as input and learns linear mappings to different regions of the latent space, producing distinct codes for each style and thereby forming a multistyle space. Our model inherently mitigates overfitting since it is trained on multiple styles, hence improving the quality of stylizations. Our method can learn upwards of $120$ image stylizations at once, bringing $8\times$ to $60\times$ improvement in training time over recent competing methods. We support our claims through user studies and quantitative results that indicate meaningful improvements over existing methods.
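The Style Transformation Network described above can be sketched as a bank of per-style affine maps over a shared latent code. The following is a minimal illustrative sketch, not the paper's implementation: the class name, near-identity initialization, and dimensions are assumptions chosen to show the data flow (one latent code in, one style-specific code out per style); in practice these maps would be learned jointly with the fine-tuned generator.

```python
import numpy as np

class StyleTransformationNetwork:
    """Illustrative sketch of per-style learnable linear mappings.

    Each of the K styles gets its own affine map (A_k, b_k) that sends a
    shared latent code w to a style-specific code w_k = A_k @ w + b_k,
    so a single forward pass yields K distinct codes in the multistyle
    space. Parameters here are randomly initialized near the identity
    (an assumption for the sketch, so early outputs stay close to w);
    in training they would be optimized with the generator.
    """

    def __init__(self, num_styles: int, latent_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # One (latent_dim x latent_dim) matrix and one bias per style.
        self.A = np.stack([
            np.eye(latent_dim) + 0.01 * rng.standard_normal((latent_dim, latent_dim))
            for _ in range(num_styles)
        ])
        self.b = 0.01 * rng.standard_normal((num_styles, latent_dim))

    def __call__(self, w: np.ndarray) -> np.ndarray:
        # w: (latent_dim,) -> (num_styles, latent_dim): one code per style.
        return self.A @ w + self.b

# Usage: map one latent code to three style-specific codes at once.
stn = StyleTransformationNetwork(num_styles=3, latent_dim=8)
w = np.random.default_rng(1).standard_normal(8)
codes = stn(w)
print(codes.shape)  # (3, 8)
```

Because each style's mapping is independent, adding a style only adds one affine map rather than a whole new generator, which is what allows many stylizations to share a single fine-tuned model.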