VToonify: Controllable High-Resolution Portrait Video Style Transfer

Source: arXiv; ACM Transactions on Graphics (SIGGRAPH Asia 2022).
Code: https://github.com/williamyang1991/VToonify
Project page: https://www.mmlab-ntu.com/project/vtoonify/

Generating high-quality artistic portrait videos is an important and desirable task in computer graphics and vision. Although a series of successful portrait image toonification models built upon the powerful StyleGAN have been proposed, these image-oriented methods have obvious limitations when applied to videos, such as the fixed frame size, the requirement of face alignment, missing non-facial details and temporal inconsistency. In this work, we investigate the challenging controllable high-resolution portrait video style transfer by introducing a novel VToonify framework. Specifically, VToonify leverages the mid- and high-resolution layers of StyleGAN to render high-quality artistic portraits based on the multi-scale content features extracted by an encoder to better preserve the frame details. The resulting fully convolutional architecture accepts non-aligned faces in videos of variable size as input, contributing to complete face regions with natural motions in the output. Our framework is compatible with existing StyleGAN-based image toonification models to extend them to video toonification, and inherits appealing features of these models for flexible style control on color and intensity. This work presents two instantiations of VToonify built upon Toonify and DualStyleGAN for collection-based and exemplar-based portrait video style transfer, respectively. Extensive experimental results demonstrate the effectiveness of our proposed VToonify framework over existing methods in generating high-quality and temporally-coherent artistic portrait videos with flexible style controls.
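
The abstract's claim that a fully convolutional architecture accepts videos of variable size can be illustrated with simple shape arithmetic. The sketch below is not the VToonify implementation: the function names, the three stride-2 downsampling stages, and the 3x3 kernel are assumptions chosen only for illustration. It traces feature-map sizes through a hypothetical convolutional encoder and a mirrored x2-upsampling decoder, using the standard convolution output-size formula.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial size after one convolution (standard output-size formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_decoder_shapes(height, width, num_down=3):
    """Trace (height, width) through a hypothetical fully convolutional
    encoder-decoder. num_down is an assumed value, not taken from the paper."""
    shapes = [(height, width)]
    h, w = height, width
    # Encoder: each stage is a stride-2 convolution that halves the resolution.
    for _ in range(num_down):
        h, w = conv_out(h, stride=2), conv_out(w, stride=2)
        shapes.append((h, w))
    # Decoder: each stage doubles the resolution (x2 upsampling).
    for _ in range(num_down):
        h, w = h * 2, w * 2
        shapes.append((h, w))
    return shapes


if __name__ == "__main__":
    # A square aligned-face crop and a 720p video frame both pass through,
    # because no layer hard-codes the spatial size.
    for size in [(256, 256), (720, 1280)]:
        print(size, "->", encoder_decoder_shapes(*size))
```

Because no layer in this sketch fixes the spatial size, any frame whose height and width are multiples of 2**num_down comes out at its original resolution; other sizes would need padding or cropping first. An image generator that starts from a fixed-size learned constant, by contrast, always produces the same output size, which corresponds to the fixed-frame-size limitation the abstract attributes to image-oriented methods.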
