Facial image manipulation has made great progress in recent years. However, previous methods either operate on a predefined set of face attributes or leave users little freedom to interactively manipulate images. To overcome these drawbacks, we propose a novel framework, termed MaskGAN, that enables diverse and interactive face manipulation. Our key insight is that semantic masks serve as a suitable intermediate representation for flexible face manipulation while preserving fidelity. MaskGAN has two main components: 1) a Dense Mapping Network (DMN) and 2) Editing Behavior Simulated Training (EBST). Specifically, DMN learns a style mapping between a free-form, user-modified mask and a target image, enabling diverse generation results. EBST models user editing behavior on the source mask, making the overall framework more robust to a variety of manipulated inputs; in particular, it introduces dual-editing consistency as an auxiliary supervision signal. To facilitate extensive studies, we construct CelebAMask-HQ, a large-scale, high-resolution face dataset with fine-grained mask annotations. MaskGAN is comprehensively evaluated on two challenging tasks, attribute transfer and style copy, and demonstrates superior performance over other state-of-the-art methods. The code, models, and dataset are available at https://github.com/switchablenorms/CelebAMask-HQ.
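The abstract does not spell out DMN's architecture, but the core idea of mapping the style of a target image onto features aligned with a user-edited semantic mask can be illustrated with standard building blocks. The sketch below is a minimal, hypothetical PyTorch illustration, not the paper's DMN: it uses adaptive instance normalization (a common style-mapping technique) to inject a global style code from a target image into mask-conditioned features. The ToyMaskToImage module, AdaIN layer, and all layer sizes are assumptions for illustration; only the 19-class mask layout follows CelebAMask-HQ's annotation scheme.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaIN(nn.Module):
    """Adaptive instance normalization: re-styles content features with
    per-channel scale/shift predicted from a style code."""
    def __init__(self, channels, style_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.affine = nn.Linear(style_dim, channels * 2)  # predicts gamma, beta

    def forward(self, content, style):
        gamma, beta = self.affine(style).chunk(2, dim=1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return (1 + gamma) * self.norm(content) + beta

class ToyMaskToImage(nn.Module):
    """Hypothetical mask-to-image sketch (not the paper's DMN): encode a
    one-hot semantic mask, inject a style code extracted from a target
    image via AdaIN, and decode to RGB."""
    def __init__(self, num_classes=19, style_dim=128):
        super().__init__()
        self.mask_enc = nn.Sequential(
            nn.Conv2d(num_classes, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        )
        self.style_enc = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, style_dim),
        )
        self.adain = AdaIN(128, style_dim)
        self.to_rgb = nn.Conv2d(128, 3, 3, padding=1)

    def forward(self, mask, style_image):
        content = self.mask_enc(mask)        # mask-aligned spatial features
        style = self.style_enc(style_image)  # global style code from target
        return torch.tanh(self.to_rgb(self.adain(content, style)))

# Usage: a random 19-class mask (CelebAMask-HQ uses 19 semantic classes)
# and a random target image stand in for real data.
mask = F.one_hot(torch.randint(0, 19, (1, 64, 64)), 19).permute(0, 3, 1, 2).float()
style_img = torch.randn(1, 3, 64, 64)
out = ToyMaskToImage()(mask, style_img)
print(out.shape)  # torch.Size([1, 3, 64, 64])
```

Because the style enters only through per-channel statistics while spatial structure comes entirely from the mask, editing the mask changes the layout of the output without disturbing the copied style, which is the behavior the abstract describes for attribute transfer and style copy.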