Controllable person image generation aims to produce realistic human images with desired attributes (e.g., a given pose, clothing texture, or hairstyle). However, the large spatial misalignment between the source and target images makes standard image-to-image translation architectures unsuitable for this task. Most state-of-the-art methods skip the alignment step during generation, which causes numerous artifacts, especially for person images with complex textures. To solve this problem, we introduce a novel Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned flow field to warp the modulation parameters, allowing us to efficiently align the person's spatially-adaptive styles with pose features. Moreover, we propose a novel self-training part replacement strategy to refine the pretrained model for the texture-transfer task, significantly improving both the quality of the generated clothing and the preservation of regions that should remain unchanged. Our experimental results on the widely used DeepFashion dataset demonstrate significant improvements over state-of-the-art methods on both the pose-transfer and texture-transfer tasks.
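To make the SAWN idea concrete, below is a minimal PyTorch sketch of one possible SAWN-style layer, assuming a SPADE-like design: spatially-adaptive gamma/beta modulation maps are predicted from source-image features and then warped by a learned flow field before denormalizing the pose-stream features. The module name, argument names (`feat_channels`, `style_feat`, `flow`), and the use of instance normalization are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SAWNLayer(nn.Module):
    """Hypothetical sketch of a Spatially-Adaptive Warped Normalization layer."""

    def __init__(self, feat_channels, style_channels, hidden=128):
        super().__init__()
        # Parameter-free normalization of the pose-stream features (assumption:
        # instance norm, as in many SPADE-style generators).
        self.norm = nn.InstanceNorm2d(feat_channels, affine=False)
        # Predict spatially-adaptive modulation maps from source-image features.
        self.shared = nn.Sequential(
            nn.Conv2d(style_channels, hidden, 3, padding=1), nn.ReLU(inplace=True))
        self.to_gamma = nn.Conv2d(hidden, feat_channels, 3, padding=1)
        self.to_beta = nn.Conv2d(hidden, feat_channels, 3, padding=1)

    def forward(self, x, style_feat, flow):
        # x:          pose-aligned features to modulate, shape (B, C, H, W)
        # style_feat: features extracted from the source person image, (B, Cs, H, W)
        # flow:       learned per-pixel offsets in pixels, (B, 2, H, W), channel 0 = x
        B, _, H, W = x.shape
        h = self.shared(style_feat)
        gamma, beta = self.to_gamma(h), self.to_beta(h)
        # Build a sampling grid: identity grid plus learned offsets, converted to
        # the normalized [-1, 1] coordinates that F.grid_sample expects.
        ys, xs = torch.meshgrid(
            torch.arange(H, device=x.device, dtype=x.dtype),
            torch.arange(W, device=x.device, dtype=x.dtype),
            indexing="ij")
        base = torch.stack((xs, ys), dim=-1).unsqueeze(0)      # (1, H, W, 2)
        grid = base + flow.permute(0, 2, 3, 1)                  # (B, H, W, 2)
        gx = 2.0 * grid[..., 0] / max(W - 1, 1) - 1.0
        gy = 2.0 * grid[..., 1] / max(H - 1, 1) - 1.0
        grid = torch.stack((gx, gy), dim=-1)
        # Warp the modulation maps so source styles align with the target pose.
        gamma = F.grid_sample(gamma, grid, align_corners=True)
        beta = F.grid_sample(beta, grid, align_corners=True)
        # Spatially-adaptive denormalization with the warped parameters.
        return self.norm(x) * (1.0 + gamma) + beta
```

In this reading, the key difference from plain SPADE is that the modulation parameters are resampled through the flow field before being applied, so style information from the source image lands at the spatial locations dictated by the target pose.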