Deep generative models have made great progress in synthesizing images with arbitrary human poses and transferring the pose of one person to another. However, most existing approaches explicitly leverage the pose information extracted from the source images as a conditional input for the generative networks. Meanwhile, they usually focus on the visual fidelity of the synthesized images but neglect the inherent consistency, which further limits their performance on pose transfer. To alleviate these limitations and improve the quality of the synthesized images, we propose a pose transfer network with Disentangled Feature Consistency (DFC-Net) to facilitate human pose transfer. Given a pair of images containing the source and target person, DFC-Net extracts pose and static information from the source and target, respectively, then synthesizes an image of the target person in the desired pose from the source. Moreover, DFC-Net leverages disentangled feature consistency losses in the adversarial training to strengthen the transfer coherence and integrates a keypoint amplifier to enhance the pose feature extraction. Additionally, an unpaired support dataset, Mixamo-Sup, providing extra pose information is further utilized during training to improve the generality and robustness of DFC-Net. Extensive experimental results on Mixamo-Pose and EDN-10k demonstrate that DFC-Net achieves state-of-the-art performance on pose transfer.
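To make the pipeline concrete, the following is a minimal PyTorch sketch of the forward pass and the disentangled feature consistency losses the abstract describes. The encoder and generator layers, the gating form of the keypoint amplifier, and all module names here are illustrative assumptions, not the paper's exact architecture; the sketch only shows how pose features from the source and static (appearance) features from the target are fused, and how the synthesized image is re-encoded to enforce consistency with both.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch only: concrete layer shapes, the KeypointAmplifier
# gating, and the encoder/generator designs are assumptions, not DFC-Net's
# actual network.

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1), nn.ReLU())

class KeypointAmplifier(nn.Module):
    """Hypothetical amplifier: learns a per-location gate that boosts
    responses around keypoints in the pose feature map."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.Sigmoid())

    def forward(self, f):
        return f * (1.0 + self.gate(f))  # amplify, never suppress

class DFCNetSketch(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.pose_enc = nn.Sequential(conv_block(3, ch), conv_block(ch, ch))    # pose from source
        self.static_enc = nn.Sequential(conv_block(3, ch), conv_block(ch, ch))  # appearance from target
        self.amplifier = KeypointAmplifier(ch)
        self.generator = nn.Sequential(  # fuses both streams, upsamples back to image space
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4), nn.Conv2d(ch, 3, 3, padding=1), nn.Tanh())

    def forward(self, source_img, target_img):
        pose_feat = self.amplifier(self.pose_enc(source_img))
        static_feat = self.static_enc(target_img)
        return self.generator(torch.cat([pose_feat, static_feat], dim=1))

    def consistency_losses(self, source_img, target_img, fake_img):
        # Disentangled feature consistency: re-encode the synthesized image;
        # its pose features should match the source's, and its static
        # features should match the target's.
        pose_loss = F.l1_loss(self.pose_enc(fake_img), self.pose_enc(source_img).detach())
        static_loss = F.l1_loss(self.static_enc(fake_img), self.static_enc(target_img).detach())
        return pose_loss + static_loss

# Usage: one forward/loss step on random tensors (in training, this term
# would be added to the adversarial objective).
model = DFCNetSketch()
src, tgt = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
fake = model(src, tgt)
loss = model.consistency_losses(src, tgt, fake)
```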