Human pose transfer aims to synthesize a new view of a person in a given pose. Recent works achieve this via self-reconstruction, which disentangles pose and texture features from the person image, then combines the two features to reconstruct the person. Such feature-level disentanglement is a difficult and ill-defined problem that can lead to loss of detail and unwanted artifacts. In this paper, we propose a self-driven human pose transfer method that randomly permutes the textures, then reconstructs the image with a dual-branch attention mechanism to achieve image-level disentanglement and detail-preserving texture transfer. We find that, compared with feature-level disentanglement, image-level disentanglement is more controllable and reliable. Furthermore, we introduce a dual-kernel encoder that provides receptive fields of different sizes to reduce the noise introduced by the permutation, recovering clothing details while aligning pose and texture. Extensive experiments on DeepFashion and Market-1501 show that our model improves the quality of the generated images in terms of FID, LPIPS, and SSIM over other self-driven methods, and even outperforms some fully supervised methods. A user study also shows that, among self-driven approaches, images generated by our method are preferred over prior work in 72% of cases.
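To make the two ideas named in the abstract concrete, the sketch below illustrates (a) a random image-level texture permutation and (b) a dual-kernel encoder whose parallel branches have different receptive field sizes. This is a minimal, hypothetical PyTorch sketch: the paper does not publish this code, and all module names, kernel sizes, and channel counts here are assumptions for illustration only.

```python
# Illustrative sketch only: module names, kernel sizes, and channel
# counts are assumptions, not the paper's released implementation.
import torch
import torch.nn as nn

class DualKernelEncoder(nn.Module):
    """Toy dual-kernel encoder: two parallel conv branches with small and
    large kernels give different receptive fields, then fuse the features."""
    def __init__(self, in_ch=3, out_ch=64):
        super().__init__()
        # Small-kernel branch: narrow receptive field, keeps fine texture detail.
        self.small = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Large-kernel branch: wide receptive field, smooths permutation noise.
        self.large = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=7, padding=3),
            nn.ReLU(inplace=True),
        )
        # 1x1 conv fuses the two branches into a single feature map.
        self.fuse = nn.Conv2d(2 * out_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([self.small(x), self.large(x)], dim=1))

def permute_textures(patches):
    """Randomly permute texture patches along the first dimension,
    standing in for the paper's image-level texture permutation."""
    idx = torch.randperm(patches.size(0))
    return patches[idx]

# Usage: permute patch textures, then encode with both receptive fields.
x = torch.randn(8, 3, 64, 64)             # 8 texture patches
encoder = DualKernelEncoder()
features = encoder(permute_textures(x))   # -> shape (8, 64, 64, 64)
```

The design intuition, as stated in the abstract, is that the small-kernel branch preserves local clothing detail while the large-kernel branch, with its wider context, suppresses the noise that random permutation introduces.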