Person image generation aims to perform non-rigid deformation on source images, which generally requires unaligned data pairs for training. Recently, self-supervised methods have shown great promise in this task by merging disentangled representations for self-reconstruction. However, such methods fail to exploit the spatial correlation between the disentangled features. In this paper, we propose a Self-supervised Correlation Mining Network (SCM-Net) to rearrange the source images in the feature space. It integrates two collaborative modules: a Decomposed Style Encoder (DSE) and a Correlation Mining Module (CMM). Specifically, the DSE first creates unaligned pairs at the feature level. Then, the CMM establishes the spatial correlation field for feature rearrangement. Finally, a translation module transforms the rearranged features into realistic results. In addition, to improve the fidelity of cross-scale pose transformation, we propose a graph-based Body Structure Retaining Loss (BSR Loss) that preserves reasonable body structures in half-body to full-body generation. Extensive experiments on the DeepFashion dataset demonstrate the superiority of our method over other supervised and unsupervised approaches. Furthermore, satisfactory results on face generation show the versatility of our method in other deformation tasks.
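To make the idea of a spatial correlation field concrete, the following is a minimal NumPy sketch of correlation-based feature rearrangement. It is an illustration under stated assumptions, not the SCM-Net implementation: the function name `rearrange_features`, the `temperature` parameter, and the use of cosine similarity followed by a softmax are all hypothetical choices, and the actual CMM may be designed differently.

```python
import numpy as np


def rearrange_features(pose_feat, style_feat, temperature=0.01):
    """Warp style features toward the layout given by pose features.

    Hypothetical sketch: both inputs have shape (C, H, W).
    Returns the rearranged features (C, H, W) and the correlation
    weights of shape (H*W, H*W).
    """
    c, h, w = pose_feat.shape
    q = pose_feat.reshape(c, -1)   # target-side descriptors, (C, N)
    k = style_feat.reshape(c, -1)  # source-side descriptors, (C, N)

    # L2-normalize each location so the dot product is a cosine similarity.
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    k = k / (np.linalg.norm(k, axis=0, keepdims=True) + 1e-8)

    # Dense correlation: one row per target location, one column per source location.
    corr = q.T @ k

    # Softmax over source locations (shifted by the row max for numerical stability).
    logits = corr / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Each target location becomes a weighted average of source features.
    warped = style_feat.reshape(c, -1) @ weights.T
    return warped.reshape(c, h, w), weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pose = rng.standard_normal((8, 4, 4))
    style = rng.standard_normal((8, 4, 4))
    warped, weights = rearrange_features(pose, style)
    print(warped.shape, weights.shape)
```

A low `temperature` makes each target location copy from its best-matching source location, while a higher value blends several source locations. In this sketch both feature maps are assumed to live in a shared embedding space, so that their similarity is meaningful.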