Human pose transfer has received great attention due to its wide range of applications, yet it remains a challenging task that is far from solved. Recent works have achieved great success in transferring a person image from a source pose to a target pose. However, most of them fail to capture the semantic appearance well, which results in inconsistent and less realistic textures in the reconstructed images. To address this issue, we propose a new two-stage framework that handles pose translation and appearance translation. In the first stage, we predict the target semantic parsing maps, which eases the difficulty of pose transfer and further benefits the subsequent per-region translation of appearance style. In the second stage, conditioned on the predicted target semantic maps, we propose a new person image generation method that incorporates region-adaptive normalization, which uses per-region styles to guide the generation of the target appearance. Extensive experiments show that our proposed SPGNet generates results that are more semantically coherent, consistent, and photo-realistic, and that it performs favorably against state-of-the-art methods in both quantitative and qualitative evaluations. The source code and model are available at https://github.com/cszy98/SPGNet.git.
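The abstract does not give the equations for region-adaptive normalization. The sketch below assumes a formulation in the spirit of SPADE/SEAN-style modulation: generator features are normalized per channel, and each pixel is then scaled and shifted by style vectors looked up from its semantic class in the predicted target parsing map. The function `region_adaptive_norm` and the per-region parameters `gammas`/`betas` are hypothetical illustrations, not SPGNet's released code. In a real model, these parameters would be produced by encoding the source image's per-region styles.

```python
import numpy as np

def region_adaptive_norm(features, parsing, gammas, betas, eps=1e-5):
    """Hypothetical region-adaptive normalization (single image, no batch axis).

    features: (C, H, W) generator activations.
    parsing:  (H, W) integer label map, e.g. the target parsing from stage one.
    gammas, betas: (K, C) per-region scale/shift vectors, one row per class.
    """
    # Parameter-free normalization: zero mean, unit variance per channel.
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = np.sqrt(features.var(axis=(1, 2), keepdims=True) + eps)
    normalized = (features - mean) / std

    # Look up each pixel's region style: (K, C)[(H, W)] -> (H, W, C) -> (C, H, W).
    gamma_map = gammas[parsing].transpose(2, 0, 1)
    beta_map = betas[parsing].transpose(2, 0, 1)

    # Spatially varying modulation driven by the semantic regions.
    return normalized * gamma_map + beta_map

# Usage: a toy 8x8 map with two regions (left half = class 0, right half = class 1).
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8, 8))
parsing = np.zeros((8, 8), dtype=int)
parsing[:, 4:] = 1
gammas = np.array([[1.0] * 4, [2.0] * 4])  # region 1 is scaled by 2
betas = np.array([[0.0] * 4, [5.0] * 4])   # region 1 is shifted by 5
out = region_adaptive_norm(feats, parsing, gammas, betas)
```

Because the style parameters are indexed by the parsing map, pixels in the same semantic region share one appearance style, while different regions (for example, clothing and hair) can receive different styles. This per-region behavior is what the second stage relies on.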