Controllable person image synthesis enables a wide range of applications through explicit control over body pose and appearance. In this paper, we propose a cross-attention-based style distribution module that computes attention between the source semantic styles and the target pose for pose transfer. The module deliberately selects the style represented by each semantic region and distributes these styles according to the target pose. The attention matrix in cross attention expresses the dynamic similarities between the target pose and the source styles for all semantic regions. It can therefore be used to route color and texture from the source image, and it is further constrained by the target parsing map to give the routing a clearer objective. In addition, to encode the source appearance accurately, we add self attention among the different semantic styles. The effectiveness of our model is validated quantitatively and qualitatively on pose transfer and virtual try-on tasks.
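The routing described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `distribute_styles` and `parsing_loss`, the tensor shapes, the absence of learned query/key/value projections, and the use of a cross-entropy term as the parsing-map constraint are all assumptions made for illustration. Target-pose features act as queries, and one style vector per semantic region serves as both key and value.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def distribute_styles(pose_feat, styles):
    """Route per-semantic source styles to target-pose positions.

    pose_feat: (N, d) target-pose features, one row per spatial position (queries).
    styles:    (K, d) one style vector per semantic region (keys and values).
    Returns the routed features (N, d) and the attention matrix (N, K).
    Assumption: no learned projections; a real module would likely have them.
    """
    d = styles.shape[-1]
    scores = pose_feat @ styles.T / np.sqrt(d)  # pose-to-style similarity
    attn = softmax(scores, axis=-1)             # each position attends over K styles
    routed = attn @ styles                      # weighted mix of style vectors
    return routed, attn

def parsing_loss(attn, target_parsing):
    """Assumed constraint: cross-entropy between the attention rows and the
    target parsing labels (one integer semantic index per position)."""
    n = attn.shape[0]
    picked = attn[np.arange(n), target_parsing]
    return float(-np.log(picked + 1e-8).mean())

# Hypothetical sizes: 16 spatial positions, 4 semantic regions, 8 channels.
rng = np.random.default_rng(0)
pose_feat = rng.normal(size=(16, 8))
styles = rng.normal(size=(4, 8))
target_parsing = rng.integers(0, 4, size=16)

routed, attn = distribute_styles(pose_feat, styles)
loss = parsing_loss(attn, target_parsing)
```

Each row of `attn` is a distribution over semantic regions, so minimizing the assumed loss pushes every target position to draw its style from the region that the target parsing map assigns to it.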