Virtual try-on aims to generate a photo-realistic fitting result given an in-shop garment and a reference person image. Existing methods usually build multi-stage frameworks that handle clothes warping and body blending separately, or rely heavily on intermediate parser-based labels, which may be noisy or even inaccurate. To address these challenges, we propose a single-stage try-on framework built on a novel Deformable Attention Flow (DAFlow), which applies the deformable attention scheme to multi-flow estimation. With only pose keypoints as guidance, self- and cross-deformable attention flows are estimated for the reference person and the garment images, respectively. By sampling multiple flow fields, feature-level and pixel-level information from different semantic areas is simultaneously extracted and merged through the attention mechanism. This enables simultaneous clothes warping and body synthesis, yielding photo-realistic results in an end-to-end manner. Extensive experiments on two try-on datasets demonstrate that our proposed method achieves state-of-the-art performance both qualitatively and quantitatively. Furthermore, additional experiments on two other image editing tasks illustrate the versatility of our method for multi-view synthesis and image animation.
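To make the core idea concrete, the following is a minimal sketch of multi-flow estimation merged by attention, in the spirit of DAFlow as described above: a module predicts K flow fields plus K attention logits from a query feature map, warps the source features with each flow, and merges the K warped results by softmax-weighted summation. All names (`MultiFlowAttention`, `num_flows`, the head layers) are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiFlowAttention(nn.Module):
    """Sketch: estimate K flow fields from a query feature map, warp the
    source features with each flow, and merge the K warped results with
    per-pixel softmax attention weights (hypothetical module, not the
    authors' code)."""

    def __init__(self, channels: int, num_flows: int = 4):
        super().__init__()
        self.num_flows = num_flows
        # Predict K (dx, dy) offset maps and K attention logits per location.
        self.flow_head = nn.Conv2d(channels, num_flows * 2, 3, padding=1)
        self.attn_head = nn.Conv2d(channels, num_flows, 3, padding=1)

    def forward(self, src_feat: torch.Tensor, query_feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = src_feat.shape
        flows = self.flow_head(query_feat).view(b, self.num_flows, 2, h, w)
        attn = self.attn_head(query_feat).softmax(dim=1)        # (B, K, H, W)

        # Base sampling grid in [-1, 1] normalized coordinates.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
        base = torch.stack((xs, ys), dim=-1).to(src_feat)       # (H, W, 2)

        warped = []
        for k in range(self.num_flows):
            # For simplicity, offsets are assumed to be in normalized units.
            offset = flows[:, k].permute(0, 2, 3, 1)            # (B, H, W, 2)
            grid = base.unsqueeze(0) + offset
            warped.append(F.grid_sample(src_feat, grid, align_corners=True))
        warped = torch.stack(warped, dim=1)                     # (B, K, C, H, W)

        # Attention-weighted merge of the K warped samples.
        return (warped * attn.unsqueeze(2)).sum(dim=1)          # (B, C, H, W)

# Usage: warp garment features toward person features.
garment_feat = torch.randn(1, 8, 16, 16)
person_feat = torch.randn(1, 8, 16, 16)
out = MultiFlowAttention(channels=8)(garment_feat, person_feat)
print(out.shape)  # torch.Size([1, 8, 16, 16])
```

In this sketch, cross-deformable attention corresponds to warping garment features with flows predicted from person features; the self-attention variant would pass the same feature map as both `src_feat` and `query_feat`.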