In this paper, we target image-based person-to-person virtual try-on in the presence of diverse poses and large viewpoint variations. Existing methods are restricted in this setting because they estimate garment warping flows mainly from 2D poses and appearance, omitting the geometric prior of the 3D human body shape. Moreover, current garment warping methods are confined to localized regions, which makes them ineffective at capturing long-range dependencies and yields inferior flows with artifacts. To tackle these issues, we present 3D-aware global correspondences: reliable flows that jointly encode global semantic correlations, local deformations, and the geometric priors of 3D human bodies. Specifically, given an image pair depicting the source and target persons, (a) we first obtain their pose-aware, high-level representations via two encoders and introduce a coarse-to-fine decoder with multiple refinement modules to predict the pixel-wise global correspondence; (b) 3D parametric human models inferred from the images are incorporated as priors to regularize the correspondence refinement so that our flows are 3D-aware and better handle variations in pose and viewpoint; (c) finally, an adversarial generator takes the garment warped by the 3D-aware flow and the image of the target person as inputs to synthesize a photo-realistic try-on result. Extensive experiments on public benchmarks and our HardPose test set demonstrate the superiority of our method over state-of-the-art try-on approaches.
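To make the three-stage pipeline concrete, below is a minimal PyTorch-style sketch of stages (a)-(c). All module names, layer sizes, and the L1 prior term are hypothetical stand-ins chosen for illustration; they are not the paper's actual architecture or released code, which is considerably more elaborate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(image, flow):
    """Backward-warp `image` with a dense flow given as normalized coords in [-1, 1]."""
    grid = flow.permute(0, 2, 3, 1)                      # (B, 2, H, W) -> (B, H, W, 2)
    return F.grid_sample(image, grid, align_corners=True)

class Encoder(nn.Module):
    """Stand-in for the pose-aware encoders of stage (a)."""
    def __init__(self, in_ch=3, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class FlowDecoder(nn.Module):
    """Coarse-to-fine decoder: a coarse global correspondence refined residually."""
    def __init__(self, feat_dim=64, num_stages=3):
        super().__init__()
        self.coarse = nn.Conv2d(2 * feat_dim, 2, 3, padding=1)
        self.refiners = nn.ModuleList(
            [nn.Conv2d(2 * feat_dim + 2, 2, 3, padding=1) for _ in range(num_stages)])
    def forward(self, f_src, f_tgt):
        x = torch.cat([f_src, f_tgt], dim=1)
        flow = torch.tanh(self.coarse(x))                # coarse pixel-wise flow
        for refine in self.refiners:                     # multiple refinement modules
            flow = flow + refine(torch.cat([x, flow], dim=1))
        return flow

def body_prior_loss(pred_flow, body_flow):
    """Stage (b) placeholder: pull the predicted flow toward a flow rendered from
    fitted 3D parametric bodies (e.g., SMPL meshes), here as a plain L1 term."""
    return (pred_flow - body_flow).abs().mean()

# Stage (c) usage on dummy tensors:
src_img = torch.randn(1, 3, 256, 192)                    # source person image
tgt_img = torch.randn(1, 3, 256, 192)                    # target person image
enc_s, enc_t, dec = Encoder(), Encoder(), FlowDecoder()
flow = dec(enc_s(src_img), enc_t(tgt_img))
body_flow = torch.zeros_like(flow)                       # would come from rendered 3D bodies
loss_3d = body_prior_loss(flow, body_flow)               # 3D-aware regularization
flow_full = F.interpolate(flow, size=src_img.shape[2:],
                          mode='bilinear', align_corners=True)
warped_garment = warp(src_img, flow_full)                # garment warped by the flow
generator = nn.Conv2d(6, 3, 3, padding=1)                # stand-in for the GAN generator
try_on = generator(torch.cat([warped_garment, tgt_img], dim=1))
print(try_on.shape)                                      # torch.Size([1, 3, 256, 192])
```

Note the design split the abstract implies: the flow network only establishes correspondence, while photo-realism is delegated entirely to the adversarial generator that fuses the warped garment with the target person image.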