Virtual 3D try-on can provide an intuitive and realistic view for online shopping and has huge potential commercial value. However, existing 3D virtual try-on methods mainly rely on annotated 3D human shapes and garment templates, which hinders their application in practical scenarios. 2D virtual try-on approaches provide a faster alternative for manipulating clothed humans, but lack a rich and realistic 3D representation. In this paper, we propose a novel Monocular-to-3D Virtual Try-On Network (M3D-VTON) that builds on the merits of both 2D and 3D approaches. By integrating 2D information efficiently and learning a mapping that lifts the 2D representation to 3D, we make the first attempt to reconstruct a 3D try-on mesh taking only the target clothing and a person image as inputs. The proposed M3D-VTON includes three modules: 1) the Monocular Prediction Module (MPM), which estimates an initial full-body depth map and accomplishes 2D clothes-person alignment through a novel two-stage warping procedure; 2) the Depth Refinement Module (DRM), which refines the initial body depth to capture more detailed pleat and face characteristics; 3) the Texture Fusion Module (TFM), which fuses the warped clothing with the non-target body parts to refine the results. We also construct a high-quality synthesized Monocular-to-3D virtual try-on dataset, in which each person image is associated with a front and a back depth map. Extensive experiments demonstrate that the proposed M3D-VTON can manipulate and reconstruct the 3D human body wearing the given clothing with compelling details and is more efficient than other 3D approaches.
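The abstract describes a three-module pipeline (MPM, DRM, TFM). Below is a minimal, illustrative PyTorch sketch of how such a pipeline could be wired together; it is not the authors' implementation, and all module internals (single placeholder convolutions), channel counts, and the ordering of TFM before DRM are assumptions made only to show the data flow from a person image and a target clothing image to a fused try-on image and a refined double-sided depth estimate.

```python
# Illustrative sketch (not the authors' code): assumed interfaces showing how
# MPM, DRM, and TFM could be chained for monocular-to-3D virtual try-on.
import torch
import torch.nn as nn

class MPM(nn.Module):
    """Monocular Prediction Module: initial full-body depth + 2D clothes-person alignment (placeholder)."""
    def __init__(self):
        super().__init__()
        self.depth_head = nn.Conv2d(6, 2, kernel_size=3, padding=1)  # front/back depth maps
        self.warp_head = nn.Conv2d(6, 3, kernel_size=3, padding=1)   # stand-in for the two-stage warp

    def forward(self, person, clothing):
        x = torch.cat([person, clothing], dim=1)
        return self.depth_head(x), self.warp_head(x)

class TFM(nn.Module):
    """Texture Fusion Module: fuse warped clothing with the non-target body parts (placeholder)."""
    def __init__(self):
        super().__init__()
        self.fuse = nn.Conv2d(6, 3, kernel_size=3, padding=1)

    def forward(self, person, warped_clothing):
        return self.fuse(torch.cat([person, warped_clothing], dim=1))

class DRM(nn.Module):
    """Depth Refinement Module: refine the initial depth to recover pleat/face details (placeholder)."""
    def __init__(self):
        super().__init__()
        self.refine = nn.Conv2d(2 + 3, 2, kernel_size=3, padding=1)

    def forward(self, init_depth, try_on_image):
        # Residual refinement conditioned on the fused try-on image.
        return init_depth + self.refine(torch.cat([init_depth, try_on_image], dim=1))

# Toy forward pass on random tensors (one 256x192 RGB image pair).
person = torch.randn(1, 3, 256, 192)
clothing = torch.randn(1, 3, 256, 192)
mpm, tfm, drm = MPM(), TFM(), DRM()
init_depth, warped = mpm(person, clothing)
try_on = tfm(person, warped)
refined_depth = drm(init_depth, try_on)
print(try_on.shape, refined_depth.shape)  # torch.Size([1, 3, 256, 192]) torch.Size([1, 2, 256, 192])
```

In this sketch the refined front/back depth maps would then be converted into a textured 3D try-on mesh, mirroring the paper's stated goal of lifting the 2D representation to 3D.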