Human pose transfer has typically been modeled as a 2D image-to-image translation problem. This formulation ignores the human body shape prior in 3D space and inevitably produces implausible artifacts, especially under occlusion. To address this issue, we propose a lifting-and-projection framework that performs pose transfer in the 3D mesh space. The core of our framework is a foreground generation module, which consists of two novel networks: a lifting-and-projection network (LPNet) and an appearance detail compensating network (ADCNet). To leverage the human body shape prior, LPNet exploits the topological information of the body mesh to learn an expressive visual representation of the target person in the 3D mesh space. To preserve texture details, ADCNet further enhances the features produced by LPNet with the source foreground image. This design of the foreground generation module enables the model to better handle difficult cases, such as those with occlusions. Experiments on the iPER and Fashion datasets demonstrate that the proposed lifting-and-projection framework is effective and outperforms existing image-to-image-based and mesh-based methods on the human pose transfer task in both self-transfer and cross-transfer settings.
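As a rough illustration only, and not a description of the paper's actual networks, the sketch below shows one way a lifting-and-projection foreground generator could be organized in PyTorch: an LPNet-like module lifts per-vertex appearance features onto the posed 3D mesh, transforms them, and projects them back onto the target 2D view, while an ADCNet-like module fuses the projected features with features from the source foreground image. All module internals, tensor shapes, the simplified nearest-vertex splatting, and the weak-perspective projection are assumptions made for clarity; the paper's framework operates on full body meshes with dedicated mesh and image networks.

```python
# Minimal, hypothetical sketch of the lifting-and-projection idea.
# NOT the authors' implementation: shapes, projection, and splatting are simplified.
import torch
import torch.nn as nn


class LPNet(nn.Module):
    """Lift appearance features into 3D mesh space, transform them per vertex,
    then project them back onto the target 2D view (nearest-vertex splat)."""

    def __init__(self, feat_dim=32):
        super().__init__()
        # Stand-in for a mesh (graph) network over vertex features.
        self.vertex_mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, 64), nn.ReLU(), nn.Linear(64, feat_dim)
        )

    def forward(self, vert_feats, tgt_verts, img_size=64):
        # vert_feats: (B, V, C) appearance features sampled from the source view
        # tgt_verts:  (B, V, 3) mesh vertices posed to the target pose, in [-1, 1]
        B, V, C = vert_feats.shape
        x = self.vertex_mlp(torch.cat([vert_feats, tgt_verts], dim=-1))  # (B, V, C)

        # Weak-perspective projection of vertices to pixel coordinates.
        uv = tgt_verts[..., :2]
        px = ((uv * 0.5 + 0.5) * (img_size - 1)).long().clamp(0, img_size - 1)

        # Splat vertex features onto a 2D feature map (simplified rasterizer).
        feat_map = torch.zeros(B, C, img_size, img_size)
        for b in range(B):
            feat_map[b, :, px[b, :, 1], px[b, :, 0]] = x[b].t()
        return feat_map


class ADCNet(nn.Module):
    """Compensate appearance detail by fusing the projected mesh features with
    features extracted from the source foreground image."""

    def __init__(self, feat_dim=32):
        super().__init__()
        self.src_encoder = nn.Conv2d(3, feat_dim, 3, padding=1)
        self.fuse = nn.Conv2d(feat_dim * 2, 3, 3, padding=1)

    def forward(self, lp_feats, src_fg):
        src_feats = self.src_encoder(src_fg)               # (B, C, H, W)
        return self.fuse(torch.cat([lp_feats, src_feats], dim=1))


# Toy usage: one sample, 100 mesh vertices, a 64x64 output image.
lpnet, adcnet = LPNet(), ADCNet()
vert_feats = torch.randn(1, 100, 32)
tgt_verts = torch.rand(1, 100, 3) * 2 - 1
src_fg = torch.randn(1, 3, 64, 64)
out = adcnet(lpnet(vert_feats, tgt_verts), src_fg)         # (1, 3, 64, 64)
print(out.shape)
```

The point of the sketch is only the data flow: appearance is attached to mesh vertices, reasoned about in 3D under the target pose, projected back to the image plane, and then refined with source-image detail, which is what allows the foreground generator to stay plausible under occlusion.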