Inspired by the success of volumetric 3D pose estimation, some recent human mesh estimators propose to estimate 3D skeletons as intermediate representations, from which the dense 3D meshes are regressed by exploiting the mesh topology. However, body shape information is lost when extracting skeletons, leading to mediocre performance. Advanced motion capture systems solve the problem by placing dense physical markers on the body surface, which allows realistic meshes to be extracted from their non-rigid motions. However, they cannot be applied to wild images without markers. In this work, we present an intermediate representation, named virtual markers, which learns 64 landmark keypoints on the body surface from large-scale mocap data in a generative style, mimicking the effects of physical markers. The virtual markers can be accurately detected from wild images and can reconstruct intact meshes with realistic shapes by simple interpolation. Our approach outperforms the state-of-the-art methods on three datasets. In particular, it surpasses the existing methods by a notable margin on the SURREAL dataset, which has diverse body shapes. Code is available at https://github.com/ShirleyMaxx/VirtualMarker.
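To make the "simple interpolation" step concrete, below is a minimal sketch of how mesh vertices could be recovered as weighted combinations of the detected virtual markers. The variable names, the vertex count `V`, and the randomly generated interpolation matrix `W` are illustrative assumptions, not the repository's actual API; in practice the weights would be learned from mocap data so that the blended markers approximate the ground-truth mesh.

```python
import numpy as np

# Illustrative sketch (not the repository's API): reconstruct a dense mesh
# from K detected 3D virtual markers via a learned interpolation matrix.
K = 64        # number of virtual markers, as stated in the abstract
V = 6890      # SMPL-style vertex count, assumed here for illustration

rng = np.random.default_rng(0)

# Stand-in for the 3D virtual markers detected from an image, shape (K, 3).
markers = rng.normal(size=(K, 3))

# Stand-in for a learned interpolation matrix W of shape (V, K); here we
# simply normalize random non-negative weights so that each mesh vertex
# is a weighted average of the marker positions.
W = rng.random(size=(V, K))
W /= W.sum(axis=1, keepdims=True)

# "Simple interpolation": every vertex is a linear blend of the markers.
mesh_vertices = W @ markers   # shape (V, 3)
print(mesh_vertices.shape)    # (6890, 3)
```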