Inspired by the success of volumetric 3D pose estimation, some recent human mesh estimators first estimate 3D skeletons as intermediate representations, from which dense 3D meshes are regressed by exploiting the mesh topology. However, body shape information is lost when extracting skeletons, which leads to mediocre performance. Advanced motion capture systems solve this problem by placing dense physical markers on the body surface, which makes it possible to extract realistic meshes from the markers' non-rigid motions. However, such systems cannot be applied to in-the-wild images, which have no markers. In this work, we present an intermediate representation, named virtual markers, which learns 64 landmark keypoints on the body surface from large-scale mocap data in a generative style, mimicking the effects of physical markers. The virtual markers can be accurately detected from in-the-wild images and can reconstruct intact meshes with realistic shapes by simple interpolation. Our approach outperforms state-of-the-art methods on three datasets. In particular, it surpasses existing methods by a notable margin on the SURREAL dataset, which has diverse body shapes. Code is available at https://github.com/ShirleyMaxx/VirtualMarker.
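The "simple interpolation" step can be pictured as a linear map from the 64 marker positions to the mesh vertices. The following is a minimal sketch under stated assumptions, not the released implementation: the function name `interpolate_mesh`, the weight matrix `weights`, and the random data are all hypothetical, and the vertex count 6890 (the size of the SMPL body mesh) is used only as an illustrative mesh resolution. In practice the weights would be learned from mocap data; here they are random convex weights so that the example is self-contained.

```python
import numpy as np


def interpolate_mesh(markers, weights):
    """Reconstruct mesh vertices as weighted combinations of markers.

    markers: (K, 3) array of 3D marker positions.
    weights: (V, K) array; row v holds the interpolation weights that
             express vertex v in terms of the K markers (hypothetical,
             assumed to be learned offline).
    Returns a (V, 3) array of vertex positions.
    """
    markers = np.asarray(markers, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if markers.ndim != 2 or markers.shape[1] != 3:
        raise ValueError("markers must have shape (K, 3)")
    if weights.ndim != 2 or weights.shape[1] != markers.shape[0]:
        raise ValueError("weights must have shape (V, K)")
    return weights @ markers


# Illustrative sizes: 64 markers (from the abstract) and an assumed
# mesh resolution of 6890 vertices.
K, V = 64, 6890
rng = np.random.default_rng(0)

# Stand-in for marker positions detected from an image.
markers = rng.normal(size=(K, 3))

# Stand-in for learned weights: non-negative rows that sum to 1,
# so each vertex is a convex combination of the markers.
raw = rng.random(size=(V, K))
weights = raw / raw.sum(axis=1, keepdims=True)

vertices = interpolate_mesh(markers, weights)
```

Because each row of `weights` is a convex combination in this sketch, every reconstructed vertex lies inside the bounding box of the markers. Whether the actual method constrains its weights this way is an assumption made here for illustration.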