We present a new method, called MEsh TRansfOrmer (METRO), to reconstruct 3D human pose and mesh vertices from a single image. Our method uses a transformer encoder to jointly model vertex-vertex and vertex-joint interactions, and outputs 3D joint coordinates and mesh vertices simultaneously. Unlike existing techniques that regress pose and shape parameters, METRO does not rely on any parametric mesh model such as SMPL, so it can be easily extended to other objects such as hands. We further relax the mesh topology and allow the transformer self-attention mechanism to freely attend between any two vertices, making it possible to learn non-local relationships among mesh vertices and joints. With the proposed masked vertex modeling, our method is more robust and effective in handling challenging situations such as partial occlusions. METRO achieves new state-of-the-art results for human mesh reconstruction on the public Human3.6M and 3DPW datasets. Moreover, we demonstrate the generalizability of METRO to 3D hand reconstruction in the wild, outperforming existing state-of-the-art methods on the FreiHAND dataset. Code and pre-trained models are available at https://github.com/microsoft/MeshTransformer.
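The two mechanisms named above, unrestricted self-attention over joint and vertex tokens and masked vertex modeling, can be illustrated with a toy sketch. This is not the released METRO code: the token counts, the feature dimension, the 25% mask ratio, the zero mask value, and the helper names `self_attention` and `mask_queries` are all assumptions made for illustration, and the attention uses identity projections instead of learned weights.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity projections (illustrative only).

    Every token attends to every other token, so joints and vertices
    can exchange information regardless of mesh topology.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[j] * tokens[j][d] for j in range(len(tokens))) for d in range(dim)]
        )
    return outputs, weights


def mask_queries(tokens, mask_ratio, rng, mask_value=0.0):
    """Blank out a random subset of query tokens (hypothetical masking scheme)."""
    n = len(tokens)
    n_mask = int(round(mask_ratio * n))
    masked_idx = set(rng.sample(range(n), n_mask))
    masked = [
        [mask_value] * len(t) if i in masked_idx else list(t)
        for i, t in enumerate(tokens)
    ]
    return masked, sorted(masked_idx)


# Toy sizes, chosen only to keep the example small (assumptions).
NUM_JOINTS, NUM_VERTICES, DIM = 3, 5, 4
rng = random.Random(0)

# One token per joint and per vertex, concatenated into a single sequence.
tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_JOINTS + NUM_VERTICES)]

masked_tokens, masked_idx = mask_queries(tokens, mask_ratio=0.25, rng=rng)
outputs, attn = self_attention(masked_tokens)
```

In this sketch a zeroed query scores every key equally, so its output is the average of all tokens in the sequence. That mirrors the intuition behind masked modeling: a masked vertex or joint has to be inferred from the remaining ones, which is plausibly what helps under partial occlusion.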