We propose a new self-supervised method for predicting 3D human body pose from a single image. The prediction network is trained on a dataset of unlabelled images depicting people in typical poses, together with a set of unpaired 2D poses. By minimising the need for annotated data, the method has the potential for rapid application to pose estimation of other articulated structures (e.g. animals). The self-supervision builds on an earlier idea that exploits consistency of the predicted pose under 3D rotation. Our method is a substantial advance on state-of-the-art self-supervised methods: it trains a mapping directly from images, without limb-articulation constraints or any empirical 3D pose prior. We compare performance with state-of-the-art self-supervised methods on benchmark datasets that provide images and ground-truth 3D pose (Human3.6M, MPI-INF-3DHP). Despite the reduced requirement for annotated data, the method outperforms these baselines on Human3.6M and matches their performance on MPI-INF-3DHP. Qualitative results on a dataset of human hands show the potential for rapidly learning to predict 3D pose for articulated structures other than the human body.
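The rotation-consistency idea referred to above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a lifting-style setup in which a hypothetical `lift` function maps 2D joints to 3D, the camera is orthographic, and the random rotation is about the vertical axis. The lifted pose is rotated, reprojected to a synthetic 2D view, lifted again, and rotated back; a geometrically consistent predictor should recover the original 3D pose, giving a training signal without 3D labels.

```python
import numpy as np

def random_y_rotation(rng):
    """Random rotation about the vertical (y) axis."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def project(pose_3d):
    """Orthographic projection: keep x, y and drop depth."""
    return pose_3d[:, :2]

def rotation_consistency_loss(lift, pose_2d, rng):
    """Self-supervision signal from consistency under 3D rotation.

    `lift` is a hypothetical predictor mapping a (J, 2) array of 2D
    joints to a (J, 3) array of 3D joints (e.g. a neural network).
    """
    pose_3d = lift(pose_2d)          # first lift
    R = random_y_rotation(rng)
    rotated = pose_3d @ R.T          # rotate the predicted 3D pose
    reprojected = project(rotated)   # synthesize a new 2D view
    relifted = lift(reprojected)     # lift the synthetic view
    cycled = relifted @ R            # rotate back (R^{-1} == R.T)
    return np.mean((cycled - pose_3d) ** 2)  # should be small if consistent
```

In training, this loss would be minimised over the parameters of the lifting network alongside whatever adversarial or reprojection terms the full method uses; the joint count and function names here are illustrative assumptions.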