Given enough annotated training data, 3D human pose estimation models can achieve high accuracy. However, annotations are not always available, especially for people performing unusual activities. In this paper, we propose an algorithm that learns to detect 3D keypoints on human bodies from multiple views without any supervision other than the constraints multiple-view geometry provides. To ensure that the estimated 3D keypoints are meaningful, they are re-projected into each view to reconstruct the foreground mask of the person, which the model itself estimates beforehand. Our approach outperforms other state-of-the-art unsupervised 3D human pose estimation methods on the Human3.6M and MPI-INF-3DHP benchmark datasets.
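As a rough illustration of the multi-view mask constraint described above, the Python/NumPy sketch below reprojects a set of 3D keypoints into each calibrated view and scores them against the model's own mask estimates. This is a minimal sketch, not the authors' implementation: all function names are hypothetical, and the Gaussian mask rendering is a simplistic stand-in for whatever mask model the paper actually uses.

```python
import numpy as np

def project_points(X, P):
    """Project N 3D keypoints X (N, 3) into a view with 3x4 camera matrix P."""
    X_h = np.hstack([X, np.ones((X.shape[0], 1))])   # homogeneous coordinates (N, 4)
    x = (P @ X_h.T).T                                # image-plane points (N, 3)
    return x[:, :2] / x[:, 2:3]                      # perspective divide -> (N, 2) pixels

def keypoint_mask(kp_2d, hw, sigma=8.0):
    """Render a soft foreground mask as the max of Gaussians centred on the keypoints.
    (Hypothetical stand-in for the paper's mask reconstruction.)"""
    h, w = hw
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = ((xs[None] - kp_2d[:, 0, None, None]) ** 2
          + (ys[None] - kp_2d[:, 1, None, None]) ** 2)
    return np.exp(-d2 / (2 * sigma ** 2)).max(axis=0)  # (h, w) values in [0, 1]

def mask_consistency_loss(X, cams, pred_masks, hw):
    """Mean squared error between the reprojected-keypoint masks and the masks
    the model itself predicted for each view."""
    loss = 0.0
    for P, m in zip(cams, pred_masks):
        loss += np.mean((keypoint_mask(project_points(X, P), hw) - m) ** 2)
    return loss / len(cams)
```

Under this formulation, a single set of 3D keypoints can only explain the masks in every view simultaneously if it corresponds to consistent 3D body locations, which is the kind of supervision signal multiple-view geometry provides in the abstract.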