Although monocular 3D human pose estimation methods have made significant progress, the task remains far from solved due to inherent depth ambiguity. Exploiting multi-view information is instead a practical way to achieve absolute 3D human pose estimation. In this paper, we propose a simple yet effective pipeline for weakly-supervised cross-view 3D human pose estimation. Using only two camera views, our method achieves state-of-the-art performance in a weakly-supervised manner, requiring no 3D ground truth but only 2D annotations. Specifically, our method consists of two steps: triangulation and refinement. First, given 2D keypoints obtained by any classic 2D detection method, triangulation is performed across the two views to lift the 2D keypoints into coarse 3D poses. Then, a novel cross-view U-shaped graph convolutional network (CV-UGCN), which exploits spatial configurations and cross-view correlations, refines the coarse 3D poses. In particular, the refinement is achieved through weakly-supervised learning, in which geometric and structure-aware consistency checks are performed. We evaluate our method on the standard benchmark dataset, Human3.6M. The Mean Per Joint Position Error (MPJPE) on the benchmark is 27.4 mm, which outperforms the state of the art remarkably (27.4 mm vs. 30.2 mm).
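The triangulation step described above can be sketched with standard linear (DLT) triangulation: each joint's 2D coordinates in the two views, together with the two camera projection matrices, define a homogeneous linear system whose null space gives the 3D point. This is a minimal illustration, not the paper's implementation; the projection matrices and point below are hypothetical values chosen for the example.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one joint from two views.

    P1, P2: 3x4 camera projection matrices.
    x1, x2: (u, v) pixel coordinates of the same joint in each view.
    Returns the 3D point in world coordinates.
    """
    # Each view contributes two rows: u * P[2] - P[0] and v * P[2] - P[1].
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # de-homogenize

# Hypothetical two-camera setup: identity camera and a second camera
# translated by 1 unit along x, observing a point at (0, 0, 5).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.0, 0.0, 5.0, 1.0])
x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
X_rec = triangulate_point(P1, P2, x1, x2)
print(np.round(X_rec, 6))  # recovers (0, 0, 5)
```

Running this over all joints of a skeleton yields the coarse 3D pose that the paper's CV-UGCN then refines; with noisy 2D detections the linear estimate degrades, which is precisely why a learned refinement stage helps.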