Recently, huge strides have been made in monocular and multi-view pose estimation with known camera parameters, whereas pose estimation from multiple cameras with unknown positions and orientations has received much less attention. In this paper, we show how to train a neural model that performs accurate 3D pose and camera estimation, accounts for joint-location uncertainty due to occlusion across multiple views, and requires only 2D keypoint data for training. Our method outperforms both classical bundle adjustment and weakly supervised monocular 3D baselines on the well-established Human3.6M dataset, as well as on the more challenging in-the-wild Ski-Pose PTZ dataset with moving cameras. We provide an extensive ablation study separating the error due to the camera model, the number of cameras, initialization, and image-space joint localization from the additional error introduced by our model.