We address the problem of generalizability in multi-view 3D human pose estimation. The standard approach is to first detect 2D keypoints in each image and then triangulate them across multiple views. Although existing methods achieve remarkably accurate 3D pose estimation on public benchmarks, most of them are limited to a single spatial camera arrangement and a fixed number of cameras. Several methods address this limitation, but their performance degrades significantly on novel views. We propose a stochastic framework for human pose triangulation and demonstrate superior generalization across different camera arrangements on two public datasets. In addition, we apply the same approach to the fundamental matrix estimation problem, showing that the proposed method can be successfully applied to other computer vision problems. The stochastic framework achieves more than 8.8% improvement on the 3D pose estimation task compared to the state of the art, and more than 30% improvement on fundamental matrix estimation compared to a standard algorithm.
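For readers unfamiliar with the triangulation step of the standard pipeline mentioned above, the following is a minimal sketch of classical linear (DLT) triangulation of a single keypoint from several calibrated views. It is not the stochastic framework proposed here; the function names (`triangulate_dlt`, `make_camera`, `project`), the camera parameters, and the test point are all illustrative assumptions.

```python
import numpy as np

def triangulate_dlt(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from N >= 2 views.

    projections: list of 3x4 camera projection matrices (assumed known).
    points_2d:   list of (u, v) pixel detections of the same keypoint.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def make_camera(K, R, t):
    """Build a 3x4 projection matrix P = K [R | t]."""
    return K @ np.hstack([R, t.reshape(3, 1)])

def project(P, X):
    """Project a 3D point to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical three-camera setup (values chosen only for illustration).
K = np.array([[1000.0, 0.0, 500.0],
              [0.0, 1000.0, 500.0],
              [0.0, 0.0, 1.0]])
theta = np.deg2rad(30.0)
R_y = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                [0.0, 1.0, 0.0],
                [-np.sin(theta), 0.0, np.cos(theta)]])
cameras = [
    make_camera(K, np.eye(3), np.zeros(3)),
    make_camera(K, np.eye(3), np.array([-1.0, 0.0, 0.0])),
    make_camera(K, R_y, np.array([0.5, 0.0, 0.2])),
]

# Noise-free synthetic detections of a single joint.
X_true = np.array([0.2, -0.1, 4.0])
observations = [project(P, X_true) for P in cameras]

X_est = triangulate_dlt(cameras, observations)
print("estimated 3D joint:", X_est)
```

With noise-free detections the estimate matches the ground-truth point up to numerical precision. With noisy detections, the result of a linear solve like this depends on which views are included.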