We propose a stochastic modeling framework for 3D human pose triangulation and evaluate its performance across different datasets and spatial camera arrangements. The common approach to 3D pose estimation is to first detect 2D keypoints in images and then triangulate them from multiple views. However, most existing triangulation models are limited to a single dataset, i.e., a single camera arrangement and number of cameras. Moreover, they require known camera parameters. The proposed stochastic pose triangulation model successfully generalizes to different camera arrangements and between two public datasets. In each step, we generate a set of 3D pose hypotheses, each obtained by triangulation from a random subset of views. The hypotheses are evaluated by a neural network, and the expected triangulation error is minimized. The key novelty is that the network learns to evaluate the poses without taking the spatial camera arrangement into account, which improves generalization. Additionally, we demonstrate that the proposed stochastic framework can also be used for fundamental matrix estimation, showing promising results toward relative camera pose estimation from noisy keypoint correspondences.
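The stochastic step described above can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. Each hypothesis is triangulated with the standard linear (DLT) method from a random pair of views with known 3x4 projection matrices. `score_fn` is a hypothetical placeholder for the scoring network: it receives only the 3D pose, never the cameras. Scores are converted to probabilities with a softmax (an assumed choice), and the expected error is the probability-weighted mean per-joint position error. All helper names (`triangulate_dlt`, `triangulate_pose`, `stochastic_step`, `make_camera`, `project`) and the synthetic data are illustrative.

```python
import numpy as np


def triangulate_dlt(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from two or more views.

    projections: sequence of 3x4 projection matrices (assumed known)
    points_2d:   sequence of (u, v) pixel coordinates, one per view
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        rows.append(u * P[2] - P[0])  # u * (p3 . X) - p1 . X = 0
        rows.append(v * P[2] - P[1])  # v * (p3 . X) - p2 . X = 0
    # Homogeneous solution: right singular vector of the smallest singular value
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1]
    return X[:3] / X[3]


def triangulate_pose(projections, keypoints):
    """keypoints: (n_views, n_joints, 2) array -> (n_joints, 3) pose."""
    return np.stack([triangulate_dlt(projections, keypoints[:, j])
                     for j in range(keypoints.shape[1])])


def stochastic_step(projections, keypoints, gt_pose, score_fn,
                    n_hyp=20, subset_size=2, rng=None):
    """Sample pose hypotheses from random view subsets, score them, and
    return (highest-scoring hypothesis, expected per-joint error).

    score_fn is a hypothetical stand-in for the scoring network: it sees
    only the 3D hypothesis, never the camera parameters.
    """
    rng = np.random.default_rng() if rng is None else rng
    hyps = []
    for _ in range(n_hyp):
        idx = rng.choice(len(projections), size=subset_size, replace=False)
        hyps.append(triangulate_pose([projections[i] for i in idx], keypoints[idx]))
    hyps = np.stack(hyps)                           # (n_hyp, n_joints, 3)
    scores = np.array([score_fn(h) for h in hyps])
    probs = np.exp(scores - scores.max())           # softmax over hypotheses
    probs /= probs.sum()
    # Mean per-joint position error of each hypothesis w.r.t. ground truth
    errors = np.linalg.norm(hyps - gt_pose, axis=-1).mean(axis=-1)
    expected_error = float(np.sum(probs * errors))  # value of the training objective
    return hyps[np.argmax(scores)], expected_error


def make_camera(rng, f=1000.0, c=500.0, dist=10.0):
    """Random synthetic camera P = K [R | t] with the scene in front of it."""
    K = np.array([[f, 0.0, c], [0.0, f, c], [0.0, 0.0, 1.0]])
    # Orthogonal Q factor used as the rotation (may include a reflection,
    # which does not matter for DLT)
    R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    t = np.array([[0.0], [0.0], [dist]])
    return K @ np.hstack([R, t])


def project(P, X):
    """Project (n, 3) points with a 3x4 matrix P to (n, 2) pixels."""
    x = np.hstack([X, np.ones((len(X), 1))]) @ P.T
    return x[:, :2] / x[:, 2:3]


# Usage on synthetic, noise-free data (4 views, 17 joints).
# A constant score gives uniform probabilities; a trained network would not.
rng = np.random.default_rng(0)
cams = [make_camera(rng) for _ in range(4)]
gt = rng.normal(size=(17, 3))
kpts = np.stack([project(P, gt) for P in cams])
best, exp_err = stochastic_step(cams, kpts, gt, score_fn=lambda h: 0.0, rng=rng)
```

With noise-free keypoints every hypothesis reproduces the ground truth, so `exp_err` is close to zero. With detected (noisy) 2D keypoints the hypotheses differ, and a network would be trained by backpropagating the expected error through the hypothesis probabilities. This NumPy sketch only evaluates that objective and does not compute gradients.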