Existing approaches for multi-view multi-person 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views, and then solve the 3D pose estimation for each person. Establishing cross-view correspondences is challenging in multi-person scenes, and incorrect correspondences lead to sub-optimal performance for such multi-stage pipelines. In this work, we present a multi-view 3D pose estimation approach based on plane sweep stereo that jointly addresses cross-view fusion and 3D pose reconstruction in a single shot. Specifically, we propose to perform depth regression for each joint of each 2D pose in a target camera view. Cross-view consistency constraints are implicitly enforced by multiple reference camera views via the plane sweep algorithm to facilitate accurate depth regression. We adopt a coarse-to-fine scheme that first regresses person-level depth and then estimates per-person joint-level relative depths. 3D poses are obtained by a simple back-projection given the estimated depths. We evaluate our approach on benchmark datasets, where it outperforms previous state-of-the-art methods while being remarkably efficient. Our code is available at https://github.com/jiahaoLjh/PlaneSweepPose.
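The final step the abstract mentions, recovering 3D poses by back-projecting 2D joints with their estimated depths, can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a standard pinhole camera model where a pixel `(u, v)` with depth `d` maps to the camera-space point `d * K^{-1} [u, v, 1]^T`; the intrinsic matrix `K` and the joint coordinates below are made-up placeholder values.

```python
import numpy as np

def back_project(joints_2d, depths, K):
    """Back-project 2D joints to camera-space 3D points.

    joints_2d: (J, 2) pixel coordinates of the detected joints.
    depths:    (J,)   estimated metric depth of each joint.
    K:         (3, 3) camera intrinsic matrix (hypothetical values below).
    Returns:   (J, 3) 3D joint positions in the camera coordinate frame.
    """
    num_joints = joints_2d.shape[0]
    # Homogeneous pixel coordinates, shape (J, 3)
    pix_h = np.hstack([joints_2d, np.ones((num_joints, 1))])
    # Unit-depth ray directions in camera coordinates, scaled by depth
    rays = (np.linalg.inv(K) @ pix_h.T).T
    return rays * depths[:, None]

# Placeholder intrinsics (focal length 1000 px, principal point at image center)
K = np.array([[1000.0,    0.0, 960.0],
              [   0.0, 1000.0, 540.0],
              [   0.0,    0.0,   1.0]])
joints_2d = np.array([[960.0, 540.0],
                      [1100.0, 600.0]])
depths = np.array([3.0, 3.2])  # e.g. output of the depth regression stage
joints_3d = back_project(joints_2d, depths, K)
```

A quick sanity check on such a sketch is that projecting `joints_3d` back through `K` recovers the original pixel coordinates, and that the z-component of each 3D point equals its input depth.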