Estimating the camera-centric 3D poses of multiple persons from monocular input is challenging because of inter-person occlusion and depth ambiguity. Typical top-down frameworks suffer from high computational redundancy because they require an additional detection stage. By contrast, bottom-up methods have low computational cost because they are less affected by the number of people. However, most existing bottom-up methods treat camera-centric 3D human pose estimation as two unrelated subtasks: 2.5D pose estimation and camera-centric depth estimation. In this paper, we propose a unified model that leverages the mutual benefits of these two subtasks. Within the framework, a robust structured 2.5D pose estimation module is designed to recognize inter-person occlusion based on depth relationships. Additionally, we develop an end-to-end geometry-aware depth reasoning method that exploits the mutual benefits of 2.5D pose and camera-centric root depths. This method first uses the 2.5D pose and geometry information to infer camera-centric root depths in a forward pass, and then exploits the root depths to further improve representation learning for 2.5D pose estimation in a backward pass. Further, we design an adaptive fusion scheme that leverages both visual perception and body geometry to alleviate the inherent depth ambiguity. Extensive experiments demonstrate the superiority of our proposed model over a wide range of bottom-up methods. Our accuracy is even competitive with that of top-down counterparts. Notably, our model runs much faster than existing bottom-up and top-down methods.
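To make the relationship between the two subtasks concrete, the following minimal sketch shows how a 2.5D pose and a camera-centric root depth are commonly combined into camera-centric 3D joints under a standard pinhole camera model. It is an illustration only, not the model proposed here: the function name `backproject_pose`, its parameters, and the assumption of known intrinsics (`fx`, `fy`, `cx`, `cy`) are all hypothetical.

```python
def backproject_pose(pose_2_5d, root_depth, fx, fy, cx, cy):
    """Lift a 2.5D pose to camera-centric 3D joints (pinhole model).

    pose_2_5d  : list of (u, v, d_rel) tuples, where (u, v) are pixel
                 coordinates and d_rel is the joint depth relative to
                 the root joint (hypothetical input layout).
    root_depth : absolute depth of the root joint along the camera
                 z-axis, in the same unit as d_rel.
    fx, fy     : focal lengths in pixels (assumed known).
    cx, cy     : principal point in pixels (assumed known).
    """
    joints_3d = []
    for u, v, d_rel in pose_2_5d:
        # Absolute depth = root depth + root-relative depth.
        z = root_depth + d_rel
        # Invert the pinhole projection u = fx * x / z + cx (same for v).
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        joints_3d.append((x, y, z))
    return joints_3d


# Usage: a root joint at the principal point and one joint 0.5 units deeper.
pose = [(100.0, 50.0, 0.0), (150.0, 50.0, 0.5)]
joints = backproject_pose(pose, root_depth=2.0, fx=100.0, fy=100.0, cx=100.0, cy=50.0)
```

Because every joint's absolute depth is the root depth plus a relative offset, an error in the root depth shifts and rescales the whole skeleton, which is one reason the two subtasks can be expected to inform each other.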