Human pose estimation methods generally fall into two categories according to their architecture: regression-based (i.e., heatmap-free) and heatmap-based methods. The former directly estimates the coordinates of each keypoint using convolutional and fully-connected layers. Although this approach can detect overlapping and densely placed keypoints, it can produce unexpected results for keypoints that do not exist in the scene. The latter, in contrast, can filter out non-existent keypoints by using a predicted heatmap for each keypoint. However, it suffers from quantization error when keypoint coordinates are extracted from the heatmaps and, unlike the regression-based approach, has difficulty distinguishing densely placed keypoints in an image. To address these issues, we propose HybridPose, a hybrid model for single-stage multi-person pose estimation that combines the strengths of both approaches so that each compensates for the drawbacks of the other. Furthermore, we introduce a self-correlation loss to inject spatial dependencies between keypoint coordinates and their visibility. As a result, HybridPose is capable of not only detecting densely placed keypoints but also filtering out keypoints that do not exist in an image. Experimental results demonstrate that the proposed HybridPose predicts keypoint visibility without degrading pose estimation accuracy.
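The quantization error attributed to heatmap-based methods can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from HybridPose: the function names `render_heatmap` and `decode_argmax` are hypothetical, the heatmap is a plain Gaussian, the stride of 4 is arbitrary, and decoding is a simple argmax over heatmap cells.

```python
import numpy as np

def render_heatmap(x, y, size=64, stride=4, sigma=1.0):
    """Render a Gaussian heatmap on a grid downsampled by `stride` (illustrative only)."""
    grid = size // stride
    xs = np.arange(grid)            # column indices, shape (grid,)
    ys = np.arange(grid)[:, None]   # row indices, shape (grid, 1)
    # Keypoint position expressed in heatmap cells; it may be fractional.
    cx, cy = x / stride, y / stride
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def decode_argmax(heatmap, stride=4):
    """Map the peak heatmap cell back to image coordinates (assumed decoding rule)."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(col) * stride, int(row) * stride

# A keypoint that does not sit exactly on the downsampled grid.
true_x, true_y = 21.0, 37.0
pred_x, pred_y = decode_argmax(render_heatmap(true_x, true_y))

# The decoded position snaps to a multiple of the stride, so part of the
# original location is lost even though the heatmap itself is noise-free.
error = float(np.hypot(pred_x - true_x, pred_y - true_y))
print(pred_x, pred_y, round(error, 3))
```

Under these assumptions the decoded coordinates can only be multiples of the stride, so the per-axis error can reach half a stride even for a perfect heatmap. A regression head that outputs continuous coordinates does not have this particular source of error.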