Current multi-person pose estimation methods typically treat the localization and the association of body joints as separate steps. This separation is convenient but inefficient, since it incurs additional computation and slows inference. In this paper, we present PoseDet (Estimating Pose by Detection), a novel framework that localizes and associates body joints simultaneously, at higher inference speed. Moreover, we propose the keypoint-aware pose embedding, which represents an object in terms of the locations of its keypoints. The proposed pose embedding contains semantic and geometric information, allowing us to access discriminative and informative features efficiently. It is used for candidate classification and body joint localization in PoseDet, leading to robust predictions across various poses. Compared with state-of-the-art methods, this simple framework achieves unprecedented speed and competitive accuracy on the COCO benchmark. Extensive experiments on the CrowdPose benchmark demonstrate its robustness in crowded scenes. Source code is available.