A top-down human pose estimation method that delivers both strong accuracy and high efficiency is appealing. Mask RCNN improves efficiency considerably by performing person detection and pose estimation in a single framework, since the two tasks share the features produced by the backbone. However, its accuracy falls short of traditional two-stage methods. In this paper, we aim to substantially improve the human pose estimation results of Mask RCNN while preserving its efficiency. Specifically, we improve the whole pose estimation pipeline, which consists of feature extraction and keypoint detection. The feature extraction stage is designed to capture sufficient and valuable pose information. We then introduce a Global Context Module into the keypoint detection branch to enlarge the receptive field, since a large receptive field is crucial to successful human pose estimation. On the COCO val2017 set, our model with a ResNet-50 backbone achieves an AP of 68.1, which is 2.6 points higher than Mask RCNN (AP of 65.5). Compared to the classic two-stage top-down method SimpleBaseline, our model largely closes the accuracy gap (68.1 AP vs. 68.9 AP) at a much faster inference speed (77 ms vs. 168 ms), demonstrating the effectiveness of the proposed method. Code is available at: https://github.com/lingl_space/maskrcnn_keypoint_refined.
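
To make the reported comparison concrete, the short sketch below recomputes the accuracy gaps and the speed ratio from the figures quoted in the abstract. It uses only those five numbers; the constant and function names are illustrative and do not come from the released code.

```python
# Figures quoted in the abstract (COCO val2017, ResNet-50 backbone).
# All names below are illustrative, not taken from the authors' repository.
AP_OURS = 68.1
AP_MASK_RCNN = 65.5
AP_SIMPLE_BASELINE = 68.9
LATENCY_OURS_MS = 77.0
LATENCY_SIMPLE_BASELINE_MS = 168.0


def summarize():
    """Derive the AP differences and speed ratio implied by the abstract."""
    return {
        # Improvement of the proposed model over Mask RCNN, in AP points.
        "gain_over_mask_rcnn": round(AP_OURS - AP_MASK_RCNN, 1),
        # Gap between Mask RCNN and SimpleBaseline before the improvements.
        "gap_before": round(AP_SIMPLE_BASELINE - AP_MASK_RCNN, 1),
        # Gap between the proposed model and SimpleBaseline.
        "gap_after": round(AP_SIMPLE_BASELINE - AP_OURS, 1),
        # How many times faster the proposed model is than SimpleBaseline.
        "speedup": round(LATENCY_SIMPLE_BASELINE_MS / LATENCY_OURS_MS, 2),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

Under these figures, the gap to SimpleBaseline shrinks from 3.4 AP to 0.8 AP, while inference is roughly 2.2 times faster.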