We propose a simple yet reliable bottom-up approach to multi-person pose estimation that achieves a good trade-off between accuracy and efficiency. Given an image, we employ an Hourglass Network to infer the keypoints of all persons indiscriminately, as well as the guiding offsets that connect adjacent keypoints belonging to the same person. We then greedily group the candidate keypoints into multiple human poses (if any) using the predicted guiding offsets, a process we refer to as greedy offset-guided keypoint grouping (GOG). Moreover, we revisit the encoding-decoding method for multi-person keypoint coordinates and reveal some important factors affecting accuracy. Experiments demonstrate the clear performance improvements brought by the introduced components. Our approach is comparable to the state of the art on the challenging COCO dataset under fair conditions. The source code and our pre-trained model are publicly available online.
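The grouping step described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the three-joint skeleton, the distance threshold, and the function names are all assumptions made for the example, and the guiding offsets are supplied directly rather than read from network output maps.

```python
import numpy as np

# Hypothetical mini-skeleton: three keypoint types chained as 0 -> 1 -> 2.
EDGES = [(0, 1), (1, 2)]

def gog_group(candidates, guiding_offsets, dist_thresh=5.0):
    """Greedy offset-guided keypoint grouping (GOG) -- illustrative sketch.

    candidates[t]      : (N_t, 3) array of (x, y, score) for keypoint type t.
    guiding_offsets[e] : function mapping the (x, y) position of the source
                         keypoint of edge e to the predicted offset vector
                         pointing at the adjacent keypoint on the same person.
    Returns a list of poses; each pose maps keypoint type -> (x, y).
    """
    used = {t: set() for t in candidates}
    # Seed one pose per root (type 0) candidate, highest score first.
    roots = sorted(enumerate(candidates[0]), key=lambda kv: -kv[1][2])
    poses = [{0: tuple(kpt[:2])} for _, kpt in roots]
    for e, (src, dst) in enumerate(EDGES):
        for pose in poses:
            if src not in pose:
                continue
            x, y = pose[src]
            gx, gy = np.array([x, y]) + guiding_offsets[e]((x, y))
            # Greedily take the nearest unused destination candidate
            # within the distance threshold.
            best, best_d = None, dist_thresh
            for i, (cx, cy, _) in enumerate(candidates[dst]):
                if i in used[dst]:
                    continue
                d = np.hypot(cx - gx, cy - gy)
                if d < best_d:
                    best, best_d = i, d
            if best is not None:
                used[dst].add(best)
                pose[dst] = tuple(candidates[dst][best][:2])
    return poses

# Toy example: two persons with perfectly predicted guiding offsets.
candidates = {
    0: np.array([[10.0, 10.0, 0.9], [50.0, 10.0, 0.8]]),
    1: np.array([[12.0, 20.0, 0.9], [52.0, 20.0, 0.8]]),
    2: np.array([[12.0, 30.0, 0.9], [52.0, 30.0, 0.8]]),
}
guiding_offsets = {
    0: lambda p: np.array([2.0, 10.0]),  # type 0 -> type 1
    1: lambda p: np.array([0.0, 10.0]),  # type 1 -> type 2
}
poses = gog_group(candidates, guiding_offsets)
```

Because grouping is greedy, each candidate is claimed at most once and higher-scoring root keypoints get first pick, which keeps the procedure fast relative to global matching.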