Practical applications demand both accuracy and efficiency from multi-person pose estimation algorithms. However, high accuracy is currently dominated by top-down methods, while fast inference is dominated by bottom-up methods. To achieve a better trade-off between accuracy and efficiency, we propose a novel multi-person pose estimation framework, SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation (SIMPLE). Specifically, during training, we enable SIMPLE to mimic the pose knowledge of a high-performance top-down pipeline, which significantly improves SIMPLE's accuracy while preserving its high efficiency at inference time. In addition, SIMPLE formulates human detection and pose estimation as a unified point learning framework, so that the two tasks complement each other within a single network. This differs from previous works, in which the two tasks may interfere with each other. To the best of our knowledge, both the mimicking strategy between different method types and unified point learning are proposed here for the first time in pose estimation. In experiments, our approach achieves new state-of-the-art performance among bottom-up methods on the COCO, MPII, and PoseTrack datasets. Compared with top-down approaches, SIMPLE achieves comparable accuracy with faster inference.