Multi-person pose estimation is an attractive and challenging task. Most existing methods follow two-stage frameworks, either top-down or bottom-up. Top-down methods incur high computational redundancy from an additional person detector, while bottom-up methods must group keypoints heuristically after predicting all instance-agnostic keypoints. The single-stage paradigm aims to simplify the multi-person pose estimation pipeline and has received considerable attention. However, recent single-stage methods perform poorly because it is difficult to regress diverse full-body poses from a single feature vector. Unlike previous solutions that involve complex heuristic designs, we present a simple yet effective solution based on instance-aware dynamic networks. Specifically, we propose an instance-aware module that adaptively adjusts (part of) the network parameters for each instance. Our solution significantly increases the capacity and adaptability of the network for recognizing diverse poses, while maintaining a compact, end-to-end trainable pipeline. Extensive experiments on the MS-COCO dataset demonstrate that our method achieves significant improvement over existing single-stage methods and strikes a better balance between accuracy and efficiency than state-of-the-art two-stage approaches. The code and models are available at https://github.com/hikvision-research/opera.
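The idea of adjusting part of the network parameters per instance can be illustrated with a minimal sketch. The abstract does not describe the module's architecture, so everything below is an assumption made for illustration: a hypothetical linear "controller" maps each instance's feature vector to the weights and biases of a small 1x1 head, and that head is then applied to a feature map shared by all instances. The function names, shapes, and the choice of a single linear controller are hypothetical and are not taken from the released code.

```python
# Illustrative sketch only: NOT the authors' implementation.
# All names, shapes, and the linear controller are assumptions.

def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def generate_instance_params(controller, instance_feature, num_keypoints, channels):
    """Map one instance's feature vector to the parameters of its own 1x1 head.

    controller has (num_keypoints * channels + num_keypoints) rows: the first
    num_keypoints * channels outputs are weights, the rest are biases.
    """
    flat = matvec(controller, instance_feature)
    n_weights = num_keypoints * channels
    weights = [flat[k * channels:(k + 1) * channels] for k in range(num_keypoints)]
    biases = flat[n_weights:n_weights + num_keypoints]
    return weights, biases


def apply_dynamic_head(shared_features, weights, biases):
    """Apply the instance-specific 1x1 head at every location.

    shared_features is an H x W grid of length-C feature vectors.
    Returns one H x W score map per keypoint.
    """
    return [
        [[sum(w * x for w, x in zip(weights[k], pixel)) + biases[k] for pixel in row]
         for row in shared_features]
        for k in range(len(weights))
    ]
```

Under these assumptions, two instances with different feature vectors receive different head parameters and therefore produce different keypoint score maps from the same shared features, which is the sense in which the head is instance-aware.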