Multi-person pose estimation is an attractive and challenging task. Most existing methods follow a two-stage framework, either top-down or bottom-up. Top-down methods incur high computational redundancy from an additional person detector, while bottom-up methods must heuristically group keypoints after predicting all instance-agnostic keypoints. The single-stage paradigm aims to simplify the multi-person pose estimation pipeline and has received considerable attention. However, recent single-stage methods are limited by low performance, because it is difficult to regress diverse full-body poses from a single feature vector. Unlike previous solutions that involve complex heuristic designs, we present a simple yet effective solution based on instance-aware dynamic networks. Specifically, we propose an instance-aware module that adaptively adjusts (part of) the network parameters for each instance. Our solution significantly increases the capacity and adaptability of the network for recognizing various poses while maintaining a compact, end-to-end trainable pipeline. Extensive experiments on the MS-COCO dataset demonstrate that our method achieves significant improvement over existing single-stage methods and strikes a better balance between accuracy and efficiency than state-of-the-art two-stage approaches.
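To make the idea of adjusting part of the network parameters per instance concrete, the following is a minimal sketch in plain Python and not the authors' implementation. It assumes a hypothetical controller (here just a linear map) that turns one instance's feature vector into a flat parameter vector `theta`, which is then sliced into the weights and biases of a small per-pixel head applied to a feature map shared by all instances. All names (`linear`, `split_params`, `num_params`, `dynamic_head`), the channel layout, and the toy sizes are assumptions chosen for illustration.

```python
import random


def linear(x, weights, bias):
    """Compute y = W x + b for one feature vector (a 1x1 convolution at one pixel)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def num_params(channels):
    """Number of weights and biases needed for layers sized by `channels`."""
    return sum(c_in * c_out + c_out for c_in, c_out in zip(channels, channels[1:]))


def split_params(theta, channels):
    """Slice a flat vector into (weights, bias) per layer.

    The layout (all weights of a layer, then its biases) is an assumption.
    """
    layers, pos = [], 0
    for c_in, c_out in zip(channels, channels[1:]):
        weights = [theta[pos + r * c_in: pos + (r + 1) * c_in] for r in range(c_out)]
        pos += c_in * c_out
        bias = theta[pos: pos + c_out]
        pos += c_out
        layers.append((weights, bias))
    assert pos == len(theta), "parameter vector has the wrong length"
    return layers


def dynamic_head(feature_map, theta, channels):
    """Apply the instance-specific head to every pixel of the shared feature map."""
    layers = split_params(theta, channels)
    out = []
    for row in feature_map:
        out_row = []
        for x in row:
            for i, (w, b) in enumerate(layers):
                x = linear(x, w, b)
                if i < len(layers) - 1:  # no activation after the last layer
                    x = relu(x)
            out_row.append(x)
        out.append(out_row)
    return out


# Toy setup (hypothetical sizes): 4 shared channels, 3 output maps, a 2x3 grid.
rng = random.Random(0)
channels = [4, 4, 3]
n_theta = num_params(channels)

# Hypothetical controller: a fixed linear map from instance feature to theta.
ctrl_w = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(n_theta)]
ctrl_b = [0.0] * n_theta

feature_map = [[[rng.uniform(0, 1) for _ in range(4)] for _ in range(3)]
               for _ in range(2)]

# Two instances with different features get different head parameters,
# yet both heads read the same shared feature map.
theta_a = linear([1.0, 0.0, 0.0, 0.0], ctrl_w, ctrl_b)
theta_b = linear([0.0, 1.0, 0.0, 0.0], ctrl_w, ctrl_b)
maps_a = dynamic_head(feature_map, theta_a, channels)
maps_b = dynamic_head(feature_map, theta_b, channels)
```

The point of the sketch is only the data flow: the shared features are computed once, while the small head is re-parameterized for each instance. A practical version would express the same steps with batched tensor operations in a deep learning framework.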