One of the major challenges in multi-person pose estimation is instance-aware keypoint estimation. Previous methods address this problem by leveraging an off-the-shelf detector, a heuristic post-grouping process, or an explicit instance identification process, which hinders further improvements in inference speed, an important factor for practical applications. From a statistical point of view, these additional instance-identification processes are needed to bypass learning the high-dimensional joint distribution of human keypoints, a distribution that is also critical for another major challenge, the occlusion scenario. In this work, we propose a novel framework for single-stage instance-aware pose estimation that models the joint distribution of human keypoints with a mixture density model, termed MDPose. MDPose estimates the distribution of human keypoint coordinates using a mixture density model with an instance-aware keypoint head consisting simply of 8 convolutional layers. It is trained by minimizing the negative log-likelihood of the ground-truth keypoints. We also propose a simple yet effective training strategy, Random Keypoint Grouping (RKG), which significantly alleviates the underflow problem and thereby enables successful learning of the relations between keypoints. On the OCHuman dataset, which consists of images with highly occluded people, MDPose achieves state-of-the-art performance by successfully learning the high-dimensional joint distribution of human keypoints. Furthermore, MDPose shows a significant improvement in inference speed with competitive accuracy on MS COCO, a widely used human keypoint dataset, thanks to the proposed, much simpler single-stage pipeline.
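The training objective described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes diagonal Gaussian components (the abstract does not state the component family), evaluates the mixture negative log-likelihood in log space, and shows one plausible reading of Random Keypoint Grouping in which the keypoints are randomly partitioned and the mixture likelihood is evaluated per group. All function names, the group size, and the data layout are hypothetical.

```python
import math
import random


def gaussian_log_pdf(x, mean, std):
    """Log density of a 1-D Gaussian (assumed component family)."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_nll(keypoints, weights, means, stds, indices=None):
    """Negative log-likelihood of one person's keypoints under a mixture.

    keypoints: list of (x, y) ground-truth coordinates.
    weights:   mixing coefficients, one per component.
    means, stds: means[k][j] and stds[k][j] are (x, y) pairs for
                 component k and keypoint j.
    indices:   optional subset of keypoint indices to score.
    """
    if indices is None:
        indices = range(len(keypoints))
    indices = list(indices)
    per_component = []
    for pi_k, mu_k, sd_k in zip(weights, means, stds):
        # Sum log densities instead of multiplying densities.
        log_p = math.log(pi_k)
        for j in indices:
            for d in range(2):
                log_p += gaussian_log_pdf(keypoints[j][d], mu_k[j][d], sd_k[j][d])
        per_component.append(log_p)
    return -log_sum_exp(per_component)


def random_keypoint_groups(num_keypoints, group_size, rng):
    """Randomly partition keypoint indices into groups (hypothetical RKG step)."""
    order = list(range(num_keypoints))
    rng.shuffle(order)
    return [order[i:i + group_size] for i in range(0, num_keypoints, group_size)]


def grouped_nll(keypoints, weights, means, stds, group_size, rng):
    """Sum of per-group mixture NLLs over a fresh random grouping."""
    groups = random_keypoint_groups(len(keypoints), group_size, rng)
    return sum(mixture_nll(keypoints, weights, means, stds, g) for g in groups)
```

With a single component the grouped loss equals the joint loss, because the density factorizes over keypoints. With several components the two generally differ, so the grouped form in this sketch should be read as a training surrogate in which each likelihood term involves fewer factors, not as the paper's exact formulation.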