Human pose estimation aims to localize human anatomical keypoints or body parts in input data (e.g., images, videos, or signals). It is a crucial component in enabling machines to gain an insightful understanding of human behavior, and it has become a salient problem in computer vision and related fields. Deep learning techniques allow feature representations to be learned directly from data, significantly pushing the performance boundary of human pose estimation. In this paper, we review the recent achievements of 2D human pose estimation methods and present a comprehensive survey. Briefly, existing approaches focus on three directions: network architecture design, network training refinement, and post-processing. Network architecture design examines the architecture of human pose estimation models, aiming to extract more robust features for keypoint recognition and localization. Network training refinement focuses on the training of neural networks and aims to improve the representational ability of models. Post-processing further incorporates model-agnostic polishing strategies to improve the performance of keypoint detection. This survey covers more than 200 research contributions, spanning methodological frameworks, common benchmark datasets, evaluation metrics, and performance comparisons. We seek to provide researchers with a more comprehensive and systematic review of human pose estimation, allowing them to gain a broad overview of the field and better identify future directions.