Human pose estimation in two-dimensional images and videos has recently become an active topic in computer vision because of its broad benefits and potential applications for improving human life, such as behavior recognition, motion capture, augmented reality, robot training, and movement tracking. Many state-of-the-art methods built on deep learning have addressed several challenges and achieved remarkable results in the field. Approaches fall into two kinds: the two-step framework (top-down approach) and the part-based framework (bottom-up approach). The two-step framework first applies a person detector and then estimates the pose within each detected box independently, whereas the part-based framework detects all body parts in the image and then associates the parts belonging to each distinct person. This paper aims to provide newcomers with an extensive review of deep learning methods for recognizing human pose in 2D images, focusing only on top-down approaches published since 2016. The discussion covers significant detectors and estimators along with their mathematical background, the challenges and limitations, benchmark datasets, evaluation metrics, and a comparison between methods.
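To make the two-step (top-down) framework concrete, the following is a minimal sketch of its control flow only, not any specific method from the literature. The names `top_down_pose`, `dummy_detector`, and `dummy_estimator` are hypothetical stand-ins: a real system would plug in a trained person detector and a trained single-person pose estimator, and the toy image is a plain nested list rather than a real frame.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]      # (x0, y0, x1, y1) in image pixels
Keypoint = Tuple[float, float]       # (x, y) coordinates
Image = Sequence[Sequence[float]]    # toy grayscale image: rows of pixel values


def crop(image: Image, box: Box) -> List[Sequence[float]]:
    """Cut the region covered by `box` out of the image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def top_down_pose(
    image: Image,
    detect_people: Callable[[Image], List[Box]],
    estimate_pose: Callable[[Image], List[Keypoint]],
) -> List[List[Keypoint]]:
    """Step 1: detect person boxes. Step 2: estimate a pose inside each box."""
    poses = []
    for box in detect_people(image):
        x0, y0, _, _ = box
        # Each box is processed independently of the others.
        local_keypoints = estimate_pose(crop(image, box))
        # Map keypoints from crop coordinates back to image coordinates.
        poses.append([(x + x0, y + y0) for x, y in local_keypoints])
    return poses


# Hypothetical stand-ins (assumptions for illustration, not real models).
def dummy_detector(image: Image) -> List[Box]:
    """Pretend two people were found at fixed locations."""
    return [(0, 0, 4, 4), (4, 2, 8, 6)]


def dummy_estimator(patch: Image) -> List[Keypoint]:
    """Pretend the only keypoint is the center of the crop."""
    height, width = len(patch), len(patch[0])
    return [(width / 2, height / 2)]


image = [[0.0] * 8 for _ in range(6)]   # 8 pixels wide, 6 pixels high
poses = top_down_pose(image, dummy_detector, dummy_estimator)
print(poses)
```

Because the estimator runs once per detected box, the cost of this sketch grows with the number of people found, and any person the detector misses never reaches the pose step. A part-based (bottom-up) pipeline would instead detect all keypoints in the full image first and group them into persons afterwards.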