Human pose estimation in unconstrained images and videos is a fundamental computer vision task. To illustrate how the techniques have evolved, this survey summarizes representative human pose estimation methods in a structured taxonomy, with a particular focus on deep learning models and the single-person image setting. Specifically, we examine all the components of a typical human pose estimation pipeline: data augmentation, model architecture and backbone, supervision representation, post-processing, standard datasets, and evaluation metrics. Finally, to outline future directions, we discuss the key unsolved problems and potential trends in human pose estimation.