Estimating human pose and shape from monocular images is a long-standing problem in computer vision. Since the release of statistical body models, 3D human mesh recovery has drawn increasingly broad attention. With the shared goal of obtaining well-aligned and physically plausible meshes, two paradigms have been developed to overcome the challenges of the 2D-to-3D lifting process: i) an optimization-based paradigm, which uses various data terms and regularization terms as optimization objectives; and ii) a regression-based paradigm, which uses deep learning to solve the problem in an end-to-end fashion. Meanwhile, continuous efforts have been devoted to improving the quality of 3D mesh labels for a wide range of datasets. Although remarkable progress has been made over the past decade, the task remains challenging due to flexible body motions, diverse appearances, complex environments, and insufficient in-the-wild annotations. To the best of our knowledge, this is the first survey to focus on the task of monocular 3D human mesh recovery. We start by introducing body models, then elaborate on recovery frameworks and training objectives, providing in-depth analyses of their strengths and weaknesses. We also summarize datasets, evaluation metrics, and benchmark results. Open issues and future directions are discussed at the end, in the hope of motivating researchers and facilitating their research in this area. A regularly updated project page can be found at https://github.com/tinatiansjz/hmr-survey.