In this research, we address the difficulty that existing deep learning-based human mesh reconstruction methods have in balancing accuracy against computational efficiency. These methods typically prioritize accuracy, which leads to large networks and high computational cost and can hinder their use in real-world applications such as virtual reality systems. To address this issue, we introduce a modular, multi-stage, lightweight graph-based transformer network for human pose and shape estimation from 2D human pose: a pose-based human mesh reconstruction approach that prioritizes computational efficiency without sacrificing reconstruction accuracy. Our method consists of a 2D-to-3D lifter module, which uses graph transformers to analyze structured and implicit joint correlations in 2D human poses, and a mesh regression module, which combines the extracted pose features with a mesh template to produce the final human mesh parameters.
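The two-stage data flow described above can be illustrated with a minimal, dependency-free sketch. It is not the network itself: the 5-joint skeleton, the function names, and the way the two attention outputs are summed are all assumptions made for illustration, and every learned layer (projections, feed-forward blocks, the final regressor) is omitted. It shows only how skeleton-masked attention can stand in for structured joint correlations, how unmasked attention can stand in for implicit ones, and how pooled pose features might be attached to template vertices.

```python
import math

# Hypothetical toy skeleton: 0=pelvis, 1=spine, 2=head, 3=left hand, 4=right hand.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]
NUM_JOINTS = 5


def adjacency(num_joints, edges):
    """Symmetric boolean adjacency mask with self-loops."""
    mask = [[i == j for j in range(num_joints)] for i in range(num_joints)]
    for a, b in edges:
        mask[a][b] = mask[b][a] = True
    return mask


def attention(feats, mask=None):
    """Single-head dot-product self-attention over joints (no learned weights).

    With a mask, each joint attends only to itself and its skeleton
    neighbours (structured); without one, it attends to every joint (implicit).
    """
    dim = len(feats[0])
    out = []
    for i, query in enumerate(feats):
        scores = []
        for j, key in enumerate(feats):
            if mask is not None and not mask[i][j]:
                scores.append(-math.inf)  # masked-out joint gets zero weight
            else:
                dot = sum(a * b for a, b in zip(query, key))
                scores.append(dot / math.sqrt(dim))
        peak = max(scores)  # finite, because self-loops are always allowed
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * feats[j][d] for j, w in enumerate(weights))
                    for d in range(dim)])
    return out


def lift_features(pose_2d, mask):
    """Stage 1 (assumed form): add structured and implicit attention outputs."""
    structured = attention(pose_2d, mask)
    implicit = attention(pose_2d)
    return [[s + g for s, g in zip(s_row, g_row)]
            for s_row, g_row in zip(structured, implicit)]


def combine_with_template(joint_feats, template):
    """Stage 2 (assumed form): append the mean pose feature to each vertex."""
    n = len(joint_feats)
    dim = len(joint_feats[0])
    pooled = [sum(f[d] for f in joint_feats) / n for d in range(dim)]
    return [list(vertex) + pooled for vertex in template]


pose_2d = [[0.0, 0.0], [0.0, 0.5], [0.0, 1.0], [-0.5, 0.5], [0.5, 0.5]]
template = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]  # toy 3-vertex mesh

mask = adjacency(NUM_JOINTS, EDGES)
joint_feats = lift_features(pose_2d, mask)
mesh_inputs = combine_with_template(joint_feats, template)
```

In a trained model, `mesh_inputs` would presumably pass through learned layers that output the mesh parameters; here each row is just a template vertex followed by the pooled pose feature.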