Lifting the 2D human pose to the 3D pose is an important yet challenging task. Existing 3D pose estimation suffers from 1) the inherent ambiguity between the 2D and 3D data, and 2) the lack of well labeled 2D-3D pose pairs in the wild. Human beings are able to imagine the human 3D pose from a 2D image or a set of 2D body key-points with the least ambiguity, which should be attributed to the prior knowledge of the human body that we have acquired in our mind. Inspired by this, we propose a new framework that leverages the labeled 3D human poses to learn a 3D concept of the human body to reduce the ambiguity. To have consensus on the body concept from 2D pose, our key insight is to treat the 2D human pose and the 3D human pose as two different domains. By adapting the two domains, the body knowledge learned from 3D poses is applied to 2D poses and guides the 2D pose encoder to generate informative 3D "imagination" as embedding in pose lifting. Benefiting from the domain adaptation perspective, the proposed framework unifies the supervised and semi-supervised 3D pose estimation in a principled framework. Extensive experiments demonstrate that the proposed approach can achieve state-of-the-art performance on standard benchmarks. More importantly, it is validated that the explicitly learned 3D body concept effectively alleviates the 2D-3D ambiguity in 2D pose lifting, improves the generalization, and enables the network to exploit the abundant unlabeled 2D data.
翻译:将2D人姿势提升为3D人姿势是一项重要但富有挑战性的任务。 现有的3D人姿势估算是1 1 2D 和 3D 数据之间固有的模糊性, 2 2D 和 2 野生缺乏贴上标签的 2D-3D 相配体。 人类能够想象3D 由 2D 图像或一组 2D 体形的一组两D 键点构成, 这应该归因于我们大脑中已经获得的人类身体先前的知识。 受此启发, 我们提出一个新的框架, 利用 3D 标记的人姿势学习人体的3D 概念来减少模糊性。 要在 2D 的姿势上达成共识, 我们的关键洞察力是将2D 人姿势和 3D 人姿势视为两个不同的领域。 通过对这两个领域进行调整, 3D 所学的人体姿势知识应用到2D 的姿势, 指导 2D 成3D 以生成信息 3D 3D “ 想象 ” 使3D 模糊性 。 从域域调化的3D 概念的角度受益,, 使拟议的框架在不易理解2D 3D 3D 模型上 明确地展示了3D,, 的升级化的模型的模型 明确地展示了3D 。