Recovering a whole-body mesh by inferring abstract pose and shape parameters from visual content yields 3D bodies with realistic structures. However, this inference is highly non-linear and suffers from image-mesh misalignment, resulting in inaccurate reconstruction. In contrast, 3D keypoint estimation methods use a volumetric representation to achieve pixel-level accuracy but may predict unrealistic body structures. To address these issues, this paper presents a novel hybrid inverse kinematics solution, HybrIK, that integrates the merits of 3D keypoint estimation and body mesh recovery in a unified framework. HybrIK directly transforms accurate 3D joints into body-part rotations via twist-and-swing decomposition. The swing rotations are solved analytically from the 3D joints, while the twist rotations are derived from visual cues through neural networks. To capture comprehensive whole-body details, we further develop a holistic framework, HybrIK-X, which enhances HybrIK with articulated hands and an expressive face. HybrIK-X is fast and accurate because it solves the whole-body pose with a one-stage model. Experiments demonstrate that HybrIK and HybrIK-X preserve both the accuracy of 3D joints and the realistic structure of the parametric human model, leading to pixel-aligned whole-body mesh recovery. The proposed method significantly outperforms state-of-the-art methods on various body-only, hand-only, and whole-body benchmarks. Code and results can be found at https://jeffli.site/HybrIK-X/
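The twist-and-swing step can be sketched for a single bone. This is a minimal illustration, not the authors' implementation. It assumes the convention R = R_sw · R_tw, with the twist applied about the rest-pose bone direction t, and it treats the bone direction p (from predicted 3D joints) and the twist angle phi (from the network) as given inputs. The function names `swing`, `twist`, and `hybrid_ik_rotation` are hypothetical.

```python
import math

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def axis_angle(axis, angle):
    """Rodrigues' formula: I + sin(a) K + (1 - cos(a)) K^2 for a unit axis."""
    x, y, z = axis
    K = [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]
    K2 = matmul(K, K)
    s, c = math.sin(angle), 1.0 - math.cos(angle)
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + c * K2[i][j]
             for j in range(3)] for i in range(3)]

def swing(t, p, eps=1e-8):
    """Analytic swing: the minimal rotation turning bone direction t onto p."""
    cos_a = max(-1.0, min(1.0, dot(t, p) / (norm(t) * norm(p))))
    n = cross(t, p)
    if norm(n) < eps:  # t and p are parallel or anti-parallel
        if cos_a > 0:
            return axis_angle([1.0, 0.0, 0.0], 0.0)  # identity
        # anti-parallel: rotate by pi about any axis perpendicular to t
        n = cross(t, [1.0, 0.0, 0.0])
        if norm(n) < eps:
            n = cross(t, [0.0, 1.0, 0.0])
    n_len = norm(n)
    return axis_angle([v / n_len for v in n], math.acos(cos_a))

def twist(t, phi):
    """Twist: rotation by angle phi about the rest-pose bone direction t."""
    t_len = norm(t)
    return axis_angle([v / t_len for v in t], phi)

def hybrid_ik_rotation(t, p, phi):
    """Compose the bone rotation as R = R_sw @ R_tw (assumed convention)."""
    return matmul(swing(t, p), twist(t, phi))

if __name__ == "__main__":
    t = [0.0, 1.0, 0.0]  # hypothetical rest-pose bone (parent -> child)
    p = [1.0, 1.0, 0.0]  # hypothetical bone from predicted 3D joints
    phi = 0.3            # hypothetical twist angle regressed by a network
    R = hybrid_ik_rotation(t, p, phi)
    print(matvec(R, t))  # approximately [0.7071, 0.7071, 0.0]
```

Because the twist turns the body part about its own bone axis, R maps t onto the direction of p for any phi. The joint positions therefore fix only the swing, and phi, which controls how the part rolls around the bone, has to come from visual cues.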