Model-based 3D pose and shape estimation methods reconstruct a full 3D mesh of the human body by estimating several parameters. However, learning these abstract parameters is a highly non-linear process and suffers from image-model misalignment, which leads to mediocre performance. In contrast, 3D keypoint estimation methods combine deep convolutional neural networks (CNNs) with a volumetric representation to achieve pixel-level localization accuracy, but they may predict unrealistic body structures. In this paper, we address these issues by bridging the gap between body mesh estimation and 3D keypoint estimation. We propose a novel hybrid inverse kinematics solution (HybrIK). HybrIK directly transforms accurate 3D joints into relative body-part rotations for 3D body mesh reconstruction via a twist-and-swing decomposition. The swing rotation is solved analytically from the 3D joints, and the twist rotation is derived from visual cues by a neural network. We show that HybrIK preserves both the accuracy of the 3D pose and the realistic body structure of the parametric human model, leading to a pixel-aligned 3D body mesh and a more accurate 3D pose than pure 3D keypoint estimation methods. Without bells and whistles, the proposed method surpasses state-of-the-art methods by a large margin on various 3D human pose and shape benchmarks. As an illustrative example, HybrIK outperforms all previous methods by 13.2 mm MPJPE and 21.9 mm PVE on the 3DPW dataset. Our code is available at https://github.com/Jeff-sjtu/HybrIK.
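The twist-and-swing decomposition can be illustrated for a single bone with a minimal NumPy sketch. This is not the authors' implementation: the function names (`rodrigues`, `swing_rotation`, `compose_rotation`) are hypothetical, the convention R = R_swing · R_twist is an assumption, and the twist angle is passed in as a plain number, whereas in HybrIK it would be predicted by the network. The swing is taken as the smallest rotation that carries the rest-pose bone direction `t` onto the estimated bone direction `p`.

```python
import numpy as np


def rodrigues(axis, angle):
    """Rotation matrix for `angle` radians about the unit vector `axis`."""
    x, y, z = axis
    K = np.array([[0.0, -z, y],
                  [z, 0.0, -x],
                  [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def swing_rotation(t, p, eps=1e-8):
    """Smallest rotation taking direction `t` onto direction `p` (closed form)."""
    t = np.asarray(t, dtype=float)
    p = np.asarray(p, dtype=float)
    t = t / np.linalg.norm(t)
    p = p / np.linalg.norm(p)
    axis = np.cross(t, p)
    s = np.linalg.norm(axis)                       # sin of the swing angle
    c = float(np.clip(np.dot(t, p), -1.0, 1.0))    # cos of the swing angle
    if s < eps:
        if c > 0.0:
            return np.eye(3)                       # already aligned
        # Antiparallel case: rotate by pi about any axis perpendicular to t.
        helper = np.array([1.0, 0.0, 0.0]) if abs(t[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
        perp = np.cross(t, helper)
        return rodrigues(perp / np.linalg.norm(perp), np.pi)
    return rodrigues(axis / s, np.arctan2(s, c))


def compose_rotation(t, p, twist_angle):
    """Assumed convention: R = R_swing @ R_twist, with the twist about `t`."""
    t = np.asarray(t, dtype=float)
    t_unit = t / np.linalg.norm(t)
    R_twist = rodrigues(t_unit, twist_angle)       # twist_angle: stand-in for a network output
    return swing_rotation(t, p) @ R_twist


# Illustrative values only: rest-pose bone along +y, estimated bone along +x.
t_rest = np.array([0.0, 1.0, 0.0])
p_est = np.array([1.0, 0.0, 0.0])
R = compose_rotation(t_rest, p_est, twist_angle=0.3)
```

Because the twist rotates about the bone axis itself, it leaves `t_rest` unchanged, so `R @ t_rest` matches `p_est` whatever twist angle is supplied. This is why the bone direction can be recovered from 3D joints alone, while the twist needs a separate source of information.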