HybrIK combines analytical inverse kinematics with deep learning to produce more accurate 3D pose estimates from monocular 2D images. HybrIK has three major components: (1) a pretrained convolutional backbone, (2) a deconvolution module that lifts the 3D pose from 2D convolutional features, and (3) an analytical inverse kinematics pass that corrects the deep-learning prediction using a learned distribution of plausible twist and swing angles. In this paper we propose an enhancement of the 2D-to-3D lifting module, replacing the deconvolution with a Transformer, which improves accuracy and computational efficiency relative to the original HybrIK method. We demonstrate our results on the commonly used H36M, PW3D, COCO, and HP3D datasets. Our code is publicly available at https://github.com/boreshkinai/hybrik-transformer.
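As a rough illustration of the proposed change, the minimal sketch below replaces a deconvolution-style lifting head with a Transformer encoder over per-joint tokens. The module name, feature dimension, joint count, layer sizes, and the use of a single pooled image token are assumptions made for illustration only; the actual implementation is in the linked repository.

```python
# Minimal sketch (not the authors' implementation) of a Transformer-based
# 2D-to-3D lifting module. All dimensions and names are illustrative.
import torch
import torch.nn as nn


class TransformerLifter(nn.Module):
    """Lift backbone features to per-joint 3D predictions with a Transformer
    encoder instead of a deconvolution head."""

    def __init__(self, feat_dim=2048, d_model=256, num_joints=29,
                 num_layers=4, num_heads=8):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)           # project pooled backbone features
        self.joint_queries = nn.Parameter(torch.randn(num_joints, d_model))
        layer = nn.TransformerEncoderLayer(d_model, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 3)                   # per-joint 3D coordinates

    def forward(self, feats):                               # feats: (B, feat_dim)
        b = feats.shape[0]
        ctx = self.proj(feats).unsqueeze(1)                 # (B, 1, d_model) image token
        tokens = self.joint_queries.unsqueeze(0).expand(b, -1, -1)
        x = torch.cat([ctx, tokens], dim=1)                 # image token + joint tokens
        x = self.encoder(x)
        return self.head(x[:, 1:])                          # (B, num_joints, 3)


if __name__ == "__main__":
    lifter = TransformerLifter()
    pooled = torch.randn(2, 2048)                           # e.g. globally pooled ResNet features
    joints_3d = lifter(pooled)
    print(joints_3d.shape)                                  # torch.Size([2, 29, 3])
```

The predicted joint coordinates would then feed the analytical inverse kinematics pass described above, which is unchanged from HybrIK and therefore omitted from this sketch.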