Estimating 3D human pose from a single image is a challenging task. This work addresses the uncertainty of lifting detected 2D joints to 3D space by introducing an intermediate state, Part-Centric Heatmap Triplets (HEMlets), which narrows the gap between the 2D observation and the 3D interpretation. HEMlets use three joint-heatmaps to represent the relative depth information of the end-joints of each skeletal body part. In our approach, a Convolutional Network (ConvNet) is first trained to predict HEMlets from the input image, followed by a volumetric joint-heatmap regression. We use the integral operation to extract the joint locations from the volumetric heatmaps, which keeps the pipeline differentiable and allows end-to-end learning. Despite the simplicity of the network design, quantitative comparisons show a significant performance improvement over the best-of-grade methods (e.g., $20\%$ on Human3.6M). The proposed method naturally supports training with "in-the-wild" images, where only weakly annotated relative depth information of skeletal joints is available. This further improves the generalization ability of our model, as validated by qualitative comparisons on outdoor images. Building on the strength of the HEMlets pose estimation, we further design and append a shallow yet effective network module to regress the SMPL parameters of the body pose and shape. We term the entire HEMlets-based human pose and shape recovery pipeline HEMlets PoSh. Extensive quantitative and qualitative experiments on existing human body recovery benchmarks confirm the state-of-the-art results obtained with our HEMlets PoSh approach.
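The integral operation mentioned above is commonly implemented as a soft-argmax: the volumetric heatmap is normalized into a probability distribution, and the joint location is taken as the expected voxel coordinate. The following is a minimal NumPy sketch for a single joint, not the authors' implementation; the function name `integral_joint_location` and the `(depth, height, width)` layout are assumptions made for illustration.

```python
import numpy as np


def integral_joint_location(logits):
    """Soft-argmax over one (D, H, W) volumetric heatmap.

    Hypothetical helper for illustration: returns the expected
    (x, y, z) voxel coordinate of a single joint.
    """
    d, h, w = logits.shape

    # Softmax over all voxels (shifted by the max for numerical stability),
    # so the volume becomes a probability distribution that sums to 1.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Marginalize onto each axis, then take the expected index along it.
    z = (probs.sum(axis=(1, 2)) * np.arange(d)).sum()
    y = (probs.sum(axis=(0, 2)) * np.arange(h)).sum()
    x = (probs.sum(axis=(0, 1)) * np.arange(w)).sum()
    return np.array([x, y, z])


# Usage: a volume with one sharp peak at z=2, y=3, x=5.
volume = np.full((4, 8, 8), -50.0)
volume[2, 3, 5] = 50.0
peak_xyz = integral_joint_location(volume)
```

Because the result is a weighted sum rather than a hard `argmax`, gradients flow back through the heatmap to the network, which is what makes end-to-end training possible. In a real model the same expectation would be computed per joint and per batch element inside an automatic-differentiation framework.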