This study addresses 3D human pose estimation from a single RGB image by proposing a conditional random field (CRF) model over 2D poses, in which the 3D pose is obtained as a byproduct of the inference process. The unary term of the proposed CRF model is defined using a powerful heat-map regression network originally proposed for 2D human pose estimation. This study also presents a regression network that lifts a 2D pose to a 3D pose, and proposes a prior term based on the consistency between the estimated 3D pose and the 2D pose. To obtain an approximate solution of the proposed CRF model, the N-best strategy is adopted. The proposed inference algorithm can be viewed as a sequence of two processes: bottom-up generation of 2D and 3D pose proposals from the input 2D image using deep networks, followed by top-down verification of these proposals by checking their consistency. The proposed method is evaluated on two large-scale datasets, Human3.6M and HumanEva. Experimental results show that the proposed method achieves state-of-the-art 3D human pose estimation performance.
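
The N-best inference described above can be illustrated with a minimal sketch. The abstract does not give the exact form of the unary and prior terms, so everything concrete here is an assumption made for illustration: the unary term is taken as the negative log of the heat-map value at each joint, the consistency prior as the squared distance between the 2D pose and an orthographic projection of the lifted 3D pose, and `lift_fn` is a hypothetical stand-in for the 2D-to-3D regression network. The function names and the `weight` parameter are likewise hypothetical.

```python
import numpy as np


def unary_score(heatmaps, pose_2d):
    """Assumed unary term: negative log heat-map value at each joint, summed.

    heatmaps: array of shape (J, H, W), one map per joint.
    pose_2d:  array of shape (J, 2) holding (x, y) pixel coordinates.
    """
    num_joints = pose_2d.shape[0]
    xs = pose_2d[:, 0].astype(int)
    ys = pose_2d[:, 1].astype(int)
    values = heatmaps[np.arange(num_joints), ys, xs]
    return -np.sum(np.log(values + 1e-9))  # small constant avoids log(0)


def consistency_prior(pose_2d, pose_3d):
    """Assumed prior term: squared distance between the 2D pose and the
    orthographic projection (x, y columns) of the lifted 3D pose."""
    projected = pose_3d[:, :2]
    return np.sum((pose_2d - projected) ** 2)


def select_best(heatmaps, candidates_2d, lift_fn, weight=1.0):
    """Score each of the N candidate 2D poses and return the lowest-energy one.

    lift_fn is a placeholder for the lifting network: it maps a (J, 2)
    pose to a (J, 3) pose. Returns (index, 2D pose, 3D pose, energies).
    """
    energies = []
    poses_3d = []
    for pose_2d in candidates_2d:
        pose_3d = lift_fn(pose_2d)  # bottom-up: generate a 3D proposal
        energy = unary_score(heatmaps, pose_2d) + weight * consistency_prior(
            pose_2d, pose_3d
        )  # top-down: verify the 2D/3D consistency
        energies.append(energy)
        poses_3d.append(pose_3d)
    best = int(np.argmin(energies))
    return best, candidates_2d[best], poses_3d[best], energies
```

In this sketch the 3D pose falls out of the selection step, mirroring the statement that it is obtained as a byproduct of inference over 2D poses. How the N candidates are generated from the heat maps is left outside the sketch, since the abstract does not specify it.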