Real-time 3D human pose estimation is crucial for human-computer interaction. Estimating 3D human pose from monocular video alone is cheap and practical. However, recent 3D human pose estimation methods based on bone splicing suffer from cumulative error. In this paper, the concept of virtual bones is proposed to address this problem. Virtual bones are imaginary bones between non-adjacent joints. They do not exist in reality, but they introduce new loop constraints for the estimation of 3D human joints. The proposed network predicts real bones and virtual bones simultaneously. The final lengths of the real bones are constrained and learned through the loops formed by the predicted real and virtual bones. In addition, the motion constraints of joints across consecutive frames are considered: the consistency between the 2D projected displacement predicted by the network and the real 2D displacement captured by the camera is proposed as a new projection consistency loss for learning 3D human pose. Experiments on the Human3.6M dataset demonstrate the good performance of the proposed method. Ablation studies demonstrate the effectiveness of the proposed inter-frame projection consistency constraints and intra-frame loop constraints.
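The two constraints named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the function names (`loop_residual`, `project`, `projection_consistency`), the pinhole camera model with a single focal length, and the choice of an L2 distance. It should not be read as the authors' actual loss definitions.

```python
import math


def loop_residual(real_ab, real_bc, virtual_ac):
    """Intra-frame loop check (assumed form).

    real_ab and real_bc are real bone vectors a->b and b->c;
    virtual_ac is the virtual bone vector a->c between the
    non-adjacent joints a and c. In a closed loop,
    (a->b) + (b->c) - (a->c) is the zero vector, so the norm
    of that sum measures how badly the loop fails to close.
    """
    return math.sqrt(
        sum((x + y - z) ** 2 for x, y, z in zip(real_ab, real_bc, virtual_ac))
    )


def project(point, focal=1.0):
    """Pinhole projection of a camera-space point (x, y, z), z > 0.

    The camera model is an assumption, not taken from the abstract.
    """
    x, y, z = point
    return (focal * x / z, focal * y / z)


def projection_consistency(joints_t, joints_t1, observed_disp, focal=1.0):
    """Inter-frame projection consistency (assumed form).

    Projects the predicted 3D joints of two consecutive frames,
    takes the per-joint 2D displacement, and returns the mean L2
    gap to the 2D displacement observed in the image.
    """
    total = 0.0
    for p0, p1, d_obs in zip(joints_t, joints_t1, observed_disp):
        u0, v0 = project(p0, focal)
        u1, v1 = project(p1, focal)
        total += math.dist((u1 - u0, v1 - v0), d_obs)
    return total / len(joints_t)


# Hypothetical shoulder -> elbow -> wrist chain, in metres.
upper_arm = (0.3, 0.0, 0.0)      # real bone: shoulder -> elbow
forearm = (0.0, -0.25, 0.0)      # real bone: elbow -> wrist
shoulder_wrist = (0.3, -0.2, 0.0)  # virtual bone, slightly inconsistent

print(loop_residual(upper_arm, forearm, shoulder_wrist))
print(projection_consistency([(0.0, 0.0, 2.0)], [(0.2, 0.0, 2.0)], [(0.1, 0.0)]))
```

In this sketch, a nonzero loop residual signals that the predicted real and virtual bones disagree, which is the kind of inconsistency a loop constraint would penalize so that an error in one bone cannot silently accumulate along the chain.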