Although significant progress has been achieved on monocular marker-less human motion capture in recent years, it is still hard for state-of-the-art methods to obtain satisfactory results under occlusion. There are two main reasons: the first is that occluded motion capture is inherently ambiguous, as a variety of 3D poses can map to the same 2D observations, which often results in unreliable estimation. The second is that insufficient occluded human data is available for training a robust model. To address these obstacles, our key idea is to employ non-occluded human data to learn a joint-level spatial-temporal motion prior for occluded humans with a self-supervised strategy. To further reduce the gap between synthetic and real occlusion data, we build the first 3D occluded motion dataset~(OcMotion), which can be used for both training and testing. We encode motions in 2D maps and synthesize occlusions on non-occluded data for self-supervised training. A spatial-temporal layer is then designed to learn joint-level correlations. The learned prior reduces the ambiguities caused by occlusion and is robust to diverse occlusion types; it is then adopted to assist occluded human motion capture. Experimental results show that our method can generate accurate and coherent human motions from occluded videos with good generalization ability and runtime efficiency. The dataset and code are publicly available at \url{https://github.com/boycehbz/CHOMP}.
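The self-supervised scheme sketched in the abstract (encoding a motion clip as a 2D map and synthesizing occlusions on non-occluded data) can be illustrated as follows. This is a minimal sketch only: the array layout, masking ratio, and function name are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def synthesize_occlusion(motion_map, occ_ratio=0.3, rng=None):
    """Randomly mask joint-time entries of a 2D motion map (a sketch).

    motion_map: (T, J, C) array -- T frames, J joints, C channels
    (e.g. 3D joint coordinates). The masked copy serves as the
    self-supervised input; the original map is the reconstruction
    target for learning the motion prior.
    """
    rng = np.random.default_rng(rng)
    occluded = motion_map.copy()
    T, J, _ = motion_map.shape
    mask = rng.random((T, J)) < occ_ratio   # True = occluded joint
    occluded[mask] = 0.0                    # drop occluded observations
    return occluded, mask

# toy usage: a 16-frame, 24-joint motion clip
motion = np.random.randn(16, 24, 3)
inp, mask = synthesize_occlusion(motion, occ_ratio=0.3, rng=0)
assert inp.shape == motion.shape
assert np.all(inp[mask] == 0.0)
```

A model trained to reconstruct `motion` from `inp` learns joint-level spatial-temporal correlations that remain usable when real occlusions hide part of the observation.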