We present an approach for 3D global human mesh recovery from monocular videos recorded with dynamic cameras. Our approach is robust to severe and long-term occlusions and tracks human bodies even when they go outside the camera's field of view. To achieve this, we first propose a deep generative motion infiller, which autoregressively infills the body motions of occluded humans based on visible motions. Additionally, in contrast to prior work, our approach reconstructs human meshes in consistent global coordinates even with dynamic cameras. Since the joint reconstruction of human motions and camera poses is underconstrained, we propose a global trajectory predictor that generates global human trajectories based on local body movements. Using the predicted trajectories as anchors, we present a global optimization framework that refines the predicted trajectories and optimizes the camera poses to match the video evidence such as 2D keypoints. Experiments on challenging indoor and in-the-wild datasets with dynamic cameras demonstrate that the proposed approach outperforms prior methods significantly in terms of motion infilling and global mesh recovery.