We introduce a novel task of reconstructing a time series of second-person 3D human body meshes from monocular egocentric videos. The unique viewpoint and rapid embodied camera motion of egocentric videos raise additional technical barriers to human body capture. To address these challenges, we propose a simple yet effective optimization-based approach that leverages 2D observations of the entire video sequence and human-scene interaction constraints to estimate second-person human poses, shapes, and global motion that are grounded in the 3D environment captured from the egocentric view. We conduct detailed ablation studies to validate our design choices. Moreover, we compare our method with a previous state-of-the-art method for human motion capture from monocular video, and show that our method estimates more accurate human body poses and shapes under the challenging egocentric setting. In addition, we demonstrate that our approach produces more realistic human-scene interaction.
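To make the abstract's description concrete, below is a minimal sketch (not the authors' implementation) of the kind of sequence-level optimization it describes: body pose, shape, and global translation are fit jointly to 2D keypoint observations across all frames, with a scene-contact penalty and a temporal smoothness term. The body model, camera projection, scene geometry, and all variable names here are simplified stand-ins; a real system would use a parametric model such as SMPL, the estimated egocentric camera, and a reconstructed 3D scene.

```python
import torch

T, J = 30, 24                                      # frames, joints (toy sizes)
pose = torch.zeros(T, J, 3, requires_grad=True)    # per-frame joint angles (stand-in)
shape = torch.zeros(10, requires_grad=True)        # shared body shape coefficients
trans = torch.zeros(T, 3, requires_grad=True)      # per-frame global translation

rest = torch.randn(J, 3) * 0.3                     # fixed toy rest skeleton

def joints_3d(pose, shape, trans):
    # Stand-in for a body model such as SMPL: returns (T, J, 3) joint positions.
    return rest + 0.1 * pose + 0.01 * shape.sum() + trans[:, None, :]

def project(x3d, f=1000.0):
    # Simple pinhole projection of 3D joints to 2D pixel offsets.
    z = x3d[..., 2:3].clamp(min=1e-3) + 3.0        # keep points in front of camera
    return f * x3d[..., :2] / z

kp2d = torch.randn(T, J, 2) * 50                   # observed 2D keypoints (toy data)
floor_y = 0.0                                      # scene constraint: a ground plane

opt = torch.optim.Adam([pose, shape, trans], lr=0.05)
for step in range(200):
    opt.zero_grad()
    j3d = joints_3d(pose, shape, trans)
    loss_2d = ((project(j3d) - kp2d) ** 2).mean()           # 2D reprojection term
    loss_scene = torch.relu(floor_y - j3d[..., 1]).mean()   # penalize joints below floor
    loss_smooth = ((trans[1:] - trans[:-1]) ** 2).mean()    # temporal smoothness of motion
    loss = loss_2d + 10.0 * loss_scene + loss_smooth
    loss.backward()
    opt.step()
```

Optimizing over the entire sequence at once, rather than per frame, is what lets the 2D evidence, the scene-contact constraint, and the smoothness prior jointly determine a globally consistent trajectory.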