3D human motion capture from monocular RGB images that respects interactions of a subject with complex and possibly deformable environments is a very challenging, ill-posed and under-explored problem. Existing methods address it only weakly and do not model the surface deformations that often occur when humans interact with scene surfaces. In contrast, this paper proposes MoCapDeform, a new framework for monocular 3D human motion capture that is the first to explicitly model non-rigid deformations of a 3D scene for improved 3D human pose estimation and deformable environment reconstruction. MoCapDeform accepts a monocular RGB video and a 3D scene mesh aligned in camera space. It first localises the subject in the input video along with dense contact labels using a new raycasting-based strategy. Next, our human-environment interaction constraints are leveraged to jointly optimise global 3D human poses and non-rigid surface deformations. MoCapDeform achieves higher accuracy than competing methods on several datasets, including our newly recorded one with deforming background scenes.
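To make the raycasting step concrete, below is a minimal sketch of how a detected contact pixel could be lifted into absolute camera space by intersecting a camera ray with the pre-aligned scene mesh, which resolves the depth ambiguity of monocular capture. This is an illustrative assumption of one plausible realisation, not the authors' implementation; the file name, intrinsics `K`, and `contact_pixels` are hypothetical, and the sketch uses the trimesh library.

```python
import numpy as np
import trimesh

# Hypothetical inputs: a scene mesh already aligned to the camera frame,
# and pinhole intrinsics K (all values illustrative).
scene_mesh = trimesh.load("scene.obj", force="mesh")
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

def backproject_rays(pixels, K):
    """Turn 2D pixels into unit ray directions in camera space."""
    ones = np.ones((len(pixels), 1))
    rays = (np.linalg.inv(K) @ np.hstack([pixels, ones]).T).T
    return rays / np.linalg.norm(rays, axis=1, keepdims=True)

# Pixels where a dense contact classifier fired (e.g., feet on the floor).
contact_pixels = np.array([[310.0, 400.0], [330.0, 405.0]])
origins = np.zeros((len(contact_pixels), 3))        # camera at the origin
directions = backproject_rays(contact_pixels, K)

# Intersect the rays with the aligned scene mesh; the hit points anchor
# the contacting body parts at an absolute (metric) depth.
locations, index_ray, index_tri = scene_mesh.ray.intersects_location(
    origins, directions, multiple_hits=False)
print("3D contact anchors:", locations)
```

Contact anchors obtained this way could then serve as targets in the subsequent joint optimisation of global 3D human pose and non-rigid surface deformation.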