Capturing the dynamically deforming 3D shape of a clothed human is essential for numerous applications, including VR/AR, autonomous driving, and human-computer interaction. Existing methods either require a highly specialized capture setup, such as an expensive multi-view imaging system, or lack robustness to challenging body poses. In this work, we propose a method that captures the dynamic 3D shape of a human from a monocular video featuring challenging body poses, without any additional input. We first build a 3D template human model of the subject using a learned regression model. We then track this template model's deformation under challenging body articulations based on 2D image observations. Our method outperforms state-of-the-art methods on the in-the-wild human video dataset 3DPW. Moreover, we demonstrate its robustness and generalizability on videos from the iPER dataset.