Recent monocular human performance capture approaches have shown compelling dense tracking results of the full body from a single RGB camera. However, existing methods either do not estimate clothing at all or model cloth deformation with simple geometric priors instead of accounting for the underlying physical principles. This leads to noticeable artifacts in their reconstructions, such as baked-in wrinkles, implausible deformations that seemingly defy gravity, and intersections between cloth and body. To address these problems, we propose a person-specific, learning-based method that integrates a finite element-based simulation layer into the training process to provide, for the first time, physics supervision in the context of weakly-supervised deep monocular human performance capture. We show how integrating physics into the training process improves the learned cloth deformations, allows modeling clothing as a separate piece of geometry, and greatly reduces cloth-body intersections. Relying only on weak 2D multi-view supervision during training, our approach yields a significant improvement over current state-of-the-art methods and is thus a clear step towards realistic monocular capture of the entire deforming surface of a clothed human.
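To make the core mechanism concrete, the sketch below illustrates how a differentiable physics residual can supervise predicted cloth geometry during training. This is a deliberately simplified stand-in, not the paper's method: the paper integrates a finite element-based simulation layer, whereas this example uses a mass-spring cloth model with gravity, and all names (`physics_loss`, `edges`, `rest_len`, `stiffness`) are hypothetical.

```python
# Minimal sketch of a physics supervision loss in PyTorch.
# Assumption: a mass-spring approximation with gravity stands in for the
# paper's finite element-based simulation layer.
import torch

def physics_loss(x_prev, x_curr, x_next, edges, rest_len,
                 dt=1.0 / 30.0, mass=1.0, stiffness=1e3):
    """Penalize deviation of predicted cloth vertices from Newton's
    second law, discretized with finite differences.

    x_prev, x_curr, x_next: (V, 3) vertex positions at frames t-1, t, t+1
    edges: (E, 2) long tensor of vertex index pairs (cloth mesh edges)
    rest_len: (E,) rest lengths of those edges
    """
    gravity = torch.tensor([0.0, -9.81, 0.0], device=x_curr.device)

    # Spring (stretch) forces along each mesh edge at time t.
    d = x_curr[edges[:, 0]] - x_curr[edges[:, 1]]          # (E, 3)
    length = d.norm(dim=-1, keepdim=True).clamp(min=1e-8)  # (E, 1)
    f_edge = -stiffness * (length - rest_len[:, None]) * d / length

    # Scatter per-edge forces to incident vertices (action/reaction).
    forces = torch.zeros_like(x_curr)
    forces.index_add_(0, edges[:, 0], f_edge)
    forces.index_add_(0, edges[:, 1], -f_edge)
    forces = forces + mass * gravity

    # Residual of m * a = f, with acceleration from finite differences.
    accel = (x_next - 2.0 * x_curr + x_prev) / (dt * dt)
    residual = mass * accel - forces
    return (residual ** 2).sum(dim=-1).mean()
```

In training, `x_next` would be the network's predicted cloth vertices for the next frame, so minimizing this residual alongside the weak 2D multi-view losses pushes the learned deformations towards physically plausible, gravity-consistent behavior instead of baked-in wrinkles.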