We present a method to capture temporally coherent dynamic clothing deformation from a monocular RGB video. In contrast to the existing literature, our method does not require a pre-scanned personalized mesh template and can therefore be applied to in-the-wild videos. To constrain the output to a valid deformation space, we build statistical deformation models for three types of clothing: T-shirts, short pants, and long pants. A differentiable renderer aligns the captured shapes to the input frames by minimizing the differences in silhouette, segmentation, and texture. We develop a UV texture growing method that sequentially expands the visible texture region of the clothing in order to minimize drift in deformation tracking. We also extract fine-grained wrinkle detail from the input video by fitting the clothed surface to normal maps estimated by a convolutional neural network. Our method produces temporally coherent reconstructions of body and clothing from monocular video, and we demonstrate successful clothing capture results on a variety of challenging videos. Extensive quantitative experiments demonstrate the effectiveness of our method on metrics including body pose error and surface reconstruction error of the clothing.
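For concreteness, the image-space alignment described above can be summarized as a single weighted objective. The notation below (weights λ, body pose θ, body shape β, and clothing deformation coefficients z) is an illustrative sketch rather than the paper's own formulation:

```latex
\begin{aligned}
E(\theta, \beta, z) ={}& \lambda_{\text{sil}} E_{\text{sil}}(\theta, \beta, z)
  + \lambda_{\text{seg}} E_{\text{seg}}(\theta, \beta, z)
  + \lambda_{\text{tex}} E_{\text{tex}}(\theta, \beta, z) \\
  &+ \lambda_{\text{nrm}} E_{\text{nrm}}(\theta, \beta, z)
  + \lambda_{\text{reg}} \lVert z \rVert^{2},
\end{aligned}
```

where E_sil, E_seg, and E_tex compare the differentiably rendered model against the input silhouette, clothing segmentation, and texture, E_nrm penalizes deviation from the CNN-estimated normal maps used for wrinkle detail, and the final term keeps the deformation coefficients z within the learned statistical deformation space.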
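The UV texture growing step can likewise be illustrated with a short sketch. Assuming (hypothetically) that each frame yields a partial UV-space color map and a visibility mask from the current fit, newly seen texels are committed only where the atlas is still empty, so regions captured earlier are never overwritten and remain a stable photometric anchor against drift:

```python
import numpy as np

def grow_uv_texture(atlas, filled, frame_texels, frame_visible):
    """Sequentially expand the visible UV texture region of the clothing.

    atlas         : (H, W, 3) float array, the accumulated clothing texture
    filled        : (H, W) bool array, texels captured in earlier frames
    frame_texels  : (H, W, 3) float array, colors sampled from the current frame
    frame_visible : (H, W) bool array, texels visible under the current fit

    All names and per-frame inputs are hypothetical placeholders; the sketch
    only illustrates the "grow, never overwrite" rule described in the text.
    """
    new = frame_visible & ~filled      # texels seen for the first time
    atlas[new] = frame_texels[new]     # commit them to the atlas
    filled |= new                      # mark them as captured
    return atlas, filled

# Usage: start from an empty atlas and fold in frames one by one.
H, W = 512, 512
atlas = np.zeros((H, W, 3), dtype=np.float32)
filled = np.zeros((H, W), dtype=bool)
for frame_texels, frame_visible in []:  # placeholder for a real frame stream
    atlas, filled = grow_uv_texture(atlas, filled, frame_texels, frame_visible)
```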