Self-supervised 3D Representation Learning of Dressed Humans from Social Media Videos

A key challenge in learning a visual representation for the high-fidelity 3D geometry of dressed humans is the limited availability of ground truth data (e.g., 3D scanned models), which degrades the performance of 3D human reconstruction when it is applied to real-world imagery. We address this challenge by leveraging a new data resource: a number of social media dance videos that span diverse appearances, clothing styles, performances, and identities. Each video depicts the dynamic movements of the body and clothes of a single person but lacks 3D ground truth geometry. To learn a visual representation from these videos, we present a new self-supervised learning method that uses the local transformation that warps the predicted local geometry of the person in one image to that in another image at a different time instant. This allows self-supervision by enforcing temporal coherence over the predictions. In addition, we jointly learn the depths along with the surface normals, which are highly responsive to local texture, wrinkles, and shading, by maximizing their geometric consistency. Our method is end-to-end trainable, resulting in high-fidelity depth estimation that predicts fine geometry faithful to the input real image. We further provide a theoretical bound for self-supervised learning via an uncertainty analysis that characterizes the performance of the self-supervised learning without training. We demonstrate that our method outperforms state-of-the-art human depth estimation and human shape recovery approaches on both real and rendered images.
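The geometric consistency between depths and surface normals mentioned above can be illustrated with a minimal NumPy sketch. This is not the authors' formulation: the finite-difference normal estimate, the camera-facing sign convention, the cosine-based penalty, and the function names `normals_from_depth` and `normal_consistency_loss` are all assumptions chosen for illustration.

```python
import numpy as np


def normals_from_depth(depth):
    """Approximate unit surface normals from a depth map (H x W).

    Assumption: normals are taken as (-dz/dx, -dz/dy, 1), normalized,
    i.e. a simple finite-difference estimate in pixel units.
    """
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (cols, x).
    dz_dy, dz_dx = np.gradient(depth)
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    n /= np.linalg.norm(n, axis=-1, keepdims=True)
    return n


def normal_consistency_loss(depth, pred_normals, mask=None):
    """Mean (1 - cosine similarity) between depth-derived and predicted normals.

    `mask` is an optional boolean H x W foreground mask (hypothetical input).
    """
    n_depth = normals_from_depth(depth)
    cos = np.sum(n_depth * pred_normals, axis=-1)
    err = 1.0 - cos
    if mask is None:
        mask = np.ones(depth.shape, dtype=bool)
    return float(err[mask].mean())


# Usage: a tilted plane whose depth grows by 0.5 per pixel along x.
depth = np.tile(0.5 * np.arange(8.0), (6, 1))
consistent = normals_from_depth(depth)
print(normal_consistency_loss(depth, consistent))
```

The loss is zero when the predicted normals agree with the normals implied by the depth map and grows as they diverge. In a training setting the same penalty would be written in a differentiable framework so that it can be minimized jointly over both predictions.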
