We present a novel method to learn temporally consistent 3D reconstruction of clothed people from monocular video. Recent methods for 3D human reconstruction from monocular video using volumetric, implicit or parametric human shape models produce per-frame reconstructions, giving temporally inconsistent output and limited performance when applied to video. In this paper, we introduce an approach to learn temporally consistent features for textured reconstruction of clothed 3D human sequences from monocular video by proposing two advances: a novel temporal consistency loss function; and hybrid representation learning for implicit 3D reconstruction from 2D images and coarse 3D geometry. The proposed advances improve the temporal consistency and accuracy of both the 3D reconstruction and the texture prediction from monocular video. Comprehensive comparative performance evaluation on images of people demonstrates that the proposed method significantly outperforms state-of-the-art learning-based single-image 3D human shape estimation approaches, with improved reconstruction accuracy, completeness, quality and temporal consistency.
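To make the idea of a temporal consistency loss concrete, the following is a minimal sketch, not the paper's exact formulation: it penalises differences between features predicted at corresponding 3D points in adjacent frames. The function name, tensor shapes, and the weighting hyper-parameter `lambda_tc` are illustrative assumptions.

```python
# Minimal illustrative sketch of a temporal consistency loss (assumed form,
# not the paper's exact definition). Correspondence between points in
# consecutive frames is assumed to be given, e.g. from coarse geometry.
import torch


def temporal_consistency_loss(feat_t: torch.Tensor, feat_t1: torch.Tensor) -> torch.Tensor:
    """L2 penalty between per-point features of frame t and frame t+1.

    feat_t, feat_t1: (N, C) features sampled at corresponding 3D surface
    points in two consecutive frames.
    """
    return torch.mean(torch.sum((feat_t - feat_t1) ** 2, dim=-1))


# Usage (hypothetical): combine with a standard per-frame reconstruction loss.
# loss = recon_loss + lambda_tc * temporal_consistency_loss(feat_t, feat_t1)
```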