We propose SelfRecon, a clothed human body reconstruction method that combines implicit and explicit representations to recover space-time coherent geometries from a monocular self-rotating human video. Explicit methods require a predefined template mesh for a given sequence, but such a template is hard to acquire for a specific subject. Moreover, the fixed topology limits reconstruction accuracy and the range of clothing types. Implicit methods support arbitrary topology and achieve high quality thanks to their continuous geometric representation. However, it is difficult for them to integrate multi-frame information into a consistent registration sequence for downstream applications. We propose to combine the advantages of both representations. We utilize a differentiable mask loss on the explicit mesh to obtain a coherent overall shape, while the details on the implicit surface are refined via differentiable neural rendering. Meanwhile, the explicit mesh is updated periodically to accommodate topology changes, and a consistency loss is designed to keep both representations closely matched. Compared with existing methods, SelfRecon can produce high-fidelity surfaces for arbitrarily clothed humans through self-supervised optimization. Extensive experimental results demonstrate its effectiveness on real captured monocular videos.
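The consistency loss that ties the two representations together can be sketched as follows. This is an illustrative toy, not the paper's implementation: a sphere SDF stands in for the learned implicit surface, and a ring of sampled points stands in for the explicit mesh vertices. The idea is simply to penalize explicit vertices that drift off the implicit zero level set.

```python
import numpy as np

def sphere_sdf(points, radius=1.0):
    # Toy signed distance function standing in for the learned implicit
    # surface: negative inside the sphere, zero on it, positive outside.
    return np.linalg.norm(points, axis=-1) - radius

def consistency_loss(vertices, sdf):
    # Mean absolute SDF value at the explicit mesh vertices: zero when the
    # explicit mesh lies exactly on the implicit surface.
    return np.abs(sdf(vertices)).mean()

# "Mesh" vertices sampled exactly on the unit sphere: loss is ~0.
theta = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
on_surface = np.stack(
    [np.cos(theta), np.sin(theta), np.zeros_like(theta)], axis=-1
)
print(consistency_loss(on_surface, sphere_sdf))      # ~0.0

# The same vertices inflated by 10%: loss is ~0.1 (the drift distance).
print(consistency_loss(1.1 * on_surface, sphere_sdf))  # ~0.1
```

In the actual method both sides are optimized, so a gradient of this loss would pull the mesh vertices toward the zero level set and push the implicit surface toward the mesh simultaneously.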