Recovering detailed facial geometry from a set of calibrated multi-view images is valuable for a wide range of applications. Traditional multi-view stereo (MVS) methods rely on optimization to regularize the matching cost. Recently, learning-based methods have integrated these steps into an end-to-end neural network and shown superior efficiency. In this paper, we propose a novel architecture that recovers extremely detailed 3D faces in roughly 10 seconds. Unlike previous learning-based methods that regularize the cost volume with a 3D CNN, we propose to learn an implicit function that regresses the matching cost. We fit a 3D morphable model to the multi-view images, then extract features from the images and aggregate them in the mesh-attached UV space, which makes the implicit function more effective at recovering detailed facial shape. Our method outperforms state-of-the-art learning-based MVS methods in accuracy by a large margin on the FaceScape dataset. The code and data will be released soon.
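To make the idea of regressing a matching cost with an implicit function more concrete, the following is a minimal NumPy sketch and not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the pinhole projection, nearest-neighbour feature sampling, mean/variance aggregation across views, candidate offsets taken along the surface normal of the fitted mesh, and the two-layer MLP standing in for the learned implicit function.

```python
import numpy as np

def project(points, K, R, t):
    """Project Nx3 world points into a pinhole camera; returns Nx2 pixel coordinates."""
    cam = points @ R.T + t
    uvw = cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def sample_nearest(feat, pix):
    """Nearest-neighbour lookup in an HxWxC feature map, clamped to the image border."""
    h, w, _ = feat.shape
    x = np.clip(np.rint(pix[:, 0]).astype(int), 0, w - 1)
    y = np.clip(np.rint(pix[:, 1]).astype(int), 0, h - 1)
    return feat[y, x]

def implicit_cost(agg, disp, params):
    """Hypothetical implicit function: (aggregated feature, offset) -> scalar cost."""
    w1, b1, w2, b2 = params
    x = np.concatenate([agg, disp[:, None]], axis=1)
    hidden = np.maximum(x @ w1 + b1, 0.0)  # ReLU
    return (hidden @ w2 + b2)[:, 0]

def refine_displacements(base_pts, normals, feats, cams, candidates, params):
    """For each surface point, pick the candidate offset along the normal with lowest cost."""
    costs = []
    for d in candidates:
        pts = base_pts + d * normals
        # Sample every view's feature map at the projection of the shifted points.
        per_view = np.stack(
            [sample_nearest(f, project(pts, *c)) for f, c in zip(feats, cams)]
        )  # (num_views, N, C)
        # Assumed aggregation: mean and variance across views.
        agg = np.concatenate([per_view.mean(axis=0), per_view.var(axis=0)], axis=1)
        costs.append(implicit_cost(agg, np.full(len(pts), d), params))
    costs = np.stack(costs, axis=1)  # (N, num_candidates)
    return candidates[np.argmin(costs, axis=1)], costs
```

Here `base_pts` and `normals` stand for surface points and normals of the fitted morphable model sampled at UV texels, `cams` is a list of `(K, R, t)` tuples, and `params` holds MLP weights that would come from training. With untrained weights the output is meaningless; the sketch only shows how per-point costs could be queried and minimized without building and regularizing a dense 3D cost volume.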