Recovering detailed facial geometry from a set of calibrated multi-view images is valuable across a wide range of applications. Traditional multi-view stereo (MVS) methods adopt an optimization-based scheme to regularize the matching cost. Recently, learning-based methods have integrated these steps into an end-to-end neural network and shown superior efficiency. In this paper, we propose a novel architecture that recovers extremely detailed 3D faces in tens of seconds. Unlike previous learning-based methods, which regularize the cost volume with a 3D CNN, we propose to learn an implicit function that regresses the matching cost. By fitting a 3D morphable model to the multi-view images, the features of the multiple images are extracted and aggregated in the mesh-attached UV space, which makes the implicit function more effective at recovering detailed facial shape. Our method outperforms state-of-the-art learning-based MVS methods in accuracy by a large margin on the FaceScape dataset. The code and data are released at https://github.com/zhuhao-nju/mvfr.
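To make the idea of regressing the matching cost with an implicit function more concrete, the following is a minimal sketch and not the released implementation. It assumes that, for one sample point in UV space, per-view feature vectors have already been gathered at several candidate offsets from the fitted model surface. The mean-and-variance aggregation, the tiny randomly initialized MLP, the convention that a lower cost is better, and all function names (`aggregate_views`, `make_mlp`, `implicit_cost`, `best_offset`) are illustrative assumptions.

```python
import random

def aggregate_views(view_feats):
    """Aggregate per-view feature vectors for one sample point.

    view_feats[v][c] is channel c of the feature seen from view v.
    Returns the per-channel mean followed by the per-channel variance
    across views (an assumed aggregation, chosen for illustration).
    """
    n_views = len(view_feats)
    n_ch = len(view_feats[0])
    mean = [sum(f[c] for f in view_feats) / n_views for c in range(n_ch)]
    var = [sum((f[c] - mean[c]) ** 2 for f in view_feats) / n_views
           for c in range(n_ch)]
    return mean + var

def make_mlp(n_in, n_hidden, seed=0):
    """Randomly initialized two-layer MLP; a trained network would be used in practice."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.gauss(0.0, 0.5) for _ in range(n_hidden)]
    return w1, b1, w2, 0.0

def implicit_cost(params, agg_feat, offset):
    """Hypothetical implicit function: (aggregated feature, offset) -> scalar cost."""
    w1, b1, w2, b2 = params
    x = agg_feat + [offset]
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)  # ReLU layer
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

def best_offset(params, feats_per_offset, offsets):
    """Evaluate the cost at every candidate offset and return the cheapest one."""
    costs = [implicit_cost(params, aggregate_views(f), d)
             for f, d in zip(feats_per_offset, offsets)]
    idx = min(range(len(offsets)), key=costs.__getitem__)
    return offsets[idx], costs

# Usage: 3 candidate offsets, 2 views, 2 feature channels (toy values).
offsets = [-1.0, 0.0, 1.0]
feats = [
    [[0.1, 0.2], [0.3, 0.1]],
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.9, 0.0], [0.2, 0.7]],
]
params = make_mlp(n_in=2 * 2 + 1, n_hidden=8)
chosen, costs = best_offset(params, feats, offsets)
print(chosen, costs)
```

In this sketch the cost is queried independently at each candidate offset, so no dense 3D cost volume has to be stored or filtered. This is only meant to illustrate the contrast with 3D CNN regularization drawn above.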