Recovering the 3D geometric structure of a face from a single input image is a challenging and active research area in computer vision. In this paper, we present a novel method for reconstructing 3D heads from one or more images using a hybrid approach that combines deep learning with geometric techniques. We propose an encoder-decoder network based on the U-Net architecture and trained on synthetic data only. It predicts both pixel-wise normal vectors and landmark maps from a single input photo. The landmarks are used to compute the pose and to initialize an optimization problem, which in turn reconstructs the 3D head geometry using a parametric morphable model and the predicted normal vector fields. Qualitative and quantitative evaluations show state-of-the-art results in both single-view and multi-view settings. Although the model was trained only on synthetic data, it successfully recovers 3D geometry and precise poses for real-world images.
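To make the landmark-based pose step concrete, the following is a minimal sketch of one common way to recover a pose from 2D landmarks and their 3D counterparts on a head model. It assumes a weak-perspective (scaled orthographic) camera and a closed-form least-squares fit; the abstract does not state which camera model or solver is used, so this is an illustration rather than the method of the paper. The function name `estimate_weak_perspective_pose` and its arguments are hypothetical.

```python
import numpy as np


def estimate_weak_perspective_pose(points_3d, points_2d):
    """Fit x ~ s * R[:2] @ X + t from N >= 4 non-coplanar landmark pairs.

    Assumption (not from the paper): weak-perspective camera, landmarks
    already matched one-to-one between the image and the 3D model.
    Returns scale s, 3x3 rotation R, and 2D translation t.
    """
    X = np.asarray(points_3d, dtype=float)  # shape (N, 3)
    x = np.asarray(points_2d, dtype=float)  # shape (N, 2)

    # Centering both point sets removes the translation from the fit.
    X_mean = X.mean(axis=0)
    x_mean = x.mean(axis=0)
    Xc = X - X_mean
    xc = x - x_mean

    # Least-squares 2x3 affine camera A such that Xc @ A.T ~ xc.
    A_T, *_ = np.linalg.lstsq(Xc, xc, rcond=None)
    A = A_T.T

    # Closest matrix of the form s * Q, where Q has orthonormal rows.
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    R2 = U @ Vt          # first two rows of the rotation
    s = S.mean()         # optimal isotropic scale

    # The third row completes a right-handed rotation matrix.
    r3 = np.cross(R2[0], R2[1])
    R = np.vstack([R2, r3])

    # Translation that maps the 3D centroid onto the 2D centroid.
    t = x_mean - s * R2 @ X_mean
    return s, R, t
```

In a pipeline like the one described, a pose estimated this way could serve as the starting point for the subsequent optimization over morphable-model parameters. A full perspective camera would instead call for a PnP-style solver.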