Face reconstruction and tracking are building blocks of numerous applications in AR/VR, human-machine interaction, and medicine. Most of these applications rely on a metrically correct prediction of the shape, especially when the reconstructed subject is put into a metrical context (i.e., when there is a reference object of known size). A metrical reconstruction is also needed for any application that measures distances and dimensions of the subject (e.g., to virtually fit a glasses frame). State-of-the-art methods for face reconstruction from a single image are trained on large 2D image datasets in a self-supervised fashion. However, due to the nature of perspective projection, they cannot reconstruct the actual face dimensions, and even predicting the average human face outperforms some of these methods in a metrical sense. To learn the actual shape of a face, we argue for a supervised training scheme. Since no large-scale 3D dataset exists for this task, we annotated and unified small- and medium-scale databases. The resulting unified dataset is still only medium-scale, with more than 2k identities, and training purely on it would lead to overfitting. To counter this, we take advantage of a face recognition network pretrained on a large-scale 2D image dataset, which provides distinct features for different faces and is robust to expression, illumination, and camera changes. Using these features, we train our face shape estimator in a supervised fashion, inheriting the robustness and generalization of the face recognition network. Our method, which we call MICA (MetrIC fAce), outperforms state-of-the-art reconstruction methods by a large margin on both current non-metric benchmarks and our metric benchmarks (15% and 24% lower average error on NoW, respectively).
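The scale ambiguity of perspective projection mentioned above can be illustrated with a minimal sketch. It assumes an ideal pinhole camera; the landmark coordinates, the focal length, and the helper `project` are hypothetical values chosen for illustration and are not part of MICA. A face that is 30% larger and 30% farther away produces exactly the same image, so 2D supervision alone cannot determine metric size.

```python
import numpy as np

def project(points, focal_length):
    """Pinhole projection of Nx3 camera-space points to Nx2 image coordinates."""
    return focal_length * points[:, :2] / points[:, 2:3]

# Three hypothetical face landmarks in meters (two eyes and a chin point).
face = np.array([
    [-0.03,  0.00, 0.50],
    [ 0.03,  0.00, 0.50],
    [ 0.00, -0.07, 0.52],
])

# Scaling every coordinate by the same factor makes the face both
# larger and proportionally farther from the camera.
scale = 1.3
scaled_face = scale * face

focal_length = 1000.0  # assumed focal length in pixels
image_a = project(face, focal_length)
image_b = project(scaled_face, focal_length)

# The two projections coincide even though the metric sizes differ.
same_image = np.allclose(image_a, image_b)
eye_dist = np.linalg.norm(face[0] - face[1])
eye_dist_scaled = np.linalg.norm(scaled_face[0] - scaled_face[1])

print("identical projections:", same_image)
print("inter-eye distance (m):", eye_dist, "vs", eye_dist_scaled)
```

Because the scale factor cancels in the division by depth, a photometric or landmark loss on the image is unchanged by it, which is consistent with the argument for 3D supervision.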