While current monocular 3D face reconstruction methods can recover fine geometric details, they suffer from several limitations. Some methods produce faces that cannot be realistically animated because they do not model how wrinkles vary with expression. Other methods are trained on high-quality face scans and do not generalize well to in-the-wild images. We present the first approach that regresses 3D face shape and animatable details that are specific to an individual but change with expression. Our model, DECA (Detailed Expression Capture and Animation), is trained to robustly produce a UV displacement map from a low-dimensional latent representation that consists of person-specific detail parameters and generic expression parameters, while a regressor is trained to predict detail, shape, albedo, expression, pose and illumination parameters from a single image. To enable this, we introduce a novel detail-consistency loss that disentangles person-specific details from expression-dependent wrinkles. This disentanglement allows us to synthesize realistic person-specific wrinkles by controlling expression parameters while keeping person-specific details unchanged. DECA is learned from in-the-wild images with no paired 3D supervision and achieves state-of-the-art shape reconstruction accuracy on two benchmarks. Qualitative results on in-the-wild data demonstrate DECA's robustness and its ability to disentangle identity-dependent and expression-dependent details, which enables animation of the reconstructed faces. The model and code are publicly available at https://deca.is.tue.mpg.de.
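To make the latent structure described above concrete, the following toy sketch decodes a UV displacement map from a concatenated detail code and expression vector, and measures how much the map changes when the detail code is exchanged for one taken from another image of the same person. This is an illustration under stated assumptions, not the released DECA implementation: the linear decoder `W`, the dimensions, and the helper names `decode_displacement` and `detail_swap_loss` are all hypothetical, and the swap-based comparison on displacement maps is a simplified stand-in for the detail-consistency loss.

```python
import numpy as np

# Illustrative sizes only; these are assumptions, not the values used by DECA.
DETAIL_DIM, EXPR_DIM, UV_SIZE = 8, 4, 16

rng = np.random.default_rng(0)
# Hypothetical stand-in for the learned detail decoder: one linear layer.
W = rng.normal(scale=0.1, size=(UV_SIZE * UV_SIZE, DETAIL_DIM + EXPR_DIM))


def decode_displacement(detail_code, expression):
    """Map (person-specific detail code, expression) to a UV displacement map."""
    latent = np.concatenate([detail_code, expression])
    return (W @ latent).reshape(UV_SIZE, UV_SIZE)


def detail_swap_loss(detail_i, detail_j, expression_i):
    """Mean absolute change in the displacement map when the detail code of
    image i is replaced by the detail code of image j (assumed same person)."""
    own = decode_displacement(detail_i, expression_i)
    swapped = decode_displacement(detail_j, expression_i)
    return float(np.mean(np.abs(own - swapped)))


# Animation: hold the detail code fixed and vary only the expression.
detail = rng.normal(size=DETAIL_DIM)
neutral = np.zeros(EXPR_DIM)
smile = rng.normal(size=EXPR_DIM)
d_neutral = decode_displacement(detail, neutral)
d_smile = decode_displacement(detail, smile)
```

In this sketch, a swap loss near zero for two images of the same person means the detail code carries only identity-specific information, so expression-dependent wrinkles have to be explained by the expression parameters. That is the property that lets the expression be changed at animation time while the detail code stays fixed.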