In this work, we aim to enhance model-based face reconstruction by avoiding fitting the model to outliers, i.e., regions that cannot be well expressed by the model, such as occluders or make-up. The core challenge in localizing outliers is that they are highly variable and difficult to annotate. To overcome this problem, we introduce a joint Face-autoencoder and outlier segmentation approach (FOCUS). In particular, we exploit the fact that outliers cannot be fitted well by the face model and hence can be localized well given a high-quality model fit. The main difficulty is that model fitting and outlier segmentation depend on each other and need to be inferred jointly. We resolve this chicken-and-egg problem with an EM-type training strategy, in which a face autoencoder is trained jointly with an outlier segmentation network. This leads to a synergistic effect: the segmentation network prevents the face encoder from fitting to the outliers, which enhances the reconstruction quality, and the improved 3D face reconstruction, in turn, enables the segmentation network to better predict the outliers. To resolve the ambiguity between outliers and regions that are difficult to fit, such as eyebrows, we build a statistical prior from synthetic data that measures the systematic bias in model fitting. Experiments on the NoW test set demonstrate that FOCUS achieves state-of-the-art 3D face reconstruction performance among all baselines that are trained without 3D annotation. Moreover, our results on CelebA-HQ and the AR database show that the segmentation network can localize occluders accurately despite being trained without any segmentation annotation.
Title: Robust Model-based Face Reconstruction through Weakly-Supervised Outlier Segmentation
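The alternation between model fitting and outlier segmentation described in the abstract can be illustrated with a toy sketch. This is not the FOCUS implementation: a quadratic polynomial stands in for the face model, a fixed residual threshold stands in for the segmentation network, and all names (`em_style_fit`, `threshold`, `n_iters`) are hypothetical choices made for this example only.

```python
import numpy as np

def em_style_fit(x, y, degree=2, threshold=0.5, n_iters=10):
    """Alternate between fitting on inliers and re-segmenting outliers.

    Toy analogue only: the polynomial plays the role of the face model,
    and the residual threshold plays the role of the segmentation network.
    """
    inlier = np.ones_like(y, dtype=bool)   # start by trusting every sample
    A = np.vander(x, degree + 1)           # columns: x^degree, ..., x, 1
    coeffs = np.zeros(degree + 1)
    for _ in range(n_iters):
        # Fitting step: least squares restricted to the current inliers.
        coeffs, *_ = np.linalg.lstsq(A[inlier], y[inlier], rcond=None)
        # Segmentation step: samples the model cannot explain become outliers.
        residual = np.abs(A @ coeffs - y)
        new_inlier = residual < threshold
        if new_inlier.sum() <= degree:     # too few inliers left to fit
            break
        if np.array_equal(new_inlier, inlier):
            break                          # mask is stable, so stop
        inlier = new_inlier
    return coeffs, ~inlier

# Synthetic data: a clean quadratic with a contiguous "occluder" added.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
true_coeffs = np.array([-2.0, 0.5, 1.0])   # highest power first
y = np.polyval(true_coeffs, x) + rng.normal(0.0, 0.02, x.size)
y[80:110] += 3.0                           # region the model cannot express

naive_coeffs = np.polyfit(x, y, 2)         # fit that ignores outliers
robust_coeffs, outlier_mask = em_style_fit(x, y)
print("naive fit: ", naive_coeffs)
print("robust fit:", robust_coeffs)
print("samples flagged as outliers:", int(outlier_mask.sum()))
```

In this sketch, the first fit is biased by the occluded samples, so the first mask is overly conservative. Each later fit uses a cleaner set of samples, which in turn yields a better mask, mirroring the mutual dependence between fitting and segmentation that the abstract describes.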