We propose real-time, six degrees of freedom (6DoF), 3D face pose estimation without face detection or landmark localization. We observe that estimating the 6DoF rigid transformation of a face is a simpler problem than facial landmark detection, which is often used for 3D face alignment. In addition, 6DoF offers more information than face bounding box labels. We leverage these observations to make multiple contributions: (a) We describe an easily trained, efficient, Faster R-CNN-based model which regresses 6DoF pose for all faces in a photo, without preliminary face detection. (b) We explain how pose is converted and kept consistent between the input photo and arbitrary crops created while training and evaluating our model. (c) Finally, we show how face poses can replace detection bounding box training labels. Tests on AFLW2000-3D and BIWI show that our method runs in real time and outperforms state-of-the-art (SotA) face pose estimators. Remarkably, our method also surpasses SotA models of comparable complexity on the WIDER FACE detection benchmark, despite not being optimized on bounding box labels.
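To illustrate why a 6DoF pose carries at least as much information as a bounding box, the following minimal sketch derives a box from a pose by rigidly transforming a set of 3D face points and projecting them into the image. It is an illustration under stated assumptions, not the procedure used in this work: the pose layout (a rotation vector followed by a translation), the pinhole intrinsics matrix, and the names `rotvec_to_matrix` and `pose_to_bbox` are all hypothetical choices made for this example.

```python
import numpy as np


def rotvec_to_matrix(rvec):
    """Rodrigues' formula: rotation vector (axis * angle) -> 3x3 rotation matrix."""
    rvec = np.asarray(rvec, dtype=float)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        # No rotation: avoid dividing by a zero angle.
        return np.eye(3)
    k = rvec / theta
    # Skew-symmetric cross-product matrix of the unit axis.
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def pose_to_bbox(pose, points_3d, intrinsics):
    """Project 3D points under a 6DoF pose and return [x_min, y_min, x_max, y_max].

    pose:       assumed layout (rx, ry, rz, tx, ty, tz), rotation vector + translation.
    points_3d:  (N, 3) array of 3D face points (hypothetical reference shape).
    intrinsics: 3x3 pinhole camera matrix (assumed known).
    """
    pose = np.asarray(pose, dtype=float)
    points_3d = np.asarray(points_3d, dtype=float)
    R = rotvec_to_matrix(pose[:3])
    t = pose[3:]
    cam = points_3d @ R.T + t            # rigid transform into camera coordinates
    proj = cam @ np.asarray(intrinsics, dtype=float).T
    uv = proj[:, :2] / proj[:, 2:3]      # perspective divide
    x_min, y_min = uv.min(axis=0)
    x_max, y_max = uv.max(axis=0)
    return np.array([x_min, y_min, x_max, y_max])


# Usage: a 2x2 planar patch, 10 units in front of the camera, no rotation.
square = np.array([[-1, -1, 0], [1, -1, 0], [1, 1, 0], [-1, 1, 0]], dtype=float)
camera = np.array([[100, 0, 50], [0, 100, 50], [0, 0, 1]], dtype=float)
box = pose_to_bbox([0, 0, 0, 0, 0, 10], square, camera)
```

The reverse direction is not possible in general: a box alone does not determine rotation or depth, which is the sense in which pose labels are the richer of the two.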