Face detection is a fundamental problem for many downstream face applications, and there is rising demand for face detectors that are faster, more accurate, and able to handle higher resolutions. Recent smartphones can record video in 8K resolution, but many existing face detectors still fail at this resolution because of their anchor sizes and training data. We analyze the failure cases and observe a large number of correctly predicted boxes with incorrect confidences. To calibrate these confidences, we propose a confidence ranking network with a pairwise ranking loss that re-ranks the predicted confidences locally within the same image. Because our confidence ranker is model-agnostic, we can augment the training data by drawing pairs from multiple face detectors, and the ranker generalizes to a wide range of face detectors at test time. On WiderFace, we achieve the highest AP among single-scale methods, and our AP is competitive with previous multi-scale methods while being significantly faster. At 8K resolution, our method solves the GPU memory issue and allows us to train on 8K indirectly. We collect an 8K-resolution test set to show the improvement, and we will release it as a new benchmark for future research.
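To make the idea of a pairwise ranking loss over boxes from the same image concrete, here is a minimal sketch in plain Python. The abstract does not give the exact form of the loss, so the hinge formulation, the `margin` value, and the use of a per-box `quality` target (for example, IoU with the ground truth) are all assumptions made for illustration, not the authors' definition.

```python
from itertools import combinations


def pairwise_ranking_loss(scores, quality, margin=0.1):
    """Mean hinge ranking loss over pairs of boxes from ONE image.

    scores  : re-ranked confidences, one per predicted box
    quality : hypothetical per-box target (e.g. IoU with ground truth);
              this choice is an assumption, not taken from the paper
    margin  : hypothetical gap by which the better box should win
    """
    if len(scores) != len(quality):
        raise ValueError("scores and quality must have the same length")

    losses = []
    for i, j in combinations(range(len(scores)), 2):
        if quality[i] == quality[j]:
            continue  # ties carry no ranking information
        # Order the pair so that `hi` is the box that should rank higher.
        hi, lo = (i, j) if quality[i] > quality[j] else (j, i)
        # Penalize the pair unless `hi` beats `lo` by at least `margin`.
        losses.append(max(0.0, margin - (scores[hi] - scores[lo])))

    return sum(losses) / len(losses) if losses else 0.0
```

Under these assumptions, a pair whose confidences already agree with the quality ordering by at least the margin contributes nothing, while a misordered pair is penalized in proportion to how far apart the two confidences are. Pairs are formed only within a single image, which matches the local re-ranking described above.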