This paper analyzes design choices in face detection architectures that improve the trade-off between computational cost and accuracy. Specifically, we re-examine the effectiveness of the standard convolutional block as a lightweight backbone architecture for face detection. Unlike the current trend in lightweight architecture design, which relies heavily on depthwise separable convolution layers, we show that heavily channel-pruned standard convolution layers can achieve better accuracy and inference speed at a similar parameter size. This observation is supported by analyses of the characteristics of the target data domain, faces. Based on this observation, we propose to employ ResNet with highly reduced channel widths, which surprisingly yields high efficiency compared to other mobile-friendly networks (e.g., MobileNet-V1, -V2, -V3). Through extensive experiments, we show that the proposed backbone can replace that of the state-of-the-art face detector while providing faster inference. We further propose a new feature aggregation method that maximizes detection performance. Our proposed detector, EResFD, obtains 80.4% mAP on the WIDER FACE Hard subset while taking only 37.7 ms for VGA image inference on a CPU. Code will be available at https://github.com/clovaai/EResFD.
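To make the "similar parameter size" comparison concrete, the following is a minimal sketch of the weight-count arithmetic for a single layer. It is an illustration only, not the authors' code: the function names are hypothetical, square kernels are assumed, and bias and normalization parameters are ignored.

```python
import math


def standard_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def matched_width(c: int, k: int = 3) -> int:
    """Largest width c' such that a standard k x k conv with c' input and
    c' output channels has no more weights than a depthwise separable
    block with c input and c output channels (hypothetical helper)."""
    budget = depthwise_separable_params(c, c, k)
    # Solve k*k*c'^2 <= budget for the largest integer c'.
    return math.isqrt(budget // (k * k))


if __name__ == "__main__":
    for c in (32, 64, 128):
        w = matched_width(c)
        print(c, depthwise_separable_params(c, c), w, standard_conv_params(w, w))
```

Under these assumptions, a 64-channel depthwise separable block has 4,672 weights, and a standard 3x3 convolution fits within that budget at 22 channels. Parameter count alone does not determine latency, which is why the abstract reports measured inference speed as well.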