Model efficiency has become increasingly important in computer vision. In this paper, we systematically study neural network architecture design choices for object detection and propose several key optimizations to improve efficiency. First, we propose a weighted bi-directional feature pyramid network (BiFPN), which allows easy and fast multi-scale feature fusion; second, we propose a compound scaling method that uniformly scales the resolution, depth, and width of the backbone, feature network, and box/class prediction networks at the same time. Based on these optimizations, we have developed a new family of object detectors, called EfficientDet, which consistently achieves an order-of-magnitude better efficiency than prior art across a wide spectrum of resource constraints. In particular, without bells and whistles, our EfficientDet-D7 achieves state-of-the-art 51.0 mAP on the COCO dataset with 52M parameters and 326B FLOPs, being 4x smaller and using 9.3x fewer FLOPs yet still more accurate (+0.3% mAP) than the best previous detector.
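To make the two ideas concrete, the following is a minimal sketch rather than the implementation behind the reported results. The abstract does not state how the fusion weights are normalized or which scaling coefficients are used, so several things here are assumptions: `weighted_fuse` clamps weights at zero and divides by their sum, and `compound_scale` grows resolution and depth linearly and width exponentially with one coefficient `phi`. All function names, parameter names, and default constants are hypothetical placeholders.

```python
from typing import Dict, List, Sequence


def weighted_fuse(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse same-length feature vectors using normalized, non-negative weights.

    Assumption: each input feature gets one scalar weight (learned in a real
    model; passed in directly here).
    """
    if not features or len(features) != len(weights):
        raise ValueError("need one weight per input feature")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all features must have the same length")

    # Clamp negative weights to zero so every normalized weight lies in [0, 1].
    clamped = [max(0.0, w) for w in weights]
    # eps guards against division by zero when all weights are clamped to 0.
    total = sum(clamped) + eps

    return [
        sum(w * f[k] for w, f in zip(clamped, features)) / total
        for k in range(length)
    ]


def compound_scale(
    phi: int,
    base_resolution: int = 512,   # hypothetical base input size
    resolution_step: int = 128,   # hypothetical per-step increase
    base_width: int = 64,         # hypothetical base channel count
    width_growth: float = 1.35,   # hypothetical per-step width multiplier
    base_depth: int = 3,          # hypothetical base layer count
) -> Dict[str, int]:
    """Scale resolution, width, and depth jointly from one coefficient phi."""
    if phi < 0:
        raise ValueError("phi must be non-negative")
    return {
        "resolution": base_resolution + resolution_step * phi,
        "width": round(base_width * width_growth ** phi),
        "depth": base_depth + phi,
    }


if __name__ == "__main__":
    # Two toy "feature maps" flattened to vectors, fused with equal weights.
    print(weighted_fuse([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0]))
    # A small family of configurations driven by a single knob.
    for phi in range(3):
        print(phi, compound_scale(phi))
```

With equal weights the fusion reduces to (approximately) the element-wise mean of its inputs, and a weight clamped to zero drops that input entirely. Driving all three dimensions from a single `phi` is what lets one family of models cover a range of resource budgets without tuning each dimension separately.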