The speed-accuracy Pareto curve of object detection systems has advanced through a combination of better model architectures, training methods, and inference methods. In this paper, we methodically evaluate a variety of these techniques to understand where most of the improvements in modern detection systems come from. We benchmark these improvements on the vanilla ResNet-FPN backbone with RetinaNet and RCNN detectors. Together, they improve the accuracy of the vanilla detectors by 7.7% while making them 30% faster. We further provide simple scaling strategies to generate a family of models that form two Pareto curves, named RetinaNet-RS and Cascade RCNN-RS. These simple rescaled detectors explore the speed-accuracy trade-off between the one-stage RetinaNet detectors and the two-stage RCNN detectors. Our largest Cascade RCNN-RS models achieve 52.9% AP with a ResNet152-FPN backbone and 53.6% AP with a SpineNet143L backbone. Finally, we show that the ResNet architecture, with three minor architectural changes, outperforms EfficientNet as the backbone for object detection and instance segmentation systems.
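As a reading aid, the notion of a speed-accuracy Pareto curve used above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's evaluation code: the function name `pareto_front`, the `(name, latency_ms, ap)` tuple layout, and the toy numbers are all assumptions made for the example and are not results from the paper.

```python
from typing import List, Tuple

# Each entry is (name, latency in ms, AP). Layout is an assumption for this sketch.
Model = Tuple[str, float, float]


def pareto_front(models: List[Model]) -> List[Model]:
    """Return the models that are not dominated in (latency, AP).

    A model is dominated when another model is at least as fast and at least
    as accurate, and strictly better in one of the two. Exact duplicates
    (same latency and same AP) are kept only once.
    """
    # Sort by latency ascending; for equal latency, put the higher AP first.
    ordered = sorted(models, key=lambda m: (m[1], -m[2]))
    front: List[Model] = []
    best_ap = float("-inf")
    for name, latency, ap in ordered:
        # Keep a model only if it beats the AP of every faster (or equally fast) model.
        if ap > best_ap:
            front.append((name, latency, ap))
            best_ap = ap
    return front


# Hypothetical toy values, chosen only to exercise the function.
toy_models: List[Model] = [
    ("a", 10.0, 30.0),
    ("b", 12.0, 29.0),  # slower and less accurate than "a": dominated
    ("c", 15.0, 35.0),
    ("d", 20.0, 34.0),  # slower and less accurate than "c": dominated
    ("e", 25.0, 40.0),
]

front_names = [name for name, _, _ in pareto_front(toy_models)]
print(front_names)
```

Under this definition, a family of rescaled detectors "forms a Pareto curve" when each member survives the filter, that is, no other member is both faster and more accurate.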