This paper summarizes model improvements and inference-time optimizations for popular anchor-based detectors in autonomous driving scenes. Starting from the high-performing RCNN-RS and RetinaNet-RS detection frameworks, which were designed for common detection scenarios, we study a set of framework improvements that adapt the detectors to small objects in crowded scenes. We then propose a model scaling strategy that scales input resolution and model size to achieve a better speed-accuracy trade-off curve. We evaluate our family of models on the real-time 2D detection track of the Waymo Open Dataset (WOD). Within the 70 ms/frame latency constraint on a V100 GPU, our largest Cascade RCNN-RS model achieves 76.9% AP/L1 and 70.1% AP/L2, setting a new state of the art on WOD real-time 2D detection. Our fastest RetinaNet-RS model runs at 6.3 ms/frame while maintaining a reasonable detection precision of 50.7% AP/L1 and 42.9% AP/L2.
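To make the notion of a speed-accuracy trade-off under a latency constraint concrete, the following is a minimal sketch, not the procedure used in the paper. It assumes each scaled model is summarized by a measured per-frame latency and an AP/L1 score, keeps only the models that no faster model matches or beats, and picks the most accurate one within a budget. The names `Candidate`, `pareto_front`, and `best_under_budget` are hypothetical, and every entry in `CANDIDATES` except the 6.3 ms / 50.7% AP/L1 RetinaNet-RS point quoted above is a made-up placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One scaled model variant (hypothetical container, not from the paper)."""
    name: str
    latency_ms: float  # per-frame latency in milliseconds
    ap_l1: float       # AP/L1 in percent


def pareto_front(cands: Sequence[Candidate]) -> List[Candidate]:
    """Keep candidates that are more accurate than every faster-or-equal one."""
    front: List[Candidate] = []
    best_ap = float("-inf")
    # Sort by latency; on ties, visit the more accurate model first.
    for c in sorted(cands, key=lambda c: (c.latency_ms, -c.ap_l1)):
        if c.ap_l1 > best_ap:
            front.append(c)
            best_ap = c.ap_l1
    return front


def best_under_budget(cands: Sequence[Candidate],
                      budget_ms: float) -> Optional[Candidate]:
    """Return the most accurate candidate within the budget, or None."""
    feasible = [c for c in cands if c.latency_ms <= budget_ms]
    return max(feasible, key=lambda c: c.ap_l1, default=None)


CANDIDATES = [
    # Only this entry uses figures quoted in the abstract.
    Candidate("retinanet_rs_fastest", 6.3, 50.7),
    # The entries below are illustrative placeholders, not reported results.
    Candidate("placeholder_a", 20.0, 60.0),
    Candidate("placeholder_b", 25.0, 58.0),  # dominated by placeholder_a
    Candidate("placeholder_c", 90.0, 80.0),  # exceeds a 70 ms budget
]

if __name__ == "__main__":
    print([c.name for c in pareto_front(CANDIDATES)])
    print(best_under_budget(CANDIDATES, budget_ms=70.0))
```

With a 70 ms budget, the sketch discards `placeholder_c` for exceeding the limit and prefers `placeholder_a` over the slower but less accurate `placeholder_b`. This is the sense in which one family of scaled models can be compared along a single curve.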