Compared with model architectures, the training process has received relatively little attention in object detection, even though it is also crucial to the success of detectors. In this work, we carefully revisit the standard training practice of detectors and find that detection performance is often limited by imbalance during training, which generally arises at three levels: the sample level, the feature level, and the objective level. To mitigate the resulting adverse effects, we propose Libra R-CNN, a simple but effective framework for balanced learning in object detection. It integrates three novel components, IoU-balanced sampling, balanced feature pyramid, and balanced L1 loss, which reduce the imbalance at the sample, feature, and objective levels, respectively. Benefiting from this overall balanced design, Libra R-CNN significantly improves detection performance. Without bells and whistles, it achieves 2.5 and 2.0 points higher Average Precision (AP) than FPN Faster R-CNN and RetinaNet, respectively, on MSCOCO.
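The abstract names IoU-balanced sampling without describing its procedure. As a rough illustration of what balancing at the sample level could look like, the sketch below splits negative candidates into equal-width IoU bins and draws the same number from each bin, so that the many near-zero-IoU candidates do not crowd out the harder ones. The function name `iou_balanced_sample`, the bin count, the IoU ceiling of 0.5, and the top-up step are all assumptions made for this example and are not taken from the abstract.

```python
import random


def iou_balanced_sample(ious, num_samples, num_bins=3, max_iou=0.5, seed=0):
    """Return indices of negative candidates, drawn evenly across IoU bins.

    ious        : max IoU of each candidate box with any ground-truth box
    num_samples : number of negatives to draw
    num_bins    : number of equal-width IoU bins over [0, max_iou) (assumed)
    max_iou     : candidates at or above this IoU are not negatives (assumed)
    """
    rng = random.Random(seed)
    width = max_iou / num_bins

    # Group candidate indices by the IoU bin they fall into.
    bins = [[] for _ in range(num_bins)]
    for idx, iou in enumerate(ious):
        if 0.0 <= iou < max_iou:
            b = min(int(iou / width), num_bins - 1)
            bins[b].append(idx)

    # Draw an equal share from every bin (fewer if a bin is too small).
    per_bin = num_samples // num_bins
    chosen = []
    for members in bins:
        chosen.extend(rng.sample(members, min(per_bin, len(members))))

    # Top up from the remaining candidates if some bins were short.
    taken = set(chosen)
    leftover = [i for members in bins for i in members if i not in taken]
    shortfall = min(num_samples - len(chosen), len(leftover))
    chosen.extend(rng.sample(leftover, shortfall))
    return chosen


# 90 easy candidates (IoU 0.01) and 10 harder ones (IoU 0.2 and 0.4).
example_ious = [0.01] * 90 + [0.2] * 5 + [0.4] * 5
picked = iou_balanced_sample(example_ious, num_samples=9)
hard_picked = [i for i in picked if i >= 90]
```

With uniform random sampling, about one in ten of the drawn negatives in this toy input would come from the harder candidates; with the binned draw, two of the three bins are reserved for them.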