We present FoveaBox, an accurate, flexible, and completely anchor-free framework for object detection. While almost all state-of-the-art object detectors utilize predefined anchors to enumerate possible locations, scales, and aspect ratios in the search for objects, their performance and generalization ability are also limited by the anchor design. Instead, FoveaBox directly learns the probability of object existence and the bounding box coordinates without anchor references. This is achieved by: (a) predicting category-sensitive semantic maps for the probability of object existence, and (b) producing a category-agnostic bounding box for each position that potentially contains an object. The scales of target boxes are naturally associated with the feature pyramid representations of each input image. Without bells and whistles, FoveaBox achieves state-of-the-art single-model performance of 42.1 AP on the standard COCO detection benchmark. Especially for objects with arbitrary aspect ratios, FoveaBox brings significant improvements compared to anchor-based detectors. More surprisingly, when challenged by stretched test images, FoveaBox shows great robustness and generalization ability under the changed distribution of bounding box shapes. The code will be made publicly available.
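To make the anchor-free idea concrete, the per-position box prediction can be sketched as follows: each location on a feature map directly regresses a box around itself, with no anchor references. This is a minimal illustrative sketch; the distance parameterization (left/top/right/bottom offsets from the cell center) and the stride handling are assumptions for illustration, not the paper's exact transform.

```python
import numpy as np

def decode_boxes(offsets, stride):
    """Anchor-free box decoding sketch.

    offsets: (H, W, 4) array of per-position predictions, interpreted here
             as (l, t, r, b) pixel distances from the cell center.
    stride:  downsampling factor of the feature map relative to the image.
    Returns an (H, W, 4) array of (x1, y1, x2, y2) boxes in image pixels.
    """
    H, W, _ = offsets.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Map each feature-map cell to its center in image coordinates.
    cx = (xs + 0.5) * stride
    cy = (ys + 0.5) * stride
    l, t, r, b = np.moveaxis(offsets, -1, 0)
    return np.stack([cx - l, cy - t, cx + r, cy + b], axis=-1)

# Each pyramid level would run this with its own stride, so larger
# objects are naturally handled by coarser levels.
boxes = decode_boxes(np.full((2, 2, 4), 8.0), stride=16)
```

A category-sensitive score map of shape (H, W, K) predicted alongside these offsets then decides which positions actually contain an object of each of the K classes.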