In this paper, we propose SuperYOLO, an accurate yet fast method for detecting small objects in remote sensing images (RSI). SuperYOLO fuses multimodal data and performs high-resolution (HR) detection of multiscale objects by using auxiliary super-resolution (SR) learning, while balancing detection accuracy against computational cost. First, we construct a compact baseline by removing the Focus module, which preserves HR features and substantially reduces missed detections of small objects. Second, we apply pixel-level multimodal fusion (MF) to extract complementary information from the different modalities, yielding more suitable and effective features for small objects in RSI. Third, we design a simple and flexible SR branch that learns HR feature representations from low-resolution (LR) input, helping the network discriminate small objects from vast backgrounds and further improving detection accuracy. To avoid introducing additional computation, the SR branch is discarded at inference, and the LR input further reduces the computational cost of the network. Experimental results on the widely used VEDAI RS dataset show that SuperYOLO achieves an accuracy of 73.61% (mAP50), more than 10% higher than SOTA large models such as YOLOv5l, YOLOv5x, and the RS-designed YOLOrs. Meanwhile, SuperYOLO requires about 18.1× fewer GFLOPs and 4.2× fewer parameters than YOLOv5x. Our proposed model thus offers a favorable accuracy-speed trade-off compared with state-of-the-art models. The code will be open sourced at https://github.com/icey-zhang/SuperYOLO.
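To make two of the design points above more concrete, here is a minimal pure-Python sketch. It assumes the slicing pattern of YOLOv5's Focus module (space-to-depth: the four pixel phases of each 2×2 neighborhood become four half-resolution slices) and, separately, the standard multiply-accumulate count of a stride-1 convolution. It is an illustration only, not SuperYOLO's code: `focus_slice` and `conv_macs` are hypothetical helper names, images are single-channel lists of rows, and the convolution that YOLOv5 applies after concatenating the slices is omitted.

```python
def focus_slice(image):
    """Space-to-depth slicing in the style of YOLOv5's Focus module (sketch).

    `image` is an H x W single-channel image given as a list of rows, with H and
    W even. Returns four H/2 x W/2 sub-images taken at the four pixel phases, in
    the order (even row, even col), (odd row, even col), (even row, odd col),
    (odd row, odd col). In YOLOv5 these would be concatenated along the channel
    axis and fed to a convolution; that step is omitted here.
    """
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    phases = [(0, 0), (1, 0), (0, 1), (1, 1)]  # (row offset, column offset)
    return [[row[c0::2] for row in image[r0::2]] for r0, c0 in phases]


def conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulate count of a stride-1, 'same'-padded k x k convolution.

    The cost is proportional to the spatial size h * w, so halving both input
    sides divides it by four.
    """
    return h * w * c_in * c_out * k * k


if __name__ == "__main__":
    image = [[0] * 4 for _ in range(4)]
    image[1][2] = 1  # a one-pixel "object" at row 1, column 2
    slices = focus_slice(image)
    print(slices[1])  # [[0, 1], [0, 0]]
    print(conv_macs(512, 512, 3, 32, 3) / conv_macs(256, 256, 3, 32, 3))  # 4.0
```

The slices keep every pixel value but spread them across channels at half the spatial resolution, so the one-pixel object appears in just one 2×2 slice. This is the early loss of spatial detail that removing Focus is meant to avoid. The second helper reflects the abstract's point about LR input: for a fixed convolution, cost scales with H×W, so halving each input side cuts it by a factor of four.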