The past few years have seen increased interest in aerial image object detection because of its critical value to large-scale geoscientific research such as environmental studies, urban planning, and intelligence monitoring. However, the task is very challenging because of the bird's-eye-view perspective, complex backgrounds, large and varied image sizes, diverse object appearances, and the scarcity of well-annotated datasets. Recent advances in computer vision have shown promise in tackling this challenge. Specifically, the Vision Transformer Detector (ViTDet) was proposed to extract multi-scale features for object detection. The empirical study shows that ViTDet's simple design achieves good performance on natural scene images and can be easily embedded into any detector architecture. To date, ViTDet's potential benefit to challenging aerial image object detection has not been explored. Therefore, in our study, 25 experiments were carried out to evaluate the effectiveness of ViTDet for aerial image object detection on three well-known datasets: Airbus Aircraft, RarePlanes, and Dataset of Object DeTection in Aerial images (DOTA). Our results show that ViTDet consistently outperforms its convolutional neural network counterparts on horizontal bounding box (HBB) object detection by a large margin (up to 17% in average precision) and that it achieves competitive performance on oriented bounding box (OBB) object detection. Our results also establish a baseline for future research.