Space-based infrared tiny ship detection aims to separate tiny ships from images captured by Earth-orbiting satellites. Because each image covers an extremely large area (e.g., thousands of square kilometers), candidate targets in these images are much smaller, dimmer, and more changeable than targets observed by aerial and land-based imaging devices. Existing infrared datasets and target detection methods, which are built for short imaging distances, cannot be readily adapted to the space-based surveillance task. To address these problems, we develop a space-based infrared tiny ship detection dataset (namely, NUDT-SIRST-Sea) with 48 space-based infrared images and 17598 pixel-level tiny ship annotations. Each image covers about 10000 square kilometers with 10000×10000 pixels. Considering the extreme characteristics (e.g., small, dim, changeable) of tiny ships in such challenging scenes, we propose a multi-level TransUNet (MTU-Net) in this paper. Specifically, we design a hybrid Vision Transformer (ViT) and Convolutional Neural Network (CNN) encoder to extract multi-level features. Local feature maps are first extracted by several convolution layers and then fed into the multi-level feature extraction module (MVTM) to capture long-distance dependencies. We further propose a copy-rotate-resize-paste (CRRP) data augmentation approach to accelerate the training phase, which effectively alleviates the sample imbalance between targets and background. In addition, we design a FocalIoU loss to achieve both target localization and shape description. Experimental results on the NUDT-SIRST-Sea dataset show that our MTU-Net outperforms traditional and existing deep-learning-based SIRST methods in terms of probability of detection, false alarm rate, and intersection over union.
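The copy-rotate-resize-paste idea named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `crrp_paste`, the restriction to rotations by multiples of 90 degrees, the integer nearest-neighbour resizing, the uniformly random paste location, and the use of a single bounding box around all labelled pixels are assumptions made only for illustration.

```python
import numpy as np


def crrp_paste(image, mask, rng, scale_choices=(1, 2)):
    """Hypothetical copy-rotate-resize-paste step on one image/mask pair.

    image: 2-D array of infrared intensities.
    mask:  2-D array of the same shape, nonzero where a target is labelled.
    rng:   a numpy.random.Generator.
    Returns new (image, mask) arrays; the inputs are left unmodified.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        # Nothing to copy: return unchanged copies.
        return image.copy(), mask.copy()

    # Copy: crop the bounding box around the labelled pixels (assumption:
    # one box around everything, rather than one box per ship).
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    patch_img = image[y0:y1, x0:x1]
    patch_msk = mask[y0:y1, x0:x1]

    # Rotate: a random multiple of 90 degrees (assumption).
    k = int(rng.integers(0, 4))
    patch_img = np.rot90(patch_img, k)
    patch_msk = np.rot90(patch_msk, k)

    # Resize: integer nearest-neighbour upscaling (assumption).
    s = int(rng.choice(scale_choices))
    patch_img = np.repeat(np.repeat(patch_img, s, axis=0), s, axis=1)
    patch_msk = np.repeat(np.repeat(patch_msk, s, axis=0), s, axis=1)

    h, w = patch_msk.shape
    H, W = mask.shape
    if h > H or w > W:
        # The resized patch does not fit: skip the augmentation.
        return image.copy(), mask.copy()

    # Paste: overwrite only the target pixels at a random location.
    top = int(rng.integers(0, H - h + 1))
    left = int(rng.integers(0, W - w + 1))
    out_img, out_msk = image.copy(), mask.copy()
    sel = patch_msk > 0
    out_img[top:top + h, left:left + w][sel] = patch_img[sel]
    out_msk[top:top + h, left:left + w][sel] = 1
    return out_img, out_msk


# Usage: a synthetic 32x32 scene containing one 2x3 "ship".
rng = np.random.default_rng(0)
image = np.zeros((32, 32), dtype=np.float32)
mask = np.zeros((32, 32), dtype=np.uint8)
image[10:12, 10:13] = 1.0
mask[10:12, 10:13] = 1
aug_image, aug_mask = crrp_paste(image, mask, rng)
```

Each call adds target pixels without changing the image size, so the share of target pixels in a training sample can only stay the same or grow. That is the sense in which pasting copies addresses the imbalance between targets and background.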