This work presents a deep learning approach for vehicle detection in satellite video. Vehicle detection is all but impossible in single EO satellite images due to the small size of vehicles (4-10 pixels) and their similarity to the background. Instead, we consider satellite video, which compensates for the lack of spatial information through the temporal consistency of vehicle movement. We propose a new spatiotemporal model: a compact convolutional neural network with $3 \times 3$ kernels that omits pooling layers and uses leaky ReLUs. A reformulation of the output heatmap, including Non-Maximum Suppression (NMS), then yields the final segmentation. Empirical results on two newly annotated satellite videos confirm the applicability of this approach for vehicle detection. More importantly, they indicate that pre-training on WAMI data and then fine-tuning on a few annotated frames of a new video is sufficient: in our experiments, only five annotated images yield an $F_1$ score of 0.81 on a new video showing more complex traffic patterns than the Las Vegas video. Our best result on Las Vegas is an $F_1$ score of 0.87, which makes the proposed approach a leading method for this benchmark.
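The architecture described above (a compact, pooling-free stack of $3 \times 3$ convolutions with leaky ReLUs producing a per-pixel heatmap, followed by NMS) can be sketched as below. This is an illustrative PyTorch sketch under stated assumptions, not the authors' exact network: the layer count, channel widths, LeakyReLU slope, and the max-pool trick used for NMS are all assumptions for the purpose of the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HeatmapNet(nn.Module):
    """Hypothetical compact CNN: only 3x3 convolutions with LeakyReLU,
    no pooling layers, so the spatial resolution of the input is preserved
    and the output is a single-channel per-pixel vehicle-score heatmap.
    Depth and width here are illustrative assumptions."""

    def __init__(self, in_channels=5, width=32, depth=4):
        # in_channels=5 assumes a stack of consecutive frames is fed as
        # channels to provide the temporal context the abstract relies on.
        super().__init__()
        layers, c = [], in_channels
        for _ in range(depth):
            layers += [nn.Conv2d(c, width, kernel_size=3, padding=1),
                       nn.LeakyReLU(0.1)]
            c = width
        layers += [nn.Conv2d(c, 1, kernel_size=3, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

def heatmap_nms(heat, kernel=3):
    """Simple NMS on a heatmap: keep only local maxima by comparing each
    pixel against the max-pooled neighborhood (a common heatmap-NMS trick,
    assumed here, not necessarily the paper's exact reformulation)."""
    pad = kernel // 2
    hmax = F.max_pool2d(heat, kernel, stride=1, padding=pad)
    return heat * (hmax == heat).float()

# Usage: five stacked 64x64 frames in, same-resolution heatmap out.
frames = torch.randn(1, 5, 64, 64)
heat = HeatmapNet()(frames)
peaks = heatmap_nms(torch.sigmoid(heat))
```

Because no pooling is used, the heatmap retains the input resolution, which matters when the targets are only 4-10 pixels across.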