The visual inspection of aerial drone footage is an integral part of land search and rescue (SAR) operations today. Since this inspection is a slow, tedious, and error-prone job for humans, we propose a novel deep learning algorithm to automate this aerial person detection (APD) task. We experiment with model architecture selection, online data augmentation, transfer learning, image tiling, and several other techniques to improve the test performance of our method. We present the novel Aerial Inspection RetinaNet (AIR) algorithm as the combination of these contributions. The AIR detector demonstrates state-of-the-art performance on a commonly used SAR test data set in terms of both precision (~21 percentage point increase) and speed. In addition, we provide a new formal definition for the APD problem in SAR missions. Specifically, we propose a novel evaluation scheme that ranks detectors in terms of real-world SAR localization requirements. Finally, we propose a novel postprocessing method for robust, approximate object localization: the merging of overlapping bounding boxes (MOB) algorithm. This final processing stage used in the AIR detector significantly improves its performance and usability in real-world aerial SAR missions.
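The abstract does not detail how MOB merges detections, so the following is only a minimal illustrative sketch of the general idea behind merging overlapping bounding boxes: repeatedly replace any overlapping pair of axis-aligned boxes with their union until no overlaps remain. The function names and greedy fixed-point strategy here are assumptions for illustration, not the paper's exact algorithm.

```python
def boxes_overlap(a, b):
    """True if two (x1, y1, x2, y2) boxes have a nonempty intersection."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def merge_overlapping_boxes(boxes):
    """Greedily merge overlapping boxes into their unions until a fixed point.

    Illustrative sketch only; the actual MOB algorithm may differ (e.g. in
    how it weights detection scores or decides which boxes to merge).
    """
    boxes = [tuple(b) for b in boxes]
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if boxes_overlap(boxes[i], boxes[j]):
                    a, b = boxes[i], boxes[j]
                    # Replace the pair with the union (covering) box.
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes
```

Unlike standard non-maximum suppression, which discards lower-scoring overlapping detections, this style of merging keeps a single box covering the whole cluster, which suits approximate localization: a SAR operator only needs to know roughly where to look.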