The low resolution of objects of interest in aerial images makes pedestrian detection and action detection extremely challenging. Furthermore, processing large images with deep convolutional neural networks can be computationally demanding. To alleviate these challenges, we propose a two-step, yes/no question-answering framework to find specific individuals performing one or more specific actions in aerial images. First, a deep object detector, the Single Shot MultiBox Detector (SSD), is used to generate object proposals from small aerial images. Second, another deep network is used to learn a latent common subspace that associates the high-resolution aerial imagery with the pedestrian action labels provided by human-based sources.
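The second step is described only at a high level. The Python below is a minimal sketch of how a yes/no query could be answered once proposals exist. It rests on several assumptions. SSD is assumed to have already produced pedestrian proposals, each with a feature vector, and is not implemented here. Fixed linear projections (`img_proj`, `txt_proj`) stand in for the learned networks that map image crops and action labels into the common subspace. Cosine similarity against a threshold is an assumed decision rule, not necessarily the authors' scoring method. All names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple

Matrix = List[List[float]]
Box = Tuple[int, int, int, int]


@dataclass
class Proposal:
    """Hypothetical detector output: a bounding box plus a crop feature vector."""
    box: Box                # (x, y, width, height) in pixels
    features: List[float]   # appearance features of the cropped region


def project(matrix: Matrix, vec: Sequence[float]) -> List[float]:
    """Linear map into the common subspace (stand-in for a learned network)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def answer_query(
    proposals: List[Proposal],
    query_actions: Set[str],
    vocab: List[str],
    img_proj: Matrix,
    txt_proj: Matrix,
    threshold: float = 0.5,  # assumed decision threshold
) -> List[Tuple[Box, bool, float]]:
    """Answer 'Is this person doing <actions>?' with yes/no for every proposal."""
    # Encode the question as a multi-hot vector over the action vocabulary,
    # so a single query can ask about one or several actions at once.
    query = [1.0 if action in query_actions else 0.0 for action in vocab]
    q = project(txt_proj, query)
    answers = []
    for p in proposals:
        # Embed the proposal crop into the same subspace and compare to the query.
        z = project(img_proj, p.features)
        score = cosine(z, q)
        answers.append((p.box, score >= threshold, score))
    return answers


if __name__ == "__main__":
    vocab = ["walking", "running", "standing"]
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    # Step 1 (assumed already done by SSD): two proposals with toy features.
    proposals = [
        Proposal((10, 20, 8, 16), [0.9, 0.1, 0.0]),
        Proposal((40, 25, 8, 16), [0.0, 0.2, 0.9]),
    ]
    for box, yes, score in answer_query(proposals, {"walking"}, vocab, identity, identity):
        print(box, "yes" if yes else "no", round(score, 3))
```

Encoding the question as a multi-hot vector lets one query cover several actions at once. This fits the framework's goal of finding individuals doing one or multiple actions. Each proposal then needs only a single similarity comparison to produce a yes/no answer.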