While generic object detection has improved substantially thanks to the rich feature hierarchies of deep networks, detecting small objects with poor visual cues remains challenging. Motion cues from multiple frames may be more informative for detecting such hard-to-distinguish objects in each frame. However, how to encode discriminative motion patterns, such as the deformations and pose changes that characterize objects, has remained an open question. To learn these patterns and thereby realize small object detection, we present a neural model called the Recurrent Correlational Network, in which detection and tracking are performed jointly over a multi-frame representation learned by a single network that is trainable end to end. A convolutional long short-term memory network is used to learn informative appearance changes for detection, and the learned representation is shared with the tracker to improve tracking performance. In experiments on datasets containing images of scenes with small flying objects, such as birds and unmanned aerial vehicles, the proposed method yielded consistent improvements in detection performance over deep single-frame detectors and existing motion-based detectors. Furthermore, when evaluated as a tracker on the bird dataset, our network performed as well as state-of-the-art generic object trackers.
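To make the multi-frame representation concrete, the following is a minimal sketch of a convolutional LSTM cell run over a short clip. It is not the architecture described above: it assumes the common peephole-free ConvLSTM formulation, a single layer, odd square kernels, and random weights, and the names `conv2d_same`, `ConvLSTMCell`, and `encode_clip` are hypothetical, chosen only for illustration.

```python
import numpy as np


def conv2d_same(x, w):
    """Cross-correlate x (C_in, H, W) with w (C_out, C_in, k, k), k odd.

    Zero padding keeps the spatial size H x W unchanged.
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + h, dx:dx + wd]  # shifted view, (C_in, H, W)
            # Mix input channels into output channels for this kernel tap.
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ConvLSTMCell:
    """Peephole-free ConvLSTM cell (an assumed, simplified variant)."""

    def __init__(self, in_ch, hid_ch, k=3, seed=0):
        rng = np.random.default_rng(seed)
        # One kernel bank yields all four gate pre-activations at once.
        self.w = rng.normal(0.0, 0.1, size=(4 * hid_ch, in_ch + hid_ch, k, k))
        self.b = np.zeros((4 * hid_ch, 1, 1))
        self.hid_ch = hid_ch

    def step(self, x, h, c):
        # Gates see the current frame and the previous hidden state together.
        z = conv2d_same(np.concatenate([x, h], axis=0), self.w) + self.b
        i, f, o, g = np.split(z, 4, axis=0)
        c_next = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update memory map
        h_next = sigmoid(o) * np.tanh(c_next)              # expose hidden map
        return h_next, c_next


def encode_clip(cell, frames):
    """Run the cell over frames of shape (T, C, H, W); return the last hidden map."""
    _, _, h, w = frames.shape
    hid = np.zeros((cell.hid_ch, h, w))
    mem = np.zeros_like(hid)
    for x in frames:
        hid, mem = cell.step(x, hid, mem)
    return hid
```

Because the hidden state keeps its spatial layout, the output of `encode_clip` is a feature map that a detection head or a correlation-based tracker could both consume, which is the sense in which a representation can be shared between the two tasks. How the actual network wires these parts together is not specified here.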