Multi-object tracking (MOT) is a crucial component of situational awareness in military defense applications. With the growing use of unmanned aerial systems (UASs), MOT methods for aerial surveillance are in high demand. Applying MOT to UAS imagery presents specific challenges, such as a moving sensor, changing zoom levels, dynamic backgrounds, illumination changes, obscurations, and small objects. In this work, we present a robust object tracking architecture designed to accommodate the noise found in real-time situations. We propose a kinematic prediction model, called Deep Extended Kalman Filter (DeepEKF), in which a sequence-to-sequence architecture is used to predict entity trajectories in latent space. DeepEKF uses a learned image embedding along with an attention mechanism trained to weight the importance of areas in an image when predicting future states. For the visual scoring, we experiment with different similarity measures to calculate distance based on entity appearance, including a convolutional neural network (CNN) encoder pre-trained using Siamese networks. In initial evaluation experiments, we show that our method, which combines the scoring structures of the kinematic and visual models within a multiple hypothesis tracking (MHT) framework, improves performance, especially in edge cases where entity motion is unpredictable or the data contain significant gaps between frames.
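To make the idea of combining kinematic and visual scoring concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state how the two scores are fused, so the weighted sum, the `alpha` parameter, the diagonal covariance, and all function names here are assumptions chosen for illustration. In the described system the predicted state would come from DeepEKF and the embeddings from the Siamese-pre-trained CNN encoder; here both are plain lists of numbers.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two appearance embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: treat a zero embedding as uninformative
    return 1.0 - dot / (norm_a * norm_b)


def kinematic_distance(predicted, observed, variances):
    """Squared Mahalanobis distance, assuming a diagonal covariance."""
    return sum((o - p) ** 2 / v
               for p, o, v in zip(predicted, observed, variances))


def combined_cost(predicted, observed, variances, track_emb, det_emb, alpha=0.5):
    """Hypothetical fusion: weighted sum of kinematic and visual distances."""
    kin = kinematic_distance(predicted, observed, variances)
    vis = cosine_distance(track_emb, det_emb)
    return alpha * kin + (1.0 - alpha) * vis


# One track with a predicted position and an appearance embedding.
track_pred = (10.0, 10.0)
track_var = (4.0, 4.0)
track_emb = [1.0, 0.0]

# Two candidate detections: one nearby and similar, one distant and dissimilar.
near = {"pos": (11.0, 10.0), "emb": [1.0, 0.0]}
far = {"pos": (30.0, 5.0), "emb": [0.0, 1.0]}

cost_near = combined_cost(track_pred, near["pos"], track_var, track_emb, near["emb"])
cost_far = combined_cost(track_pred, far["pos"], track_var, track_emb, far["emb"])
```

A lower cost indicates a more plausible track-to-detection association. In an MHT-style tracker, costs of this kind would typically feed the scoring of competing association hypotheses rather than a single greedy match.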