Multi-target tracking (MTT) is a traditional signal processing task in which the goal is to estimate the states of an unknown number of moving targets from noisy sensor measurements. In this paper, we revisit MTT from a deep learning perspective and propose convolutional neural network (CNN) architectures to tackle it. We represent the target states and sensor measurements as images, thereby recasting the problem as an image-to-image prediction task for which we train a fully convolutional model. This architecture is motivated by a novel theoretical bound on the transferability error of CNNs. The proposed CNN architecture outperforms a GM-PHD filter on an MTT task with 10 targets. Without re-training, the CNN's performance transfers to a larger MTT task with 250 targets, with only a $13\%$ increase in average OSPA.
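To make the idea of "representing target states as images" concrete, here is a minimal sketch that rasterizes 2-D target positions into an image by placing a Gaussian bump at each target. The abstract does not specify the encoding, so the Gaussian kernel, the function name `positions_to_image`, and the parameters `pixel_size` and `sigma` are all hypothetical choices for illustration, not the authors' method.

```python
import numpy as np

def positions_to_image(positions, width, height, pixel_size=1.0, sigma=1.0):
    """Hypothetical encoding: rasterize 2-D target positions as Gaussian bumps.

    positions: array-like of shape (k, 2) holding (x, y) coordinates.
    width, height: extent of the surveillance region (same units as positions).
    Returns an array of shape (n_rows, n_cols); each target adds one bump.
    """
    positions = np.asarray(positions, dtype=float).reshape(-1, 2)
    n_cols = int(round(width / pixel_size))
    n_rows = int(round(height / pixel_size))
    # Coordinates of the pixel centres along each axis.
    xs = (np.arange(n_cols) + 0.5) * pixel_size
    ys = (np.arange(n_rows) + 0.5) * pixel_size
    grid_x, grid_y = np.meshgrid(xs, ys)  # both of shape (n_rows, n_cols)
    image = np.zeros((n_rows, n_cols))
    for x, y in positions:
        sq_dist = (grid_x - x) ** 2 + (grid_y - y) ** 2
        image += np.exp(-sq_dist / (2.0 * sigma ** 2))
    return image

# Hypothetical scenes: a small region and a larger one with more targets.
rng = np.random.default_rng(0)
small = positions_to_image(rng.uniform(0, 50, size=(10, 2)), 50, 50)
large = positions_to_image(rng.uniform(0, 250, size=(250, 2)), 250, 250)
print(small.shape, large.shape)  # (50, 50) (250, 250)
```

Because a fully convolutional model contains no layer whose size depends on the input dimensions, the same trained weights can be applied to both `small` and `large`; this is what allows a model trained on the smaller task to be evaluated on the larger one without re-training.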
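The results above are reported in terms of average OSPA (optimal subpattern assignment) distance. The sketch below computes the standard OSPA distance between an estimated set $X$ and a true set $Y$ for one time step, using an optimal assignment with cutoff $c$ and order $p$. The default values `c=10.0` and `p=1.0` are assumptions, since the abstract does not state which were used, and "average" is assumed here to mean the mean of per-step values over time.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ospa(X, Y, c=10.0, p=1.0):
    """OSPA distance between point sets X (m, d) and Y (n, d).

    c: cutoff distance (assumed value); also the penalty per cardinality error.
    p: order of the metric (assumed value).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    m, n = len(X), len(Y)
    if m == 0 and n == 0:
        return 0.0
    if m == 0 or n == 0:
        return float(c)
    if m > n:  # the definition assumes |X| <= |Y|; OSPA is symmetric
        X, Y, m, n = Y, X, n, m
    # Pairwise Euclidean distances, cut off at c and raised to the power p.
    dist = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    cost = np.minimum(dist, c) ** p
    # Optimal one-to-one assignment of the m points in X to points in Y.
    rows, cols = linear_sum_assignment(cost)
    # Unassigned points in Y each incur the maximal penalty c**p.
    total = cost[rows, cols].sum() + (c ** p) * (n - m)
    return float((total / n) ** (1.0 / p))

def average_ospa(estimates, truths, c=10.0, p=1.0):
    """Mean per-step OSPA over a sequence of (estimate, truth) set pairs."""
    return float(np.mean([ospa(X, Y, c, p) for X, Y in zip(estimates, truths)]))
```

For example, one estimate at $(0, 0)$ against one true target at $(3, 4)$ gives an OSPA of $5$, while an estimate that exactly matches one of two true targets, the other lying beyond the cutoff, gives $(0 + c)/2 = 5$ for $c = 10$ and $p = 1$.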