The key challenge in the multiple-object tracking (MOT) task is temporal modeling of the objects being tracked. Existing tracking-by-detection methods adopt simple heuristics, such as spatial or appearance similarity. Such methods, despite being widely used, are overly simple and cannot learn temporal variations from data in an end-to-end manner. In this paper, we present MOTR, a fully end-to-end multiple-object tracking framework. It learns to model the long-range temporal variation of objects, performing temporal association implicitly and avoiding the explicit heuristics of previous methods. Built upon DETR, MOTR introduces the concept of a "track query". Each track query models the entire track of an object. It is transferred and updated frame by frame to perform iterative predictions in a seamless manner. Tracklet-aware label assignment is proposed for one-to-one assignment between track queries and object tracks. A temporal aggregation network, together with a collective average loss, is further proposed to enhance long-range temporal relations. Experimental results show that MOTR achieves competitive performance and can serve as a strong Transformer-based baseline for future research. Code is available at \url{https://github.com/megvii-model/MOTR}.
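To make the track-query idea concrete, the following is a minimal toy sketch of how a one-to-one, identity-preserving assignment could proceed over a short clip. It is not the MOTR implementation. The function names `assign_newborns` and `run_clip`, the 1-D "positions" standing in for query embeddings and boxes, the fixed pool of detect slots, and the brute-force matching are all assumptions made for illustration. The sketch assumes that a track query keeps the identity it was given in an earlier frame, while objects appearing for the first time are matched one-to-one to a separate pool of detect queries.

```python
from itertools import permutations


def assign_newborns(detect_slots, newborns):
    """Match newborn objects one-to-one to detect slots (hypothetical helper).

    detect_slots: list of toy 1-D positions, one per detect query.
    newborns: dict mapping object id -> toy 1-D position.
    Brute-force search over slot orderings; a stand-in for a proper
    bipartite matcher, practical only for a handful of objects.
    Returns an empty dict if there are more newborns than slots.
    """
    ids = list(newborns)
    best, best_cost = (), float("inf")
    for slots in permutations(range(len(detect_slots)), len(ids)):
        cost = sum(abs(detect_slots[s] - newborns[i]) for s, i in zip(slots, ids))
        if cost < best_cost:
            best, best_cost = slots, cost
    return dict(zip(ids, best))


def run_clip(frames, detect_slots):
    """Carry track queries across frames (toy illustration).

    frames: list of dicts, each mapping object id -> position in that frame.
    Returns a dict mapping object id -> list of frame indices in which a
    track query was assigned to that object.
    """
    tracks = {}     # object id -> current toy state of its track query
    histories = {}  # object id -> frame indices covered by its track query
    for t, objects in enumerate(frames):
        # Track queries whose object has left the scene are dropped.
        tracks = {i: p for i, p in tracks.items() if i in objects}
        # Surviving track queries keep their identity and are updated.
        for i in tracks:
            tracks[i] = objects[i]
            histories[i].append(t)
        # Objects without a track query are newborn; match them to detect slots.
        newborns = {i: p for i, p in objects.items() if i not in tracks}
        for i in assign_newborns(detect_slots, newborns):
            tracks[i] = objects[i]  # the matched detect query becomes a track query
            histories[i] = [t]
    return histories


frames = [
    {1: 0.5, 2: 9.0},
    {1: 1.0, 2: 8.5, 3: 5.2},
    {2: 8.0, 3: 5.0},
]
print(run_clip(frames, detect_slots=[0.0, 5.0, 10.0]))
```

In this toy setting each object is handled by exactly one query for its whole lifetime, so association across frames follows from the query's identity rather than from a separate similarity-based matching step.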