Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about the location, appearance, and identity of the objects in a scene over time. Our aim in this paper is to move beyond tracking-by-detection approaches, which perform well on datasets where the object classes are known, to class-agnostic tracking that also performs well for unknown object classes. To this end, we make the following three contributions: first, we introduce {\em semantic detector queries} that enable an object to be localized by specifying its approximate position, or its appearance, or both; second, we use these queries within an auto-regressive framework for tracking, and propose a multi-query tracking transformer (\textit{MQT}) model for simultaneous tracking and appearance-based re-identification (reID), based on the transformer architecture with deformable attention, a formulation that allows the tracker to operate in a class-agnostic manner and the model to be trained end-to-end; third, we demonstrate that \textit{MQT} performs competitively on standard MOT benchmarks, outperforms all baselines on generalised-MOT, and generalises well to much harder tracking problems such as tracking any object on the TAO dataset.
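To make the auto-regressive use of queries concrete, the following is a minimal Python sketch and not the MQT model itself. It assumes a hypothetical `Query` record that carries a position hint, an appearance hint, or both, and a hypothetical `localize` callable standing in for the transformer; the names `Query`, `Box`, `track`, and `localize` are illustrative and do not come from the paper. Each frame's outputs are fed back as the next frame's queries, and an object that is lost keeps an appearance-only query so that it can be re-identified later.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Tuple

# Hypothetical box convention for this sketch: (cx, cy, w, h).
Box = Tuple[float, float, float, float]


@dataclass
class Query:
    """Illustrative query: a position hint, an appearance hint, or both."""
    track_id: int
    box: Optional[Box] = None
    appearance: Optional[Sequence[float]] = None

    def __post_init__(self) -> None:
        if self.box is None and self.appearance is None:
            raise ValueError("a query needs a position, an appearance, or both")


# Placeholder for the learned model (an assumption of this sketch): given a
# frame and a query, return (box, appearance embedding), or None when the
# object cannot be found in that frame.
Localizer = Callable[[object, Query], Optional[Tuple[Box, Sequence[float]]]]


def track(
    frames: Iterable[object],
    initial_queries: Iterable[Query],
    localize: Localizer,
) -> List[Dict[int, Box]]:
    """Run the queries through the frames auto-regressively.

    Returns one dict per frame, mapping track_id to the box found there.
    """
    queries = list(initial_queries)
    history: List[Dict[int, Box]] = []
    for frame in frames:
        found: Dict[int, Box] = {}
        next_queries: List[Query] = []
        for query in queries:
            output = localize(frame, query)
            if output is None:
                # Lost in this frame: drop the stale position but keep the
                # appearance, so the object can still be re-identified.
                # A position-only query has nothing left and is dropped.
                if query.appearance is not None:
                    next_queries.append(
                        Query(query.track_id, None, query.appearance)
                    )
                continue
            box, embedding = output
            found[query.track_id] = box
            # Auto-regression: this frame's output is the next frame's query.
            next_queries.append(Query(query.track_id, box, embedding))
        history.append(found)
        queries = next_queries
    return history
```

In this sketch a track can be started from a box alone (for example, a user-provided box around an object of unknown class), which is one way to read the class-agnostic setting; how the actual model encodes and decodes its queries is not specified in the abstract and is not modelled here.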