Transformers have demonstrated superior performance on a wide variety of tasks since they were introduced. In recent years, they have drawn attention from the vision community for tasks such as image classification and object detection. Despite this wave, an accurate and efficient multiple-object tracking (MOT) method based on transformers has yet to be designed. We argue that directly applying a transformer architecture, with its quadratic complexity and insufficient noise-initialized sparse queries, is not optimal for MOT. We propose TransCenter, a transformer-based MOT architecture with dense representations that accurately tracks all objects while keeping a reasonable runtime. Methodologically, we propose the use of image-related dense detection queries and efficient sparse tracking queries, both produced by our carefully designed query learning networks (QLN). On the one hand, the dense image-related detection queries allow us to infer targets' locations globally and robustly through dense heatmap outputs. On the other hand, the set of sparse tracking queries interacts efficiently with image features in our TransCenter Decoder to associate object positions through time. As a result, TransCenter outperforms the current state-of-the-art methods by a large margin on two standard MOT benchmarks under two tracking settings (public/private). An extensive ablation study and comparisons with more naive alternatives and concurrent works further show that TransCenter is both efficient and accurate. For scientific interest, the code is made publicly available at https://github.com/yihongxu/transcenter.
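To make the idea of locating targets through dense heatmap outputs concrete, the sketch below shows one common way that center-heatmap detectors read object centers out of a heatmap: keep every cell that is a strict local maximum in its 3x3 neighbourhood and whose score reaches a threshold. The abstract does not specify the read-out step, so this is an illustrative assumption and not TransCenter's actual implementation; the function name `heatmap_peaks` and the default threshold of 0.3 are hypothetical.

```python
from typing import List, Tuple


def heatmap_peaks(
    heatmap: List[List[float]], threshold: float = 0.3
) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for every cell that is a strict local maximum
    in its 3x3 neighbourhood and whose score is at least `threshold`.

    Hypothetical helper for illustration; not taken from the TransCenter code.
    """
    rows = len(heatmap)
    cols = len(heatmap[0]) if rows else 0
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue  # too weak to count as a detection
            # Collect the up-to-8 neighbours, clipping at the image border.
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            # Strict comparison: cells on a flat plateau are not reported.
            if all(score > n for n in neighbours):
                peaks.append((r, c, score))
    return peaks


if __name__ == "__main__":
    demo = [
        [0.1, 0.2, 0.1, 0.0],
        [0.2, 0.9, 0.2, 0.0],
        [0.1, 0.2, 0.1, 0.4],
        [0.0, 0.0, 0.1, 0.1],
    ]
    print(heatmap_peaks(demo))  # two peaks: (1, 1) and (2, 3)
```

Because every pixel of the heatmap can respond, the number of detections is not capped by a fixed number of sparse queries, which is the intuition behind the dense representation argued for above.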