Graphs offer a natural way to formulate Multiple Object Tracking (MOT) and Multiple Object Tracking and Segmentation (MOTS) within the tracking-by-detection paradigm. However, they also pose a major challenge for learning methods, as defining a model that can operate on such a structured domain is not trivial. In this work, we exploit the classical network flow formulation of MOT to define a fully differentiable framework based on Message Passing Networks (MPNs). By operating directly on the graph domain, our method can reason globally over an entire set of detections and exploit contextual features. It then jointly predicts the final solution to the data association problem and segmentation masks for all objects in the scene, exploiting synergies between the two tasks. We achieve state-of-the-art results for both tracking and segmentation on several publicly available datasets. Our code is available at github.com/ocetintas/MPNTrackSeg.
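To make the idea of message passing over a detection graph concrete, the following is a minimal NumPy sketch of a generic edge-then-node update, followed by a per-edge score that could be read as an association probability. It is not the architecture used in this work: the random untrained weights, the embedding size `d`, the toy `frames` array, and the function name `message_passing_step` are all assumptions made purely for illustration.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def message_passing_step(h, e, edges, W_e, W_v):
    """One generic edge-then-node update on a detection graph (illustrative only).

    h:     (num_nodes, d) node embeddings, one per detection
    e:     (num_edges, d) edge embeddings, one per candidate association
    edges: (num_edges, 2) integer array of (i, j) node index pairs
    """
    src, dst = edges[:, 0], edges[:, 1]
    # Edge update: combine both endpoint embeddings with the current edge embedding.
    e_new = relu(np.concatenate([h[src], h[dst], e], axis=1) @ W_e)
    # Node update: sum the updated embeddings of all edges incident to each node.
    agg = np.zeros_like(h)
    np.add.at(agg, src, e_new)
    np.add.at(agg, dst, e_new)
    h_new = relu(np.concatenate([h, agg], axis=1) @ W_v)
    return h_new, e_new


rng = np.random.default_rng(0)
d = 8                                # hypothetical embedding size
frames = np.array([0, 0, 1, 1, 2])   # hypothetical frame index of each detection

# Candidate edges connect only detections that come from different frames.
edges = np.array([(i, j)
                  for i in range(len(frames))
                  for j in range(i + 1, len(frames))
                  if frames[i] != frames[j]])

h = rng.normal(size=(len(frames), d))
e = rng.normal(size=(len(edges), d))
W_e = rng.normal(size=(3 * d, d)) * 0.1   # untrained, random weights
W_v = rng.normal(size=(2 * d, d)) * 0.1
w_cls = rng.normal(size=d) * 0.1

# A few rounds of message passing let information spread beyond direct neighbours.
for _ in range(3):
    h, e = message_passing_step(h, e, edges, W_e, W_v)

# Per-edge score in (0, 1): how likely the two detections belong to the same track.
edge_scores = 1.0 / (1.0 + np.exp(-(e @ w_cls)))
```

In a flow-style formulation, each candidate edge would eventually receive a binary decision (linked or not linked); here the scores are meaningless because the weights are random, and the sketch only shows how information moves between nodes and edges.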