Most (3D) multi-object tracking methods rely on appearance-based cues for data association. By contrast, we investigate how far we can get by encoding only geometric relationships between objects in 3D space as cues for data-driven data association. We encode 3D detections as nodes in a graph, where spatial and temporal pairwise relations among objects are encoded via localized polar coordinates on graph edges. This representation makes our geometric relations invariant to global transformations and smooth trajectory changes, especially under non-holonomic motion. It allows our graph neural network to learn to effectively encode temporal and spatial interactions and to fully leverage contextual and motion cues. We obtain the final scene interpretation by posing data association as edge classification. We establish a new state of the art on the nuScenes dataset and, more importantly, show that our method, PolarMOT, generalizes remarkably well across different locations (Boston, Singapore, Karlsruhe) and datasets (nuScenes and KITTI).
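To make the invariance claim concrete, the following is a minimal sketch of one possible localized polar edge feature. The abstract does not give the exact parametrization, so everything here is an assumption for illustration: detections are reduced to ground-plane poses `(x, y, yaw)`, and the hypothetical helper `polar_edge_feature` returns the distance, the bearing measured relative to the source object's heading, and the heading difference. Temporal edges would plausibly also carry a time difference, which is omitted.

```python
import math


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def polar_edge_feature(src, dst):
    """Illustrative edge feature between two poses (x, y, yaw).

    Assumed parametrization, not taken from the paper:
    (distance, bearing relative to src heading, heading difference).
    """
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    dist = math.hypot(dx, dy)
    # Measure the direction to dst in the source object's own frame.
    bearing = wrap_angle(math.atan2(dy, dx) - src[2])
    dyaw = wrap_angle(dst[2] - src[2])
    return dist, bearing, dyaw


def transform(pose, theta, tx, ty):
    """Apply a global rotation by theta followed by a translation (tx, ty)."""
    x, y, yaw = pose
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + tx, s * x + c * y + ty, wrap_angle(yaw + theta))


if __name__ == "__main__":
    a = (0.0, 0.0, 0.0)
    b = (3.0, 4.0, math.pi / 2)
    before = polar_edge_feature(a, b)
    # Move the whole scene rigidly; the edge feature should not change.
    after = polar_edge_feature(transform(a, 1.0, 10.0, -5.0),
                               transform(b, 1.0, 10.0, -5.0))
    print(before)
    print(after)
```

Because every quantity is expressed relative to the source object rather than a world frame, rotating or translating the whole scene leaves the feature unchanged up to floating-point error, which is the kind of invariance to global transformations the abstract describes.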