Multi-object tracking (MOT) enables mobile robots to perform well-informed motion planning and navigation by localizing surrounding objects in 3D space and time. Existing methods rely on depth sensors (e.g., LiDAR) to detect and track targets in 3D space, but only up to a limited sensing range due to the sparsity of the signal. On the other hand, cameras provide a dense and rich visual signal that helps to localize even distant objects, but only in the image domain. In this paper, we propose EagerMOT, a simple tracking formulation that eagerly integrates all available object observations from both sensor modalities to obtain a well-informed interpretation of the scene dynamics. Using images, we can identify distant incoming objects, while depth estimates allow for precise trajectory localization as soon as objects are within the depth-sensing range. With EagerMOT, we achieve state-of-the-art results across several MOT tasks on the KITTI and NuScenes datasets. Our code is available at https://github.com/aleksandrkim61/EagerMOT.
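To make the fusion idea above concrete, the following is a minimal sketch of a two-stage camera-LiDAR association step of the kind the abstract describes. It is not the authors' implementation: all class names, data structures, and thresholds (`max_dist_3d`, `min_iou_2d`) are illustrative assumptions. Tracks are first matched to detections by 3D center distance where depth is available, and remaining tracks fall back to image-domain IoU so that distant, camera-only observations can still extend trajectories.

```python
# Illustrative sketch of two-stage camera-LiDAR association, loosely following
# the idea in the abstract. Not the EagerMOT implementation; all names,
# data structures, and thresholds are hypothetical.
from dataclasses import dataclass
from typing import List, Optional
import numpy as np


@dataclass
class Detection:
    box_2d: np.ndarray                # [x1, y1, x2, y2] in image coordinates
    center_3d: Optional[np.ndarray]   # [x, y, z] if within depth-sensing range, else None


@dataclass
class Track:
    box_2d: np.ndarray
    center_3d: Optional[np.ndarray]


def iou_2d(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two axis-aligned image boxes."""
    x1, y1 = np.maximum(a[:2], b[:2])
    x2, y2 = np.minimum(a[2:], b[2:])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter + 1e-9)


def associate(tracks: List[Track], detections: List[Detection],
              max_dist_3d: float = 2.0, min_iou_2d: float = 0.3):
    """Greedy two-stage matching: precise 3D distance first, image IoU as fallback."""
    matches = []
    unmatched_dets = set(range(len(detections)))
    unmatched_tracks = set(range(len(tracks)))

    # Stage 1: match by 3D center distance when both sides have depth estimates.
    for ti in list(unmatched_tracks):
        track = tracks[ti]
        if track.center_3d is None:
            continue
        candidates = [(np.linalg.norm(track.center_3d - detections[di].center_3d), di)
                      for di in unmatched_dets if detections[di].center_3d is not None]
        if candidates:
            dist, di = min(candidates)
            if dist < max_dist_3d:
                matches.append((ti, di))
                unmatched_tracks.discard(ti)
                unmatched_dets.discard(di)

    # Stage 2: remaining tracks fall back to 2D IoU, so distant objects seen
    # only by the camera can still extend existing trajectories.
    for ti in list(unmatched_tracks):
        candidates = [(iou_2d(tracks[ti].box_2d, detections[di].box_2d), di)
                      for di in unmatched_dets]
        if candidates:
            iou, di = max(candidates)
            if iou > min_iou_2d:
                matches.append((ti, di))
                unmatched_tracks.discard(ti)
                unmatched_dets.discard(di)

    return matches, unmatched_tracks, unmatched_dets
```

In this sketch, unmatched tracks after both stages would be candidates for termination, while unmatched detections would spawn new tracks; the actual EagerMOT fusion and track-management logic is available at the repository linked above.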