Multi-object tracking is an important ability for an autonomous vehicle to safely navigate a traffic scene. The current state of the art follows the tracking-by-detection paradigm, in which existing tracks are associated with detected objects through some distance metric. The key challenges in increasing tracking accuracy lie in data association and track life cycle management. We propose a probabilistic, multi-modal, multi-object tracking system consisting of different trainable modules to provide robust and data-driven tracking results. First, we learn how to fuse features from 2D images and 3D LiDAR point clouds to capture the appearance and geometric information of an object. Second, we propose to learn a metric that combines the Mahalanobis and feature distances when comparing a track and a new detection in data association. Third, we propose to learn when to initialize a track from an unmatched object detection. Through extensive quantitative and qualitative results, we show that, when using the same object detectors, our method outperforms state-of-the-art approaches on the nuScenes and KITTI datasets.
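To make the data-association step concrete, the following is a minimal sketch of how a squared Mahalanobis distance and a feature distance might be combined and then used for matching. It is not the learned metric proposed here: the fixed weight `alpha`, the greedy matcher, the gating `threshold`, and all function names are illustrative assumptions, and the feature distance is taken as a precomputed scalar.

```python
import numpy as np


def mahalanobis_sq(track_mean, track_cov, detection):
    """Squared Mahalanobis distance between a track's predicted state and a detection."""
    diff = detection - track_mean
    # Solve cov * x = diff instead of explicitly inverting the covariance.
    return float(diff @ np.linalg.solve(track_cov, diff))


def combined_distance(track_mean, track_cov, detection, feature_dist, alpha=0.5):
    """Hypothetical combination: Mahalanobis term plus a weighted feature term.

    `alpha` is a hand-set placeholder here (an assumption); a learned metric
    would predict the combination from data instead.
    """
    return mahalanobis_sq(track_mean, track_cov, detection) + alpha * feature_dist


def greedy_associate(dist, threshold):
    """Greedily match tracks (rows) to detections (columns) by ascending distance.

    Pairs whose distance exceeds `threshold` are left unmatched.
    """
    pairs = []
    used_tracks, used_dets = set(), set()
    for flat in np.argsort(dist, axis=None):
        i, j = (int(k) for k in np.unravel_index(flat, dist.shape))
        if dist[i, j] > threshold:
            break  # all remaining candidates are even farther apart
        if i in used_tracks or j in used_dets:
            continue
        pairs.append((i, j))
        used_tracks.add(i)
        used_dets.add(j)
    return pairs
```

Detections left unmatched by such a step are the candidates for track initialization, which is the decision the third module is meant to learn.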