3D multi-object tracking is a crucial component of the perception system in autonomous driving vehicles. Tracking all dynamic objects around the vehicle is essential for tasks such as obstacle avoidance and path planning. Autonomous vehicles are usually equipped with different sensor modalities to improve accuracy and reliability. While sensor fusion has been widely used in object detection networks in recent years, most existing multi-object tracking algorithms either rely on a single input modality or do not fully exploit the information provided by multiple sensing modalities. In this work, we propose an end-to-end network for joint object detection and tracking based on radar and camera sensor fusion. Our proposed method uses a center-based radar-camera fusion algorithm for object detection and a greedy algorithm for object association. The greedy algorithm associates detected objects through time using their depth, velocity, and 2D displacement. This makes our tracking algorithm very robust to occluded and overlapping objects, as the depth and velocity information helps the network distinguish them. We evaluate our method on the challenging nuScenes dataset, where it achieves 20.0 AMOTA and outperforms all vision-based 3D tracking methods on the benchmark, as well as the baseline LiDAR-based method. Our method is online with a runtime of 35 ms per image, making it well suited for autonomous driving applications.
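The greedy association step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the cost function, weights (`w_disp`, `w_depth`, `w_vel`), threshold, and the dictionary-based track/detection representation are all assumptions made for clarity. Each track and detection is assumed to carry a 2D image-plane center, an estimated depth, and a velocity; the cost combines the 2D displacement with depth and velocity differences, and pairs are matched greedily in order of increasing cost.

```python
import numpy as np

def greedy_associate(tracks, detections, max_cost=1.0,
                     w_disp=1.0, w_depth=0.5, w_vel=0.5):
    """Greedy association sketch (hypothetical cost weights).

    Each track/detection is a dict with 'center' (2D image position),
    'depth' (scalar), and 'velocity' (2D). Returns matched index pairs
    plus the unmatched track and detection indices.
    """
    # Pairwise cost: weighted sum of 2D displacement, depth gap, velocity gap.
    costs = np.full((len(tracks), len(detections)), np.inf)
    for i, t in enumerate(tracks):
        for j, d in enumerate(detections):
            disp = np.linalg.norm(np.subtract(t['center'], d['center']))
            depth = abs(t['depth'] - d['depth'])
            vel = np.linalg.norm(np.subtract(t['velocity'], d['velocity']))
            costs[i, j] = w_disp * disp + w_depth * depth + w_vel * vel

    matches, used_t, used_d = [], set(), set()
    # Greedily take the lowest-cost remaining pair under the threshold.
    for i, j in sorted(np.ndindex(costs.shape), key=lambda ij: costs[ij]):
        if i in used_t or j in used_d or costs[i, j] > max_cost:
            continue
        matches.append((i, j))
        used_t.add(i)
        used_d.add(j)

    unmatched_tracks = [i for i in range(len(tracks)) if i not in used_t]
    unmatched_dets = [j for j in range(len(detections)) if j not in used_d]
    return matches, unmatched_tracks, unmatched_dets
```

Because depth and velocity enter the cost, two objects that overlap in the image plane but sit at different depths (or move differently) remain distinguishable, which is the intuition behind the robustness claim in the abstract.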