Monocular cameras are one of the most commonly used sensors in the automotive industry for autonomous vehicles. One major drawback of using a monocular camera is that it only makes observations in the two-dimensional image plane and cannot directly measure the distance to objects. In this paper, we aim to fill this gap by developing a multi-object tracking algorithm that takes an image as input and produces trajectories of detected objects in a world coordinate system. We solve this by using a deep neural network trained to detect objects and estimate the distance to them from a single input image. The detections from a sequence of images are fed into a state-of-the-art Poisson multi-Bernoulli mixture (PMBM) tracking filter. The combination of the learned detector and the PMBM filter results in an algorithm that achieves 3D tracking using only mono-camera images as input. The performance of the algorithm is evaluated in both 3D world coordinates and 2D image coordinates, using the publicly available KITTI object tracking dataset. The algorithm accurately tracks objects and correctly handles data association, even when objects overlap heavily in the image, and it is one of the top-performing algorithms on the KITTI object tracking benchmark. Furthermore, the algorithm is efficient, running at close to 20 frames per second on average.
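To illustrate how a per-object distance estimate can lift a 2D detection into 3D, the following minimal sketch back-projects a pixel through a pinhole camera model. It is an illustration under stated assumptions, not the method of the paper: the function name `backproject`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the reading of "distance" as Euclidean range along the viewing ray are all hypothetical choices made here. The result is in camera coordinates; moving to a world coordinate system would additionally require the camera pose.

```python
import math


def backproject(u, v, distance, fx, fy, cx, cy):
    """Return the 3D point (x, y, z) in camera coordinates.

    Assumes an ideal pinhole camera (no lens distortion) and that
    `distance` is the Euclidean range from the camera center to the
    object, measured along the ray through pixel (u, v).
    """
    # Direction of the viewing ray in normalized image coordinates.
    x = (u - cx) / fx
    y = (v - cy) / fy
    # Length of the unnormalized ray (x, y, 1).
    norm = math.sqrt(x * x + y * y + 1.0)
    # Scale the ray so that its length equals the estimated distance.
    scale = distance / norm
    return (x * scale, y * scale, scale)


if __name__ == "__main__":
    # Hypothetical intrinsics and detection, for illustration only.
    point = backproject(u=400.0, v=200.0, distance=15.0,
                        fx=700.0, fy=700.0, cx=320.0, cy=240.0)
    print(point)
```

Under these assumptions, the 3D points obtained per frame are the kind of measurement a tracking filter could consume. If the network instead predicted depth along the optical axis, the normalization step would be dropped and the ray scaled by that depth directly.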