3D object detection is a core component of automated driving systems. State-of-the-art methods fuse RGB imagery and LiDAR point cloud data frame-by-frame for 3D bounding box regression. However, frame-by-frame 3D object detection suffers from noise, field-of-view obstruction, and sparsity. We propose a novel Temporal Fusion Module (TFM) that uses information from previous time-steps to mitigate these problems. First, a state-of-the-art frustum network extracts point cloud features frame-by-frame from raw RGB images and LiDAR point clouds. Then, the TFM fuses these features across time with a recurrent neural network. As a result, 3D object detection becomes robust against single-frame failures and transient occlusions. Experiments on the KITTI object tracking dataset demonstrate the effectiveness of the proposed TFM, where we obtain ~6%, ~4%, and ~6% improvements on the Car, Pedestrian, and Cyclist classes, respectively, compared to frame-by-frame baselines. Furthermore, ablation studies confirm that the improvement stems from temporal fusion and show the effects of placing the TFM at different points in the object detection pipeline. Our code is open-source and available at https://github.com/emecercelik/Temp-Frustum-Net.git.
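To illustrate the temporal-fusion idea described above, here is a minimal sketch in PyTorch: per-frame frustum features are aggregated over time with a recurrent network so the box-regression head can draw on previous time-steps. This is not the authors' exact TFM; the feature dimensions, the choice of a GRU, the residual fusion, and all names are illustrative assumptions.

```python
# Minimal sketch of temporal fusion over per-frame features (assumptions:
# 128-dim features, GRU recurrence, residual fusion with the current frame).
import torch
import torch.nn as nn


class TemporalFusionSketch(nn.Module):
    def __init__(self, feat_dim: int = 128, hidden_dim: int = 128):
        super().__init__()
        # GRU runs over the time axis of per-frame feature vectors.
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        # Project the recurrent state back to the feature dimension expected
        # by a frame-level 3D bounding-box regression head.
        self.proj = nn.Linear(hidden_dim, feat_dim)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, time, feat_dim) per-object frustum features.
        fused, _ = self.gru(frame_feats)
        # Combine the temporal state at the current (last) frame with that
        # frame's single-frame features via a residual connection.
        return frame_feats[:, -1] + self.proj(fused[:, -1])


if __name__ == "__main__":
    tfm = TemporalFusionSketch()
    feats = torch.randn(2, 4, 128)  # 2 objects, 4 time-steps, 128-dim features
    print(tfm(feats).shape)         # torch.Size([2, 128])
```

In this sketch, a single-frame detector degrades gracefully: if the current frame is occluded or sparse, the recurrent state still carries features from earlier frames into the fused representation.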