In this work we propose 3D-FFS, a novel approach that makes sensor-fusion-based 3D object detection networks significantly faster using a class of computationally inexpensive heuristics. Existing sensor-fusion-based networks generate 3D region proposals by leveraging inferences from 2D object detectors. However, as images carry no depth information, these networks rely on extracting semantic features of points from the entire scene to locate each object. By leveraging aggregated intrinsic properties (e.g., point density) of the point cloud data, 3D-FFS can substantially constrain the 3D search space and thereby significantly reduce training time, inference time and memory consumption without sacrificing accuracy. To demonstrate the efficacy of 3D-FFS, we integrate it with Frustum ConvNet (F-ConvNet), a prominent sensor-fusion-based 3D object detection model, and assess its performance on the KITTI dataset. Compared to F-ConvNet, we reduce training and inference times by up to 62.80% and 58.96%, respectively, while reducing memory usage by up to 58.53%. Additionally, we achieve 0.36%, 0.59% and 2.19% improvements in accuracy for the Car, Pedestrian and Cyclist classes, respectively. 3D-FFS shows great promise in domains with limited computing power, such as autonomous vehicles, drones and robotics, where LiDAR-camera-based sensor fusion perception systems are widely used.
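To make the core idea concrete, the sketch below illustrates one density-based heuristic of the kind the abstract describes: given the points falling inside a 2D-detection frustum, histogram them along the depth axis and keep only the contiguous region whose bins are dense, discarding sparse background points before feature extraction. This is a minimal illustration of the general technique, not the authors' exact 3D-FFS algorithm; the function name, bin count and threshold fraction are assumptions for the example.

```python
import numpy as np

def constrain_search_space(points, axis=2, num_bins=32, density_frac=0.1):
    """Constrain a frustum point cloud to the densest contiguous span
    along one axis (default: depth). A hedged sketch of a point-density
    heuristic; parameters are illustrative, not from the paper.

    points: (N, 3) array of x, y, z coordinates inside one frustum.
    Returns the subset of points lying in the dense depth span.
    """
    coords = points[:, axis]
    counts, edges = np.histogram(coords, bins=num_bins)
    # A bin is "dense" if it holds at least density_frac of the peak count.
    threshold = density_frac * counts.max()
    dense_idx = np.flatnonzero(counts >= threshold)
    # Keep everything between the first and last dense bin; sparse
    # outlier points far in front of or behind the object are dropped.
    lo, hi = edges[dense_idx[0]], edges[dense_idx[-1] + 1]
    mask = (coords >= lo) & (coords <= hi)
    return points[mask]
```

On a typical frustum, where the object forms a dense cluster and background returns are sparse, this shrinks the region the downstream network must process, which is the source of the training/inference-time and memory savings claimed above.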