Real-time detection of objects in the 3D scene is one of the tasks an autonomous agent needs to perform to understand its surroundings. While recent Deep Learning-based solutions achieve satisfactory performance, their high computational cost renders them intractable in real-life settings, where computations need to be performed on embedded platforms. In this paper, we analyze the efficiency of two popular voxel-based 3D object detection methods, which provide a good compromise between high performance and speed, with respect to two aspects: their ability to detect objects located at large distances from the agent, and their ability to operate in real time on embedded platforms equipped with high-performance GPUs. Our experiments show that these methods mostly fail to detect distant small objects due to the sparsity of the input point clouds at large distances. Moreover, models trained on near objects achieve similar or better performance compared to those trained on all objects in the scene. This means that the models learn object appearance representations mostly from near objects. Our findings suggest that a considerable part of the computations of existing methods is spent on locations of the scene that do not contribute to successful detections. This means that the methods can achieve a speed-up of $40$-$60\%$ by restricting operation to near objects, while not sacrificing much in performance.
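The range restriction mentioned above amounts to cropping the input point cloud before voxelization, so that the voxel grid (and all downstream computation) covers only the near region. A minimal sketch of this idea follows; the function name, the `(x, y, z, intensity)` layout, and the 40 m cutoff are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def crop_point_cloud(points: np.ndarray, max_range: float) -> np.ndarray:
    """Keep only LiDAR returns within max_range meters of the sensor origin.

    points: (N, 4) array of (x, y, z, intensity) values (assumed layout).
    Shrinking the covered range shrinks the voxel grid, and with it the
    cost of the voxel-based detector's backbone and detection head.
    """
    # Euclidean distance of each point from the sensor, using x, y, z only.
    dist = np.linalg.norm(points[:, :3], axis=1)
    return points[dist <= max_range]

# Tiny synthetic cloud: one near point and one distant point.
cloud = np.array([
    [ 5.0, 0.0, 0.0, 0.3],   # 5 m away: kept under a 40 m cutoff
    [60.0, 0.0, 0.0, 0.1],   # 60 m away: dropped under a 40 m cutoff
])
near = crop_point_cloud(cloud, max_range=40.0)
```

In a voxel-based pipeline this filter would run once per frame, before voxelization; since distant points are sparse and rarely yield detections anyway, the accuracy cost of the crop is small relative to the compute saved.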