3D neural networks are widely used in real-world applications (e.g., AR/VR headsets, self-driving cars). They must be both fast and accurate, but the limited hardware resources of edge devices make these requirements hard to meet. Previous work processes 3D data with either voxel-based or point-based neural networks, yet neither type of model is hardware-efficient, because of a large memory footprint and random memory access. In this paper, we study 3D deep learning from the efficiency perspective. We first systematically analyze the bottlenecks of previous 3D methods. We then combine the strengths of point-based and voxel-based models and propose a novel hardware-efficient 3D primitive, Point-Voxel Convolution (PVConv). We further enhance this primitive with sparse convolution to make it more effective at processing large (outdoor) scenes. Building on this primitive, we introduce 3D Neural Architecture Search (3D-NAS) to find the best 3D network architecture under a given resource constraint. We evaluate our method on six representative benchmark datasets, achieving state-of-the-art performance with a measured speedup of 1.8-23.7x. Furthermore, our method has been deployed on the autonomous racing vehicle of MIT Driverless, where it achieves a larger detection range, higher accuracy, and lower latency.
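As a rough illustration of the memory-footprint bottleneck mentioned above, the sketch below estimates the size of a single dense voxel feature grid and counts how many voxels a point set actually occupies. It is not taken from the paper: the function names, the 64-channel width, and the 4-byte (float32) values are assumptions chosen only for illustration.

```python
def dense_voxel_grid_bytes(resolution, channels, bytes_per_value=4):
    """Bytes needed for one dense feature grid of shape (channels, r, r, r).

    Assumes 4-byte (float32) values by default; this is an illustrative choice.
    """
    return channels * resolution ** 3 * bytes_per_value


def occupied_voxels(points, resolution):
    """Count distinct voxels hit by points with coordinates in [0, 1)."""
    cells = set()
    for x, y, z in points:
        cells.add((int(x * resolution), int(y * resolution), int(z * resolution)))
    return len(cells)


if __name__ == "__main__":
    # Doubling the resolution multiplies the dense grid size by eight.
    for r in (32, 64, 128):
        mib = dense_voxel_grid_bytes(r, channels=64) / 2 ** 20
        print(f"resolution {r}: {mib:.0f} MiB per feature map")
```

Because the dense grid grows cubically with resolution while a point cloud usually touches only a small fraction of the cells, storing or convolving only the occupied voxels is one plausible motivation for the sparse convolution mentioned in the abstract.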