Deep learning on point clouds plays a vital role in a wide range of applications such as autonomous driving and AR/VR. These applications interact with people in real time on edge devices and thus require low latency and low energy consumption. Compared to projecting the point cloud to 2D space, directly processing the 3D point cloud yields higher accuracy and fewer multiply-accumulate operations (#MACs). However, the extremely sparse nature of point clouds poses challenges to hardware acceleration. For example, we need to explicitly determine the nonzero outputs and search for the nonzero neighbors of each output (the mapping operation), which existing accelerators do not support. Furthermore, sparse features must be explicitly gathered and scattered, resulting in large data movement overhead. In this paper, we comprehensively analyze the performance bottlenecks of modern point cloud networks on CPU/GPU/TPU. To address these challenges, we then present PointAcc, a novel point cloud deep learning accelerator. PointAcc maps diverse mapping operations onto one versatile ranking-based kernel, streams the sparse computation with configurable caching, and temporally fuses consecutive dense layers to reduce the memory footprint. Evaluated on 8 point cloud models across 4 applications, PointAcc achieves a 3.7X speedup and 22X energy savings over an RTX 2080Ti GPU. Co-designed with light-weight neural networks, PointAcc outperforms the prior accelerator Mesorasi with a 100X speedup and 9.1% higher accuracy when running segmentation on the S3DIS dataset. PointAcc paves the way for efficient point cloud recognition.
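
To make the mapping operation and the gather/scatter steps concrete, the following is a minimal software sketch, not PointAcc's hardware design. It assumes integer voxel coordinates that are unique within each set, and it finds the nonzero neighbors for one kernel offset by sorting (ranking) the input coordinates together with the shifted output coordinates and pairing adjacent duplicates. The names `kernel_map` and `sparse_conv` are hypothetical and chosen only for illustration.

```python
def kernel_map(in_coords, out_coords, offset):
    """Return (input_index, output_index) pairs with in_coord == out_coord + offset.

    Illustrative assumption: coordinates are unique within each set.
    """
    # Tag 0 marks input points; tag 1 marks output points shifted by the offset.
    tagged = [(tuple(c), 0, i) for i, c in enumerate(in_coords)]
    tagged += [
        (tuple(a + b for a, b in zip(c, offset)), 1, j)
        for j, c in enumerate(out_coords)
    ]
    tagged.sort()  # ranking step: equal coordinates become adjacent
    pairs = []
    for (c0, t0, i0), (c1, t1, i1) in zip(tagged, tagged[1:]):
        if c0 == c1 and t0 == 0 and t1 == 1:
            pairs.append((i0, i1))
    return pairs


def sparse_conv(in_feats, in_coords, out_coords, weights):
    """Sparse convolution as gather, multiply, scatter-accumulate.

    weights maps each kernel offset to a C_in x C_out matrix (list of lists).
    """
    c_out = len(next(iter(weights.values()))[0])
    out = [[0.0] * c_out for _ in out_coords]
    for offset, w in weights.items():
        for i, j in kernel_map(in_coords, out_coords, offset):
            x = in_feats[i]  # gather the input feature
            for k in range(c_out):
                # scatter-accumulate into the matching output row
                out[j][k] += sum(x[c] * w[c][k] for c in range(len(x)))
    return out


coords = [(0, 0, 0), (1, 0, 0), (5, 5, 5)]
feats = [[1.0], [2.0], [3.0]]
weights = {(0, 0, 0): [[1.0]], (1, 0, 0): [[10.0]]}
result = sparse_conv(feats, coords, coords, weights)
```

In this toy case only the output at `(0, 0, 0)` has a neighbor at offset `(1, 0, 0)`, so most kernel positions contribute nothing. The per-pair gather and scatter in the inner loop are the data movement that the abstract identifies as overhead.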