Deep learning-based point cloud processing plays an important role in various vision tasks, such as autonomous driving, virtual reality (VR), and augmented reality (AR). The submanifold sparse convolutional network (SSCN) has been widely used for point cloud processing due to its unique advantages in terms of visual results. However, existing convolutional neural network accelerators suffer non-trivial performance degradation when used to accelerate SSCN, owing to the extreme and unstructured sparsity and the complex computational dependency between the sparsity of the central activation and that of its neighbors. In this paper, we propose a high-performance FPGA-based accelerator for SSCN. First, we develop a zero-removing strategy that eliminates coarse-grained redundant regions, significantly improving computational efficiency. Second, we propose a concise encoding scheme to obtain the matching information needed for efficient point-wise multiplications. Third, we develop a sparse data matching unit and a computing core based on the proposed encoding scheme, which convert the irregular sparse operations into regular multiply-accumulate operations. Finally, an efficient hardware architecture for the submanifold sparse convolutional layer is developed and implemented on the Xilinx ZCU102 field-programmable gate array board, with the 3D submanifold sparse U-Net as the benchmark. The experimental results demonstrate that our design drastically improves computational efficiency and achieves 51 times higher power efficiency than a GPU.
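For context on the operation being accelerated, the following minimal Python sketch (an illustration under our own assumptions, not the paper's hardware design) shows the computational pattern of a submanifold sparse convolution: outputs are produced only at sites that are already active in the input, and each output must first determine which neighboring sites are active before any multiply-accumulate can be issued. This coordinate matching and irregular gather is exactly the dependency that the proposed encoding scheme and sparse data matching unit are intended to regularize. The names `active_coords`, `features`, and `weights` are hypothetical.

```python
import itertools
import numpy as np

def submanifold_conv3d(active_coords, features, weights, kernel_size=3):
    """Sketch of a 3D submanifold sparse convolution.

    active_coords: list of (x, y, z) tuples for non-zero (active) sites.
    features:      dict mapping coordinate -> input feature vector (C_in,).
    weights:       dict mapping kernel offset (dx, dy, dz) -> (C_in, C_out) matrix.
    Returns a dict mapping coordinate -> output feature vector (C_out,).
    """
    r = kernel_size // 2
    coord_set = set(active_coords)           # hash of active input sites
    c_out = next(iter(weights.values())).shape[1]
    outputs = {}
    for p in active_coords:                   # outputs only at active sites
        acc = np.zeros(c_out)
        for off in itertools.product(range(-r, r + 1), repeat=3):
            q = (p[0] + off[0], p[1] + off[1], p[2] + off[2])
            if q in coord_set:                # skip inactive (zero) neighbors
                acc += features[q] @ weights[off]  # regular MAC on matched pairs
        outputs[p] = acc
    return outputs
```

In this sketch the neighbor lookup (`q in coord_set`) is the irregular, data-dependent step; once the matched input-weight pairs are known, the remaining work reduces to regular multiply-accumulate operations, which is the form the proposed computing core operates on.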