The deployment of AI models on low-power, real-time edge devices requires accelerators for which energy, latency, and area are all first-order concerns. There are many approaches to enabling deep neural networks (DNNs) in this domain, including pruning, quantization, compression, and binary neural networks (BNNs), but with the emergence of the "extreme edge", there is now a demand for even more efficient models. In order to meet the constraints of ultra-low-energy devices, we propose ULEEN, a model architecture based on weightless neural networks. Weightless neural networks (WNNs) are a class of neural models that use table lookups, not arithmetic, to perform computation. The elimination of energy-intensive arithmetic operations makes WNNs theoretically well suited for edge inference; however, they have historically suffered from poor accuracy and excessive memory usage. ULEEN incorporates algorithmic improvements and a novel training strategy inspired by BNNs to make significant strides in improving accuracy and reducing model size. We compare FPGA and ASIC implementations of an inference accelerator for ULEEN against edge-optimized DNN and BNN devices. On a Xilinx Zynq Z-7045 FPGA, we demonstrate classification on the MNIST dataset at 14.3 million inferences per second (13 million inferences/Joule) with 0.21 $\mu$s latency and 96.2% accuracy, while Xilinx FINN achieves 12.3 million inferences per second (1.69 million inferences/Joule) with 0.31 $\mu$s latency and 95.83% accuracy. In a 45nm ASIC, we achieve 5.1 million inferences/Joule and 38.5 million inferences/second at 98.46% accuracy, while a quantized Bit Fusion model achieves 9230 inferences/Joule and 19,100 inferences/second at 99.35% accuracy. In our search for ever more efficient edge devices, ULEEN shows that WNNs are deserving of consideration.
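To make the lookup-based computation concrete, the sketch below implements a classic WiSARD-style weightless network, the family of models ULEEN builds on. This is a hypothetical, minimal illustration of the general technique (RAM nodes addressed by tuples of input bits, trained by table writes and scored by table reads), not the ULEEN architecture itself, which adds further algorithmic improvements and a BNN-inspired training strategy; all class and parameter names here are ours.

```python
# Minimal WiSARD-style weightless neural network sketch (illustrative only;
# not ULEEN). Inference uses only table lookups and a popcount-style sum --
# no multiplications -- which is the property that makes WNNs attractive
# for ultra-low-energy edge inference.
import random


class Discriminator:
    """One discriminator per class: a bank of RAM nodes, each addressed
    by a fixed pseudo-random tuple of input bits."""

    def __init__(self, num_inputs, tuple_size, mapping):
        self.tuple_size = tuple_size
        self.mapping = mapping  # shared pseudo-random bit ordering
        self.rams = [set() for _ in range(num_inputs // tuple_size)]

    def _addresses(self, bits):
        shuffled = [bits[i] for i in self.mapping]
        for r in range(len(self.rams)):
            chunk = shuffled[r * self.tuple_size:(r + 1) * self.tuple_size]
            yield r, tuple(chunk)

    def train(self, bits):
        # Training is a table write: mark each observed address as "seen".
        for r, addr in self._addresses(bits):
            self.rams[r].add(addr)

    def score(self, bits):
        # Inference is a table read: count how many RAM nodes recognize
        # their address. No arithmetic beyond this sum.
        return sum(addr in self.rams[r] for r, addr in self._addresses(bits))


class WiSARD:
    def __init__(self, num_inputs, tuple_size, classes, seed=0):
        rng = random.Random(seed)
        mapping = list(range(num_inputs))
        rng.shuffle(mapping)
        self.discriminators = {
            c: Discriminator(num_inputs, tuple_size, mapping) for c in classes
        }

    def train(self, bits, label):
        self.discriminators[label].train(bits)

    def predict(self, bits):
        return max(self.discriminators,
                   key=lambda c: self.discriminators[c].score(bits))
```

The sketch also shows why naive WNNs can be memory-hungry: each RAM node needs $2^{\text{tuple\_size}}$ addressable entries, so memory grows exponentially with tuple size, which is one of the problems ULEEN's model-size reductions target.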