Recent efforts to improve the performance of neural network (NN) accelerators that meet today's application requirements have given rise to a new trend of logic-based NN inference relying on fixed-function combinational logic (FFCL). This paper presents an innovative optimization methodology for compiling and mapping NNs that utilize FFCL onto a logic processor. The presented method maps FFCL blocks to a set of Boolean functions, where the Boolean operations in each function are mapped to high-performance, low-latency, parallelized processing elements. Graph partitioning and scheduling algorithms are presented to handle FFCL blocks that cannot straightforwardly fit into the logic processor. Our experimental evaluations across several datasets and NNs demonstrate the superior inference throughput of our framework compared with prior-art NN accelerators. We achieve 25x higher throughput than an XNOR-based accelerator on the VGG16 model, which can be further amplified 5x by deploying the graph partitioning and merging algorithms.
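To make the scheduling idea concrete, the sketch below shows one simple way (not the paper's actual implementation) that Boolean operations extracted from an FFCL block, viewed as a dependency DAG, could be greedily issued to a fixed number of parallel processing elements per step. The function name `schedule_boolean_dag` and the `num_pes` parameter are illustrative assumptions.

```python
# Illustrative sketch only: greedy list scheduling of a Boolean-operation DAG
# onto a limited number of parallel processing elements (PEs) per step.
from collections import defaultdict

def schedule_boolean_dag(ops, deps, num_pes):
    """Schedule Boolean ops (e.g. XNOR/AND/OR node ids) onto `num_pes` PEs.

    ops  : list of op names
    deps : dict mapping each op to the list of ops it depends on
    Returns a list of steps; each step is a list of ops issued together.
    """
    indegree = {op: len(deps.get(op, [])) for op in ops}
    dependents = defaultdict(list)
    for op, preds in deps.items():
        for p in preds:
            dependents[p].append(op)

    # Ops with no unmet dependencies are ready to issue.
    ready = [op for op in ops if indegree[op] == 0]
    steps = []
    while ready:
        issue, ready = ready[:num_pes], ready[num_pes:]
        steps.append(issue)
        for op in issue:
            for d in dependents[op]:
                indegree[d] -= 1
                if indegree[d] == 0:
                    ready.append(d)
    return steps
```

A block too large to fit the processor would, in this view, be partitioned into sub-DAGs that are scheduled separately, which is the role the abstract assigns to the graph partitioning and scheduling algorithms.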