Logic Shrinkage: Learned FPGA Netlist Sparsity for Efficient Neural Network Inference

FPGA-specific DNN architectures using the native LUTs as independently trainable inference operators have been shown to achieve favorable area-accuracy and energy-accuracy tradeoffs. The first work in this area, LUTNet, exhibited state-of-the-art performance for standard DNN benchmarks. In this paper, we propose the learned optimization of such LUT-based topologies, resulting in higher-efficiency designs than via the direct use of off-the-shelf, hand-designed networks. Existing implementations of this class of architecture require the manual specification of the number of inputs per LUT, K. Choosing appropriate K a priori is challenging, and doing so at even high granularity, e.g. per layer, is a time-consuming and error-prone process that leaves FPGAs' spatial flexibility underexploited. Furthermore, prior works see LUT inputs connected randomly, which does not guarantee a good choice of network topology. To address these issues, we propose logic shrinkage, a fine-grained netlist pruning methodology enabling K to be automatically learned for every LUT in a neural network targeted for FPGA inference. By removing LUT inputs determined to be of low importance, our method increases the efficiency of the resultant accelerators. Our GPU-friendly solution to LUT input removal is capable of processing large topologies during their training with negligible slowdown. With logic shrinkage, we better the area and energy efficiency of the best-performing LUTNet implementation of the CNV network classifying CIFAR-10 by 1.54x and 1.31x, respectively, while matching its accuracy. This implementation also reaches 2.71x the area efficiency of an equally accurate, heavily pruned BNN. On ImageNet with the Bi-Real Net architecture, employment of logic shrinkage results in a post-synthesis area reduction of 2.67x vs LUTNet, allowing for implementation that was previously impossible on today's largest FPGAs.
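To make the idea of removing low-importance LUT inputs concrete, the following is a minimal sketch and not the authors' method. The abstract says importance is determined during training but does not give the criterion, so this example assumes a simple stand-in: the fraction of truth-table entries whose output flips when an input is toggled. The function names (`input_importance`, `remove_input`, `shrink_lut`) and the `threshold` parameter are hypothetical.

```python
def input_importance(table, k, i):
    """Fraction of the 2**k input patterns whose output flips when input i toggles.

    Assumed stand-in for a learned importance score; `table[idx]` is the LUT
    output for the input pattern whose bit j is the value of input j.
    """
    flips = sum(1 for idx in range(2 ** k) if table[idx] != table[idx ^ (1 << i)])
    return flips / 2 ** k


def remove_input(table, k, i, fixed=0):
    """Return the (k-1)-input truth table obtained by tying input i to `fixed`."""
    new_table = []
    for idx in range(2 ** (k - 1)):
        low = idx & ((1 << i) - 1)   # bits below position i stay in place
        high = idx >> i              # bits at or above i shift up by one
        full = (high << (i + 1)) | (fixed << i) | low
        new_table.append(table[full])
    return new_table


def shrink_lut(table, k, threshold=0.0):
    """Drop inputs whose importance is <= threshold, one at a time.

    Returns the reduced truth table and the original indices of the inputs
    that remain, so K ends up chosen per LUT rather than fixed in advance.
    """
    inputs = list(range(k))
    while k > 0:
        scores = [input_importance(table, k, i) for i in range(k)]
        i = min(range(k), key=scores.__getitem__)
        if scores[i] > threshold:
            break
        table = remove_input(table, k, i)
        del inputs[i]
        k -= 1
    return table, inputs


# A 3-input LUT that computes x0 XOR x2 and ignores x1.
xor_table = [(idx & 1) ^ ((idx >> 2) & 1) for idx in range(8)]
reduced_table, kept_inputs = shrink_lut(xor_table, 3)
```

Here `kept_inputs` comes out as `[0, 2]` and `reduced_table` as the 2-input XOR table `[0, 1, 1, 0]`, so this LUT ends up with K = 2 instead of 3. With `threshold=0.0` only inputs that never affect the output are removed; a larger threshold would trade accuracy for area, which is the kind of tradeoff the abstract describes.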

