Precision scaling has emerged as a popular technique to optimize the compute and storage requirements of Deep Neural Networks (DNNs). Efforts toward creating ultra-low-precision (sub-8-bit) DNNs suggest that the minimum precision required to achieve a given network-level accuracy varies considerably across networks, and even across layers within a network, requiring support for variable precision in DNN hardware. Previous proposals such as bit-serial hardware incur high overheads, significantly diminishing the benefits of lower precision. To efficiently support precision re-configurability in DNN accelerators, we introduce an approximate computing method wherein DNN computations are performed block-wise (a block is a group of bits) and re-configurability is supported at the granularity of blocks. Results of block-wise computations are composed in an approximate manner to enable efficient re-configurability. We design a DNN accelerator that embodies approximate blocked computation and propose a method to determine a suitable approximation configuration for a given DNN. By varying the approximation configurations across DNNs, we achieve 1.11x-1.34x and 1.29x-1.6x improvement in system energy and performance respectively, over an 8-bit fixed-point (FxP8) baseline, with negligible loss in classification accuracy. Further, by varying the approximation configurations across layers and data-structures within DNNs, we achieve 1.14x-1.67x and 1.31x-1.93x improvement in system energy and performance respectively, with negligible accuracy loss.
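To make the idea of block-wise computation with approximate composition concrete, below is a minimal sketch (not the accelerator's actual datapath): an 8-bit multiply is decomposed into 4-bit blocks, and a hypothetical `keep` knob plays the role of the approximation configuration, selecting which block-wise partial products are composed into the final result.

```python
def blocked_mul(a, b, keep=("HH", "HL", "LH", "LL")):
    """Multiply two unsigned 8-bit values block-wise (4-bit blocks).

    `keep` is a hypothetical stand-in for the approximation configuration:
    dropping low-significance block products (e.g. "LL") trades a small
    amount of accuracy for less work per multiply.
    """
    a_hi, a_lo = a >> 4, a & 0xF   # high / low 4-bit blocks of a
    b_hi, b_lo = b >> 4, b & 0xF   # high / low 4-bit blocks of b

    partial = {
        "HH": (a_hi * b_hi) << 8,  # most significant block product
        "HL": (a_hi * b_lo) << 4,
        "LH": (a_lo * b_hi) << 4,
        "LL": (a_lo * b_lo),       # least significant block product
    }
    return sum(v for k, v in partial.items() if k in keep)


# Example: composing all blocks is exact; dropping LL gives a cheaper,
# approximate product, analogous to a lower-precision configuration.
exact  = blocked_mul(0xB7, 0x5C)
approx = blocked_mul(0xB7, 0x5C, keep=("HH", "HL", "LH"))
assert exact == 0xB7 * 0x5C
```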