Applications of neural networks on edge systems have proliferated in recent years, but ever-increasing model sizes make it difficult to deploy neural networks efficiently on resource-constrained microcontrollers. We propose bit-serial weight pools, an end-to-end framework that includes network compression and acceleration at arbitrary sub-byte precision. The framework achieves up to 8x compression compared to 8-bit networks by sharing a pool of weights across the entire network. We further propose a bit-serial, lookup-based software implementation that allows a runtime bitwidth tradeoff and achieves more than 2.8x speedup and 7.5x storage compression compared to 8-bit weight pool networks, with less than 1% accuracy drop.
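To make the two ideas concrete, the following is a minimal C sketch, not taken from the paper, illustrating (a) weight-pool compression, where each filter stores only a small index into a shared pool of weight vectors, and (b) bit-serial evaluation, where pooled weights are stored as bit-planes so a dot product can stop after any chosen number of bits at runtime. The names `pool_bitplane`, `POOL_SIZE`, and `VEC_LEN` are illustrative assumptions, and sign handling is omitted for brevity.

```c
#include <stdint.h>

#define POOL_SIZE 256  /* number of shared weight vectors (assumption)   */
#define VEC_LEN   8    /* elements per pooled weight vector (assumption) */
#define MAX_BITS  8    /* full precision of the stored pool weights      */

/* Pool stored bit-serially: pool_bitplane[b][k][i] holds bit b (MSB first)
 * of pool entry k, element i. Zero-initialized here purely as a placeholder. */
static const uint8_t pool_bitplane[MAX_BITS][POOL_SIZE][VEC_LEN];

/* Bit-serial dot product of one pooled weight vector with an activation
 * vector. Processing bit-planes MSB-first and stopping after `bits` planes
 * is what enables trading precision for speed at runtime. */
int32_t pooled_dot(uint8_t pool_idx, const int8_t *act, int bits)
{
    int32_t acc = 0;
    for (int b = 0; b < bits; b++) {       /* one pass per weight bit-plane */
        int32_t plane_sum = 0;
        for (int i = 0; i < VEC_LEN; i++)
            if (pool_bitplane[b][pool_idx][i])
                plane_sum += act[i];       /* add activations where bit set */
        acc = (acc << 1) + plane_sum;      /* shift-and-add accumulation    */
    }
    return acc;                            /* scaled by 2^(MAX_BITS - bits) */
}
```

In this sketch, a layer's weight storage shrinks from VEC_LEN full-precision weights per filter to a single 8-bit pool index, and lowering `bits` at runtime skips bit-planes entirely, which is the source of the speed and storage savings claimed above.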