Recently, deep convolutional neural networks (CNNs) have achieved many impressive results. However, deploying CNNs on resource-constrained edge devices is limited by the memory bandwidth required to transmit large intermediate data, i.e., activations, during inference. Existing research utilizes mixed precision and dimension reduction to reduce computational complexity but pays less attention to their application to activation compression. To further exploit the redundancy in activations, we propose a learnable mixed-precision and dimension-reduction co-design system, which separates channels into groups and allocates specific compression policies according to their importance. In addition, the proposed dynamic searching technique enlarges the search space and finds the optimal bit-width allocation automatically. Our experimental results show that the proposed methods improve accuracy by 3.54%/1.27% and save 0.18/2.02 bits per value over existing mixed-precision methods on ResNet18 and MobileNetv2, respectively.
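To make the grouping idea concrete, here is a minimal sketch (not the authors' actual system) of importance-based channel grouping with per-group bit-widths: channels are ranked by a simple importance proxy (mean absolute activation) and split into groups, and each group is uniformly quantized at its own precision. The importance metric, group sizes, and bit-widths are all illustrative assumptions; the paper learns these choices rather than fixing them by hand.

```python
import numpy as np

def quantize_uniform(x, bits):
    """Uniform symmetric quantization of x to the given bit-width (illustrative)."""
    levels = 2 ** bits - 1
    scale = np.max(np.abs(x)) + 1e-8          # avoid division by zero
    q = np.round(x / scale * (levels / 2))    # map to integer grid
    return q * scale / (levels / 2)           # dequantize back to float

def grouped_mixed_precision(act, group_bits=(8, 4, 2)):
    """Split activation channels into importance groups and quantize each
    group at its own bit-width (most important channels get the most bits).

    act: array of shape (channels, features).
    group_bits: assumed per-group bit allocation, highest precision first.
    """
    importance = np.mean(np.abs(act), axis=1)   # per-channel importance proxy
    order = np.argsort(importance)[::-1]        # most important channels first
    groups = np.array_split(order, len(group_bits))
    out = np.empty_like(act)
    for channel_idx, bits in zip(groups, group_bits):
        for c in channel_idx:
            out[c] = quantize_uniform(act[c], bits)
    return out
```

The intent is that high-importance channels lose little information at 8 bits, while low-importance channels tolerate aggressive 2-bit compression, lowering the average bits per value that must cross the memory bus.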