Network compression is crucial for making deep networks more efficient, faster, and deployable on low-end hardware. Current network compression methods face two open problems: first, there is no theoretical framework for estimating the maximum compression rate; second, some layers may be over-pruned, causing a significant drop in network performance. To solve these two problems, this study proposes a method based on singularity analysis of the gradient matrix to estimate the maximum network redundancy. Guided by that maximum rate, a novel and efficient hierarchical network pruning algorithm is developed to maximally condense the neural network structure without sacrificing performance. Extensive experiments demonstrate the efficacy of the new method for pruning several advanced convolutional neural network (CNN) architectures. Compared to existing pruning methods, the proposed pruning algorithm achieves state-of-the-art performance: at the same or similar compression ratio, it provides the highest network prediction accuracy.
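The core idea sketched in the abstract — that near-singular directions of a gradient matrix indicate redundant capacity — can be illustrated with a minimal example. The sketch below is a hypothetical illustration, not the paper's actual algorithm: it stacks per-sample gradients into a matrix, takes its singular-value spectrum, and treats directions with near-zero singular values as redundant. The function name `estimate_redundancy`, the tolerance `tol`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def estimate_redundancy(grad_matrix, tol=1e-6):
    """Estimate the fraction of redundant directions in a gradient
    matrix from its singular-value spectrum (illustrative sketch only;
    not the paper's algorithm).

    grad_matrix: (n_samples, n_params) array of per-sample gradients
    for one layer.
    """
    s = np.linalg.svd(grad_matrix, compute_uv=False)
    # Directions whose singular values fall below `tol` (relative to
    # the largest singular value) carry almost no gradient signal and
    # are counted as redundant.
    near_zero = np.sum(s < tol * s.max())
    return near_zero / len(s)

# Toy example: gradients that actually live in a 2-D subspace of a
# 6-parameter layer, so 4 of the 6 directions are redundant.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 6))
coeffs = rng.normal(size=(50, 2))
G = coeffs @ basis
print(estimate_redundancy(G))  # -> 0.666... (4 of 6 directions redundant)
```

A layer whose redundancy estimate is high would, under this sketch, tolerate aggressive pruning; a layer with a full-rank gradient matrix would not — which matches the abstract's motivation for layer-wise pruning budgets.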