The goal of model compression is to reduce the size of a large neural network while retaining comparable performance. By dropping redundant weights, neurons, or layers, the computation and memory costs in resource-limited applications can be significantly reduced. Many model compression algorithms have been proposed with impressive empirical success, but a theoretical understanding of model compression remains limited. One open problem is determining whether one network is more compressible than another of the same structure. Another is quantifying how much a network can be pruned while guaranteeing a bound on the accuracy degradation. In this work, we propose to use the sparsity-sensitive $\ell_q$-norm ($0<q<1$) to characterize compressibility, and we establish a relationship between the soft sparsity of the weights in the network and the achievable degree of compression under a controlled accuracy degradation bound. We also develop adaptive algorithms, informed by our theory, for pruning each neuron in the network. Numerical studies demonstrate the promising performance of the proposed methods compared with standard pruning algorithms.
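To make the abstract's central quantity concrete, the sketch below computes the sparsity-sensitive $\ell_q$ quasi-norm of a weight vector and applies simple magnitude pruning. This is an illustrative toy, not the paper's actual algorithm: the function names (`lq_quasinorm`, `magnitude_prune`), the choice $q=0.5$, and the synthetic soft-sparse weights are all assumptions for demonstration.

```python
import numpy as np

def lq_quasinorm(w, q=0.5):
    """Sparsity-sensitive l_q quasi-norm (0 < q < 1): (sum_i |w_i|^q)^(1/q).
    When a few large weights dominate many small ones (soft sparsity),
    most of the l_2 energy is retained after pruning the small weights."""
    return np.sum(np.abs(w) ** q) ** (1.0 / q)

def magnitude_prune(w, keep_ratio):
    """Toy pruning rule: keep the largest-magnitude fraction of weights,
    zero out the rest (an assumption, not the paper's adaptive scheme)."""
    k = max(1, int(np.ceil(keep_ratio * w.size)))
    thresh = np.sort(np.abs(w).ravel())[::-1][k - 1]
    return np.where(np.abs(w) >= thresh, w, 0.0)

# Synthetic soft-sparse weights: a handful of dominant entries
# plus many small ones drawn from a Laplace distribution.
rng = np.random.default_rng(0)
w = rng.laplace(scale=0.1, size=1000)
w[:10] *= 50  # ten dominant weights carry most of the energy

# Pruning 90% of the weights leaves the l_2 norm nearly intact,
# because the retained large weights dominate.
pruned = magnitude_prune(w, keep_ratio=0.1)
rel_err = np.linalg.norm(w - pruned) / np.linalg.norm(w)
```

The smaller the $\ell_q$ quasi-norm relative to the $\ell_2$ norm, the softer the sparsity, and intuitively the more aggressively one can prune at a given relative error; the paper's theory makes this trade-off precise.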