Model compression is generally performed via quantization, low-rank approximation, or pruning, for which various algorithms have been researched in recent years. One fundamental question is: what type of compression works best for a given model? Better yet: can we improve on a single compression by combining several in a suitable way? We formulate this generally as the problem of optimizing the loss subject to the constraint that the weights equal an additive combination of separately compressed parts, and we give an algorithm to learn the parameters of the corresponding parts. Experimentally with deep neural nets, we observe that 1) we can find significantly better models in the error-compression space, indicating that different compression types have complementary benefits, and 2) the best type of combination depends exquisitely on the type of neural net. For example, we can compress ResNets and AlexNet using only 1 bit per weight without error degradation, at the cost of adding a few floating-point weights. However, VGG nets can be better compressed by combining low-rank approximation with a few floating-point weights.
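To illustrate the additive-combination idea, here is a minimal NumPy sketch, not the paper's learning algorithm: a weight matrix is approximated as the sum of a 1-bit quantized part and a sparse part holding a few floating-point corrections. The per-matrix scale, the greedy choice of the largest residuals, and the budget `k` are illustrative assumptions; combining the two parts gives a strictly smaller reconstruction error than quantization alone.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))        # a weight matrix to compress

# Part 1: 1-bit quantization (sign of each weight, one shared scale).
scale = np.mean(np.abs(W))           # least-squares scale for sign quantization
W_q = scale * np.sign(W)

# Part 2: a few floating-point weights correcting the largest residuals
# (greedy selection; the paper learns the parts jointly instead).
R = W - W_q
k = 32                               # budget of float corrections (hypothetical)
idx = np.unravel_index(np.argsort(-np.abs(R), axis=None)[:k], R.shape)
W_s = np.zeros_like(W)
W_s[idx] = R[idx]

# Additive combination: the compressed model uses W_q + W_s in place of W.
err_q = np.linalg.norm(W - W_q)
err_add = np.linalg.norm(W - (W_q + W_s))
```

The storage cost is 1 bit per weight plus `k` floats and their indices, and the combined error `err_add` is below the quantization-only error `err_q`, mirroring the complementary-benefits observation above.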