While model compression is increasingly important due to the large size of neural networks, compression-aware training is challenging because it requires sophisticated model modifications and longer training time. In this paper, we introduce regularization frequency (i.e., how often compression is performed during training) as a new regularization technique for a practical and efficient compression-aware training method. For various regularization techniques, such as weight decay and dropout, optimizing the regularization strength is crucial to improving generalization in Deep Neural Networks (DNNs). While model compression also demands the right amount of regularization, the regularization strength induced by model compression has so far been controlled only by the compression ratio. Through various experiments, we show that regularization frequency critically affects the regularization strength of model compression. By combining regularization frequency and compression ratio, the amount of weight updates caused by model compression per mini-batch can be optimized to achieve the best model accuracy. Modulating regularization frequency is implemented by performing compression only occasionally, whereas conventional compression-aware training typically compresses the model at every mini-batch.
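To make the idea concrete, below is a minimal sketch of frequency-controlled compression-aware training, assuming magnitude pruning as the compression method and a toy model with synthetic data; the hyperparameter names `compression_ratio` and `regularization_frequency` are illustrative and not taken from the paper.

```python
# Minimal sketch: apply compression (here, magnitude pruning) once every
# `regularization_frequency` mini-batches instead of every mini-batch.
# Assumptions not in the source: pruning as the compression method,
# the toy MLP, and the synthetic training data.
import torch
import torch.nn as nn

def prune_by_magnitude(model: nn.Module, compression_ratio: float) -> None:
    """Zero out the smallest-magnitude fraction of each weight matrix."""
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() < 2:  # skip biases
                continue
            k = int(p.numel() * compression_ratio)
            if k == 0:
                continue
            threshold = p.abs().flatten().kthvalue(k).values
            p.mul_((p.abs() > threshold).float())

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

compression_ratio = 0.5        # fraction of weights zeroed per compression event
regularization_frequency = 10  # compress every 10 mini-batches, not every one

for step in range(100):
    x = torch.randn(16, 32)
    y = torch.randint(0, 10, (16,))
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
    # Occasional compression: the weight perturbation it induces acts as a
    # regularizer whose strength depends on both the ratio and the frequency.
    if (step + 1) % regularization_frequency == 0:
        prune_by_magnitude(model, compression_ratio)
```

In this sketch, the per-mini-batch weight update attributable to compression is governed jointly by `compression_ratio` (how much is pruned each time) and `regularization_frequency` (how often pruning occurs), which is the knob pair the abstract proposes tuning.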