Low-rankness plays an important role in traditional machine learning, but is not so popular in deep learning. Most previous low-rank network compression methods compress networks by approximating a pre-trained model and then re-training. However, the optimal solution in the Euclidean space may be quite different from the one on the low-rank manifold, so a well-pre-trained model is not a good initialization for a model with low-rank constraints. As a result, the performance of a low-rank compressed network degrades significantly. Compared to other network compression methods such as pruning, low-rank methods have attracted less attention in recent years. In this paper, we devise a new training method, low-rank projection with energy transfer (LRPET), that trains low-rank compressed networks from scratch and achieves competitive performance. First, we propose to alternately perform stochastic gradient descent training and projection onto the low-rank manifold. Compared to re-training on the compact model, this enables full utilization of model capacity, since the solution space is relaxed back to the Euclidean space after projection. Second, the matrix energy (the sum of squares of singular values) reduced by projection is compensated by energy transfer: we uniformly transfer the energy of the pruned singular values to the remaining ones. We theoretically show that energy transfer eases the trend of gradient vanishing caused by projection. Third, we propose batch normalization (BN) rectification to cut off its effect on the optimal low-rank approximation of the weight matrix, which further improves the performance. Comprehensive experiments on CIFAR-10 and ImageNet demonstrate that our method is superior to other low-rank compression methods and also outperforms recent state-of-the-art pruning methods. Our code is available at https://github.com/BZQLin/LRPET.
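To make the projection-with-energy-transfer step concrete, below is a minimal PyTorch sketch. It truncates a weight matrix to a target rank via SVD and rescales the retained singular values so the total energy (sum of squared singular values) is preserved. The function name, the single-scaling-factor redistribution, and the training-loop skeleton are illustrative assumptions, not the exact released implementation.

```python
import torch

def low_rank_project_energy_transfer(W: torch.Tensor, rank: int) -> torch.Tensor:
    """Project a 2-D weight matrix onto the rank-`rank` manifold and compensate
    the energy removed by truncation by rescaling the retained singular values."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    total_energy = (S ** 2).sum()
    S_kept = S[:rank]
    kept_energy = (S_kept ** 2).sum()
    # Assumed redistribution scheme: one common factor so that the
    # truncated matrix keeps the original matrix energy.
    scale = torch.sqrt(total_energy / kept_energy)
    return (U[:, :rank] * (S_kept * scale)) @ Vh[:rank, :]

# Hypothetical alternation of SGD steps and periodic projection:
# for step, (x, y) in enumerate(loader):
#     loss = criterion(model(x), y)
#     loss.backward()
#     optimizer.step()
#     if step % project_every == 0:
#         with torch.no_grad():
#             layer.weight.copy_(low_rank_project_energy_transfer(layer.weight, rank))
```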