Singular value decomposition (SVD) is one of the most popular compression methods that approximate a target matrix with smaller matrices. However, standard SVD treats the parameters within the matrix as equally important, which is a simple but unrealistic assumption. The parameters of a trained neural network affect task performance unevenly, suggesting that they are not equally important. A decomposition method that accounts for parameter importance is therefore a more practical choice than standard SVD in real applications. Unlike standard SVD, weighted value decomposition is a non-convex optimization problem with no closed-form solution. We systematically investigate multiple optimization strategies to tackle this problem and evaluate our method by compressing Transformer-based language models. Further, we design a metric that predicts when SVD is likely to cause a significant performance drop, in which case our method can serve as a rescue strategy. Extensive evaluations demonstrate that our method outperforms current state-of-the-art (SOTA) methods in compressing Transformer-based language models.
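To make the setting concrete, below is a minimal sketch (not the paper's exact algorithm) of the weighted low-rank objective and one possible optimization strategy, plain gradient descent with an SVD warm start. The importance matrix `I`, the function `weighted_low_rank`, and all hyperparameters are illustrative assumptions; the point is only that when the weights in `I` are not uniform, the problem loses the closed-form truncated-SVD solution and must be solved iteratively.

```python
import numpy as np

def weighted_low_rank(A, I, rank, lr=1e-3, steps=5000, seed=0):
    """Sketch of weighted low-rank factorization (hypothetical helper).

    Minimizes sum_ij I_ij * (A_ij - (U @ V)_ij)^2 over rank-`rank`
    factors U (m x rank) and V (rank x n) by gradient descent.
    With non-uniform I this objective is non-convex in (U, V) and has
    no closed-form solution, unlike the unweighted case.
    """
    # Warm-start from the unweighted truncated SVD.
    U0, s, Vt = np.linalg.svd(A, full_matrices=False)
    U = U0[:, :rank] * np.sqrt(s[:rank])
    V = np.sqrt(s[:rank])[:, None] * Vt[:rank, :]
    for _ in range(steps):
        R = I * (U @ V - A)      # importance-weighted residual
        gU = R @ V.T             # gradient w.r.t. U
        gV = U.T @ R             # gradient w.r.t. V
        U -= lr * gU
        V -= lr * gV
    return U, V

# Toy usage: a random "weight matrix" with uneven (hypothetical) importances.
rng = np.random.default_rng(1)
A = rng.normal(size=(64, 32))
I = np.abs(rng.normal(size=A.shape))
I /= I.mean()
U, V = weighted_low_rank(A, I, rank=8)
err = np.sum(I * (A - U @ V) ** 2)
print(f"weighted reconstruction error: {err:.4f}")
```

In practice the weights would come from an importance estimate for each parameter (e.g., a Fisher-information-style score), and the paper's contribution is the systematic study of optimization strategies for this objective rather than the specific gradient-descent loop sketched here.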