Singular value decomposition (SVD) is one of the most popular compression methods that approximate a target matrix with smaller matrices. However, standard SVD treats the parameters within the matrix as equally important, which is a simple but unrealistic assumption. The parameters of a trained neural network affect task performance unevenly, suggesting that they are not equally important. A decomposition method that accounts for parameter importance is therefore a more practical choice than standard SVD in real applications. Unlike standard SVD, weighted value decomposition is a non-convex optimization problem with no closed-form solution. We systematically investigate multiple optimization strategies to tackle this problem and evaluate our method by compressing Transformer-based language models. Further, we design a metric that predicts when SVD is likely to cause a significant performance drop, in which case our method can serve as a rescue strategy. Extensive evaluations demonstrate that our method outperforms current state-of-the-art (SOTA) methods in compressing Transformer-based language models.
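To make the setting concrete, below is a minimal sketch (not the paper's exact algorithm) of the weighted low-rank objective and one possible optimization strategy, plain gradient descent with an SVD warm start. The importance matrix `I`, the function `weighted_low_rank`, and all hyperparameters are illustrative assumptions; the point is only that when the weights in `I` are not uniform, the problem loses the closed-form truncated-SVD solution and must be solved iteratively.

```python
import numpy as np

def weighted_low_rank(A, I, rank, lr=1e-3, steps=5000, seed=0):
    """Sketch of weighted low-rank factorization (hypothetical helper).

    Minimizes sum_ij I_ij * (A_ij - (U @ V)_ij)^2 over rank-`rank`
    factors U (m x rank) and V (rank x n) by gradient descent.
    With non-uniform I this objective is non-convex in (U, V) and has
    no closed-form solution, unlike the unweighted case.
    """
    # Warm-start from the unweighted truncated SVD.
    U0, s, Vt = np.linalg.svd(A, full_matrices=False)
    U = U0[:, :rank] * np.sqrt(s[:rank])
    V = np.sqrt(s[:rank])[:, None] * Vt[:rank, :]
    for _ in range(steps):
        R = I * (U @ V - A)      # importance-weighted residual
        gU = R @ V.T             # gradient w.r.t. U
        gV = U.T @ R             # gradient w.r.t. V
        U -= lr * gU
        V -= lr * gV
    return U, V

# Toy usage: a random "weight matrix" with uneven (hypothetical) importances.
rng = np.random.default_rng(1)
A = rng.normal(size=(64, 32))
I = np.abs(rng.normal(size=A.shape))
I /= I.mean()
U, V = weighted_low_rank(A, I, rank=8)
err = np.sum(I * (A - U @ V) ** 2)
print(f"weighted reconstruction error: {err:.4f}")
```

In practice the weights would come from an importance estimate for each parameter (e.g., a Fisher-information-style score), and the paper's contribution is the systematic study of optimization strategies for this objective rather than the specific gradient-descent loop sketched here.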