QuaLA-MiniLM: a Quantized Length Adaptive MiniLM - Zhuanzhi Papers

October 31, 2022

QuaLA-MiniLM: a Quantized Length Adaptive MiniLM

Shira Guskin, Moshe Wasserblat, Chang Wang, Haihao Shen

From arXiv. arXiv admin note: text overlap with arXiv:2111.09645

Limited computational budgets often prevent transformers from being used in production, so their high accuracy goes unused. Knowledge distillation addresses computational efficiency by self-distilling BERT into a smaller transformer with fewer layers and smaller internal embeddings. However, the performance of these models drops as the number of layers is reduced, notably in advanced NLP tasks such as span question answering. In addition, a separate model must be trained for each inference scenario with its distinct computational budget. Dynamic-TinyBERT tackles both limitations by partially applying the Length Adaptive Transformer (LAT) technique to TinyBERT, achieving a 3x speedup over BERT-base with minimal accuracy loss. In this work, we extend the Dynamic-TinyBERT approach to produce a much more efficient model. We use MiniLM distillation jointly with the LAT method, and we further improve efficiency by applying low-bit quantization. Our quantized length-adaptive MiniLM model (QuaLA-MiniLM) is trained only once, dynamically fits any inference scenario, and achieves an accuracy-efficiency trade-off superior to any other efficient approach at any computational budget on the SQuAD1.1 dataset (up to 8.8x speedup with <1% accuracy loss). The code to reproduce this work will be publicly released on GitHub soon.
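The length-adaptive idea mentioned in the abstract can be illustrated with a toy sketch: one model is run under different per-layer sequence lengths, and less important tokens are dropped as depth increases. This is not the authors' implementation. The function names (`keep_top_tokens`, `run_with_length_config`, `relative_cost`), the fixed per-token scores, and the length configuration `[8, 6, 4]` are all hypothetical. In a real model the importance scores would come from the network itself and would be recomputed at every layer, and the cost proxy below ignores the quadratic term of self-attention.

```python
from typing import List, Sequence, Tuple


def keep_top_tokens(scores: Sequence[float], keep: int) -> List[int]:
    """Indices of the `keep` highest-scoring tokens, returned in original order."""
    keep = max(1, min(keep, len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


def run_with_length_config(
    tokens: Sequence[str],
    scores: Sequence[float],
    length_config: Sequence[int],
) -> Tuple[List[str], List[int]]:
    """Drop tokens layer by layer; return surviving tokens and per-layer lengths."""
    current_tokens = list(tokens)
    current_scores = list(scores)
    lengths = []
    for keep in length_config:  # one entry per (hypothetical) layer
        idx = keep_top_tokens(current_scores, keep)
        current_tokens = [current_tokens[i] for i in idx]
        current_scores = [current_scores[i] for i in idx]
        lengths.append(len(current_tokens))
    return current_tokens, lengths


def relative_cost(lengths: Sequence[int], full_length: int) -> float:
    """Rough proxy: tokens processed relative to keeping every token in every layer."""
    return sum(lengths) / (full_length * len(lengths))


# Toy input: the scores are made-up importance values, fixed across layers.
tokens = ["[CLS]", "who", "wrote", "the", "book", "?", "[SEP]", "it", "was", "orwell"]
scores = [9.0, 2.0, 6.0, 0.5, 5.0, 1.0, 8.0, 0.7, 1.5, 7.0]

survivors, lengths = run_with_length_config(tokens, scores, [8, 6, 4])
print(survivors)  # ['[CLS]', 'wrote', '[SEP]', 'orwell']
print(lengths)    # [8, 6, 4]
print(round(relative_cost(lengths, len(tokens)), 2))  # 0.6
```

Changing only the length configuration, for example `[10, 10, 10]` versus `[8, 6, 4]`, changes the cost without retraining, which is the sense in which a single length-adaptive model can serve several computational budgets.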
