Large pre-trained multilingual models such as mBERT and XLM-R achieve state-of-the-art results on language understanding tasks. However, they are not well suited for latency-critical applications on either servers or edge devices, so it is important to reduce the memory and compute resources these models require. To this end, we propose pQRNN, a projection-based, embedding-free neural encoder that is tiny and effective for natural language processing tasks. Without pre-training, pQRNNs significantly outperform LSTM models with pre-trained embeddings despite being 140x smaller. With the same number of parameters, they outperform transformer baselines, showcasing their parameter efficiency. Additionally, we show that pQRNNs are effective student architectures for distilling large pre-trained language models. We perform careful ablations studying the effect of pQRNN parameters, data augmentation, and distillation settings. On MTOP, a challenging multilingual semantic parsing dataset, pQRNN students achieve 95.9\% of the performance of an mBERT teacher while being 350x smaller. On mATIS, a popular parsing task, pQRNN students reach 97.1\% of the teacher's performance on average, again while being 350x smaller. Our strong results suggest that our approach is well suited for latency-sensitive applications while still being able to leverage large mBERT-like models.
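To make the "projection-based, embedding-free" idea concrete, the following is a minimal sketch of how a token can be hashed directly into a fixed-size ternary feature vector instead of being looked up in a vocabulary-sized embedding table. The function name, the SHA-256-based hashing scheme, and the bit-to-ternary mapping are illustrative assumptions, not the exact projection used by pQRNN.

```python
import hashlib
import numpy as np

def ternary_projection(token: str, num_features: int = 128) -> np.ndarray:
    """Map a token to a ternary feature vector in {-1, 0, +1} via hashing.

    Illustrates the embedding-free projection idea: no embedding matrix is
    stored, so model size does not grow with vocabulary size. The hashing
    scheme here is a stand-in, not the one used in the pQRNN paper.
    """
    # Derive a deterministic bit stream from the token text.
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    bits = np.unpackbits(np.frombuffer(digest, dtype=np.uint8))
    # Reuse the bit stream if more bits are needed than the digest provides.
    bits = np.resize(bits, 2 * num_features)
    # Consume two bits per feature: 00 -> 0, 01 -> +1, 10 -> -1, 11 -> 0.
    pairs = bits.reshape(num_features, 2)
    values = pairs[:, 1].astype(np.int8) - pairs[:, 0].astype(np.int8)
    return values.astype(np.float32)

# A sentence becomes a (seq_len, num_features) matrix, which a small
# recurrent encoder (e.g. a QRNN stack) would consume downstream.
sentence = "set an alarm for six am".split()
features = np.stack([ternary_projection(t) for t in sentence])
print(features.shape)  # (6, 128)
```

In the full model, these per-token projection features feed a small quasi-recurrent encoder trained either from scratch or as a student distilled from an mBERT teacher; the sketch above only covers the projection step.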