GPT is an auto-regressive Transformer-based pre-trained language model that has attracted a lot of attention in the natural language processing (NLP) domain due to its state-of-the-art performance on several downstream tasks. The success of GPT is mostly attributed to its pre-training on huge amounts of data and its large number of parameters (from ~100M to billions). Despite the superior performance of GPT, especially in few-shot or zero-shot setups, its overparameterized nature can be prohibitive for deploying the model on devices with limited computational power or memory. This problem can be mitigated with model compression techniques; however, compressing GPT models has not been investigated much in the literature. In this work, we use Kronecker decomposition to compress the linear mappings of the GPT-2 model. Our Kronecker GPT-2 model (KnGPT2) is initialized from the Kronecker-decomposed version of GPT-2 and then undergoes very light pre-training on only a small portion of the training data with intermediate-layer knowledge distillation (ILKD). Finally, KnGPT2 is fine-tuned on downstream tasks, also using ILKD. We evaluate our model on both language modeling and the General Language Understanding Evaluation (GLUE) benchmark tasks and show that, with more efficient pre-training and a similar number of parameters, our KnGPT2 significantly outperforms the existing DistilGPT2 model.
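Kronecker decomposition replaces a full weight matrix W with the Kronecker product of two much smaller factors, A and B, so that a layer of size m×n is parameterized by roughly m1·n1 + m2·n2 values instead of m·n. The following is a minimal PyTorch sketch of such a layer, not the paper's implementation: the class name `KroneckerLinear`, the random factor initialization, and the explicit `torch.kron` materialization in the forward pass are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class KroneckerLinear(nn.Module):
    """Linear layer whose weight is the Kronecker product of two small factors.

    The full weight W of shape (out1*out2, in1*in2) is represented as
    A (out1 x in1) kron B (out2 x in2), cutting the parameter count from
    out1*out2*in1*in2 down to out1*in1 + out2*in2.
    """

    def __init__(self, in1, in2, out1, out2, bias=True):
        super().__init__()
        # Illustrative random init; the paper initializes the factors from a
        # Kronecker decomposition of the pre-trained GPT-2 weights instead.
        self.A = nn.Parameter(torch.randn(out1, in1) / in1 ** 0.5)
        self.B = nn.Parameter(torch.randn(out2, in2) / in2 ** 0.5)
        self.bias = nn.Parameter(torch.zeros(out1 * out2)) if bias else None

    def forward(self, x):
        # Materialize W = A kron B for clarity; a more efficient variant would
        # reshape x and apply A and B separately without forming W explicitly.
        W = torch.kron(self.A, self.B)  # shape: (out1*out2, in1*in2)
        return F.linear(x, W, self.bias)


# Example: a 768 -> 768 mapping (hypothetical factor shapes) with
# 32*32 + 24*24 = 1600 weight parameters instead of 768*768 = 589824.
layer = KroneckerLinear(in1=32, in2=24, out1=32, out2=24)
y = layer(torch.randn(4, 768))  # y has shape (4, 768)
```

The factor shapes control the compression ratio and are a design choice; the short ILKD pre-training and fine-tuning described above are then used to recover the accuracy lost by the decomposition.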