In recent years, the fields of natural language processing (NLP) and information retrieval (IR) have made tremendous progress thanks to deep learning models like Recurrent Neural Networks (RNNs), Gated Recurrent Units (GRUs), and Long Short-Term Memory (LSTM) networks, as well as Transformer [120] based models like Bidirectional Encoder Representations from Transformers (BERT) [24], Generative Pre-training Transformer (GPT-2) [94], Multi-task Deep Neural Network (MT-DNN) [73], Extra-Long Network (XLNet) [134], Text-to-Text Transfer Transformer (T5) [95], T-NLG [98], and GShard [63]. But these models are humongous in size. On the other hand, real-world applications demand small model size, low response times, and low computational power. In this survey, we discuss six different types of methods (Pruning, Quantization, Knowledge Distillation, Parameter Sharing, Tensor Decomposition, and Sub-quadratic Transformer based methods) for compressing such models to enable their deployment in real industry NLP projects. Given the critical need of building applications with efficient and small models, and the large amount of recently published work in this area, we believe that this survey organizes the plethora of work done by the 'deep learning for NLP' community in the past few years and presents it as a coherent story.