Auto-regressive large language models such as GPT-3 require enormous computational resources to use. Traditionally, structured pruning methods are employed to reduce resource usage. However, their application to and efficacy on generative language models remain heavily under-explored. In this paper we conduct a comprehensive evaluation of common structured pruning methods, including magnitude, random, and movement pruning, on the feed-forward layers in GPT-type models. Unexpectedly, random pruning results in performance that is comparable to the best established methods, across multiple natural language generation tasks. To understand these results, we provide a framework for measuring neuron-level redundancy of models pruned by different methods, and discover that established structured pruning methods do not take into account the distinctiveness of neurons, leaving behind excess redundancies. In view of this, we introduce Globally Unique Movement (GUM) to improve the uniqueness of neurons in pruned models. We then discuss the effects of our techniques on different redundancy metrics to explain the improved performance.