The breakthrough performance of large language models (LLMs) comes with large computational footprints and high deployment costs. In this paper, we progress towards resolving this problem by proposing a novel structured compression approach for LLMs, called ZipLM, which provides state-of-the-art compression-vs-accuracy results, while guaranteeing to match a set of (achievable) target speedups on any given target hardware. Specifically, given a task, a model, an inference environment, and a set of speedup targets, ZipLM identifies and removes redundancies in the model through iterative structured shrinking of the model's weight matrices. Importantly, ZipLM works in both the post-training/one-shot and the gradual compression settings, where it produces a set of accurate models in a single run, making it highly efficient in practice. Our approach is based on novel structured pruning and knowledge distillation techniques, and consistently outperforms prior structured compression methods in terms of accuracy-versus-speedup in experiments on BERT- and GPT-family models. In particular, when compressing the GPT2 model, ZipLM outperforms DistilGPT2 while being 60% smaller and 30% faster. Further, ZipLM matches the performance of the heavily optimized MobileBERT model, obtained via extensive architecture search, by simply pruning the baseline BERT-large architecture, and outperforms all prior BERT-base compression techniques such as CoFi, MiniLM, and TinyBERT.
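To make the iterative shrinking loop concrete, the following is a minimal sketch, not the authors' implementation: it assumes a toy single-weight-matrix "model", a simple column-norm saliency proxy, and a hypothetical latency_of stand-in for on-device timings (ZipLM itself uses a loss-aware pruning criterion and profiles the actual target hardware). It only illustrates the pattern of removing structural units until a target speedup is met.

import numpy as np

rng = np.random.default_rng(0)

# Toy "model": a single FFN weight matrix whose columns (hidden units)
# are the structural units we may remove.
W = rng.normal(size=(768, 3072))

def saliency(W):
    # Per-column importance proxy: squared L2 norm of each column.
    # (An assumption for illustration; ZipLM uses a loss-aware criterion.)
    return (W ** 2).sum(axis=0)

def latency_of(num_cols):
    # Hypothetical stand-in for measured latency in the target inference
    # environment; in practice this comes from profiling real hardware.
    return 1.0 + 0.001 * num_cols

TARGET_SPEEDUP = 1.5
base_latency = latency_of(W.shape[1])

# Iteratively drop the least salient structural unit until the modeled
# speedup over the dense baseline reaches the target.
while base_latency / latency_of(W.shape[1]) < TARGET_SPEEDUP:
    drop = int(np.argmin(saliency(W)))
    W = np.delete(W, drop, axis=1)

print(f"kept {W.shape[1]} of 3072 columns, "
      f"speedup {base_latency / latency_of(W.shape[1]):.2f}x")

Running the loop for several increasing speedup targets in one pass is what yields the set of progressively smaller models described above; interleaving knowledge distillation between pruning steps (omitted here) recovers accuracy in the gradual setting.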