ML-powered code generation aims to help developers write code more productively by intelligently generating code blocks from natural language prompts. Recently, large pretrained deep learning models have substantially pushed the boundary of code generation and achieved impressive performance. Despite their power, the huge number of model parameters poses a significant barrier to adopting them in a regular software development environment, where a developer might use a standard laptop or mid-size server to write her code. Such large models incur significant resource usage (in terms of memory, latency, and dollars) as well as a sizable carbon footprint. Model compression is a promising approach to address these challenges. Several techniques have been proposed to compress large pretrained models typically used for vision or textual data. Among the available compression techniques, we identify quantization as the most applicable to the code generation task, since it does not require significant retraining cost. Because quantization represents model parameters with lower-bit integers (e.g., int8), both model size and runtime latency benefit from this integer representation. We extensively study the impact of quantized models on code generation tasks across three dimensions: (i) resource usage and carbon footprint, (ii) accuracy, and (iii) robustness. Through systematic experiments, we find a quantization recipe that can run even a $6$B-parameter model on a regular laptop without significant accuracy or robustness degradation. We further find that the recipe readily applies to the code summarization task as well.
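To make the setting concrete, the sketch below shows post-training dynamic int8 quantization in PyTorch, one common way to obtain the lower-bit integer representation described above without retraining. It is a minimal illustration, not the paper's exact recipe: the model checkpoint (`Salesforce/codegen-350M-mono`) and the prompt are assumptions chosen for the example, and any causal code LM would serve equally well.

```python
# Minimal sketch: post-training dynamic int8 quantization of a code LM.
# No retraining is needed, which is why quantization suits this setting.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Salesforce/codegen-350M-mono"  # illustrative choice, not the paper's model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Replace fp32 Linear layers with int8 equivalents; weights are stored as
# int8 and activations are quantized dynamically at inference time on CPU.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Generate code from a natural language / signature prompt with the
# quantized model, as a regular laptop (CPU-only) workflow would.
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = quantized.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Dynamic quantization shrinks the stored weights roughly 4x relative to fp32 and reduces CPU inference latency, which matches the resource-usage dimension studied above; accuracy and robustness effects are what the systematic experiments then measure.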