ML-powered code generation aims to help developers write code more productively by intelligently generating code blocks from natural language prompts. Recently, large pretrained deep learning models have substantially pushed the boundary of code generation and achieved impressive performance. Despite their power, the huge number of model parameters poses a significant barrier to adopting them in a regular software development environment, where a developer might use a standard laptop or mid-size server to write her code. Such large models incur significant resource usage (in terms of memory, latency, and dollars) as well as a sizable carbon footprint. Model compression is a promising approach to address these challenges. Several techniques have been proposed to compress large pretrained models typically used for vision or textual data. Among the available compression techniques, we identify quantization as the most applicable to the code generation task, since it does not require significant retraining cost. Because quantization represents model parameters with lower-bit integers (e.g., int8), both model size and runtime latency benefit from this integer representation. We extensively study the impact of quantized models on code generation tasks across three dimensions: (i) resource usage and carbon footprint, (ii) accuracy, and (iii) robustness. Through systematic experiments, we find a quantization recipe that can run even a $6$B-parameter model on a regular laptop without significant accuracy or robustness degradation. We further find that the recipe readily applies to the code summarization task as well.
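To make the setting concrete, the sketch below shows post-training dynamic int8 quantization in PyTorch, one common way to obtain the lower-bit integer representation described above without retraining. It is a minimal illustration, not the paper's exact recipe: the model checkpoint (`Salesforce/codegen-350M-mono`) and the prompt are assumptions chosen for the example, and any causal code LM would serve equally well.

```python
# Minimal sketch: post-training dynamic int8 quantization of a code LM.
# No retraining is needed, which is why quantization suits this setting.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Salesforce/codegen-350M-mono"  # illustrative choice, not the paper's model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Replace fp32 Linear layers with int8 equivalents; weights are stored as
# int8 and activations are quantized dynamically at inference time on CPU.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Generate code from a natural language / signature prompt with the
# quantized model, as a regular laptop (CPU-only) workflow would.
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = quantized.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Dynamic quantization shrinks the stored weights roughly 4x relative to fp32 and reduces CPU inference latency, which matches the resource-usage dimension studied above; accuracy and robustness effects are what the systematic experiments then measure.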