Towards Enhancing In-Context Learning for Code Generation

In-context learning (ICL) with pre-trained language models (PTLMs) has shown great success in code generation. ICL requires no training: a PTLM takes as input a prompt consisting of a few requirement-code examples and a new requirement, and outputs a new program. However, existing studies simply reuse ICL techniques designed for natural language generation and ignore the unique features of code generation. We refer to these studies as standard ICL. Inspired by observations of the human coding process, we propose a novel ICL approach for code generation named AceCoder. Compared to standard ICL, AceCoder has two novelties. (1) Example retrieval. It retrieves similar programs as examples so that the PTLM can learn programming skills (e.g., algorithms, APIs) from them. (2) Guided code generation. It encourages PTLMs to output an intermediate preliminary (e.g., test cases, APIs) before generating programs. The preliminary helps PTLMs understand the requirement and guides the subsequent code generation. We apply AceCoder to six PTLMs (e.g., Codex) and evaluate it on three public benchmarks using the Pass@k metric. Results show that AceCoder can significantly improve the performance of PTLMs on code generation. (1) In terms of Pass@1, AceCoder outperforms standard ICL by up to 79.7% and fine-tuned models by up to 171%. (2) AceCoder is effective across PTLMs of different sizes (e.g., 1B to 175B parameters) and across different programming languages (e.g., Python, Java, and JavaScript). (3) We investigate multiple choices for the intermediate preliminary. (4) We manually evaluate generated programs in three aspects, and the results show the superiority of AceCoder. (5) Finally, we discuss some insights about ICL for practitioners.
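
The two ideas in the abstract, example retrieval and guided code generation, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how similar programs are retrieved, so a simple token-overlap (Jaccard) score stands in for the retriever, and the names Example, retrieve, build_prompt, and CORPUS, along with the prompt layout, are hypothetical. Test cases are used as the intermediate preliminary, which is one of the options the abstract mentions.

```python
import re
from dataclasses import dataclass


@dataclass
class Example:
    requirement: str   # natural-language requirement
    preliminary: str   # intermediate preliminary (here: test cases)
    code: str          # reference program


def tokenize(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a, b):
    """Token-overlap similarity; an assumed stand-in for the real retriever."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(query, corpus, k=2):
    """Return the k examples whose requirements are most similar to the query."""
    q = tokenize(query)
    ranked = sorted(
        corpus,
        key=lambda ex: jaccard(q, tokenize(ex.requirement)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, examples):
    """Lay out each example as requirement -> test cases -> code, then end
    with the new requirement and a cue to write test cases first."""
    parts = []
    for ex in examples:
        parts.append(
            f"# Requirement:\n{ex.requirement}\n"
            f"# Test cases:\n{ex.preliminary}\n"
            f"# Code:\n{ex.code}\n"
        )
    parts.append(f"# Requirement:\n{query}\n# Test cases:\n")
    return "\n".join(parts)


# A tiny hypothetical retrieval corpus.
CORPUS = [
    Example("Return the sum of a list of numbers",
            "assert total([1, 2, 3]) == 6",
            "def total(xs):\n    return sum(xs)"),
    Example("Reverse a string",
            "assert rev('ab') == 'ba'",
            "def rev(s):\n    return s[::-1]"),
    Example("Return the maximum of a list of numbers",
            "assert largest([1, 5, 2]) == 5",
            "def largest(xs):\n    return max(xs)"),
]

query = "Return the average of a list of numbers"
prompt = build_prompt(query, retrieve(query, CORPUS, k=2))
print(prompt)
```

The prompt ends with an open "# Test cases:" section, so a model completing it would be nudged to write test cases before the program. Sending the prompt to a PTLM is left out because it depends on the model being used.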

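
The abstract reports results with Pass@k but does not define it. The sketch below assumes the evaluation follows the unbiased estimator commonly used for code-generation benchmarks: generate n programs per requirement, count the c programs that pass all tests, and estimate the probability that at least one of k randomly drawn programs is correct. The function name pass_at_k is chosen for illustration.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate Pass@k as 1 - C(n - c, k) / C(n, k).

    n: number of programs generated for one requirement
    c: number of those programs that pass all test cases
    k: number of programs the user is allowed to try
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("expected 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing programs: any draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# 3 of 10 samples are correct: one random draw succeeds 30% of the time.
print(pass_at_k(n=10, c=3, k=1))
```

A benchmark-level score under this assumption would be the mean of pass_at_k over all requirements in the benchmark.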