Pre-trained models of code have gained widespread popularity in many code intelligence tasks. Recently, with the scaling of model and corpus sizes, large language models (LLMs) have shown the ability of in-context learning (ICL). These models employ a task instruction and a few demonstration examples as the prompt to learn the semantics of the task and make predictions on test samples. This new learning paradigm is training-free and has shown impressive performance across various natural language processing and code intelligence tasks. However, the performance of ICL heavily relies on the quality of demonstrations, and there has been no systematic investigation into how to construct good demonstrations for code-related tasks. In this paper, by analyzing the design space of in-context demonstrations, we empirically explore the impact of three key factors on the performance of ICL in code intelligence tasks: the selection of demonstration examples, the order of demonstration examples, and the number of demonstration examples. We conduct extensive experiments on three code intelligence tasks: code summarization, bug fixing, and program synthesis. Our experimental results show that all three factors dramatically impact the performance of ICL in code intelligence tasks. Additionally, we summarize our findings and provide takeaway suggestions on how to construct effective demonstrations from these three perspectives. We show that a well-constructed demonstration can lead to substantial improvements over both simple demonstrations and previous fine-tuned state-of-the-art models, e.g., improving BLEU-4, EM, and EM by at least 11.91%, 36.88%, and 37.18% on code summarization, bug fixing, and program synthesis, respectively.
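To make the three studied factors concrete, below is a minimal sketch of how a demonstration-based ICL prompt might be assembled: a pool of (input, output) examples is ranked against the test input (selection), truncated to k examples (number), and arranged so the most similar example sits closest to the query (order). The token-overlap similarity, the prompt template, and the ordering heuristic here are illustrative assumptions for exposition, not the paper's exact method.

```python
# Sketch of in-context prompt construction, illustrating the three factors
# studied in the paper: selection, order, and number of demonstrations.
# The similarity measure, template, and ordering heuristic are assumptions.

def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over whitespace tokens (a simple retrieval proxy)."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def build_prompt(instruction: str, pool: list[tuple[str, str]],
                 query: str, k: int = 4) -> str:
    """Build an ICL prompt from a task instruction, a demonstration pool of
    (input, output) pairs, and the test input `query`."""
    # Selection: rank the pool by similarity to the test input.
    ranked = sorted(pool, key=lambda ex: token_overlap(ex[0], query),
                    reverse=True)
    # Number: keep only the top-k demonstrations.
    selected = ranked[:k]
    # Order: place the most similar demonstration last, nearest the query.
    selected.reverse()
    parts = [instruction]
    for src, tgt in selected:
        parts.append(f"Input:\n{src}\nOutput:\n{tgt}")
    parts.append(f"Input:\n{query}\nOutput:\n")  # test sample to complete
    return "\n\n".join(parts)

# Usage with a toy bug-fixing demonstration pool.
pool = [
    ("if (x = 0) return;", "if (x == 0) return;"),
    ("for (i = 0; i <= n; i++) a[i] = 0;",
     "for (i = 0; i < n; i++) a[i] = 0;"),
]
print(build_prompt("Fix the bug in the following code.", pool,
                   "if (y = 1) y++;", k=2))
```

The resulting string would be sent to an LLM as a single completion prompt; varying the ranking function, the value of k, or the ordering of `selected` corresponds to the three design-space axes the paper investigates.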