Representing code changes as numeric feature vectors, i.e., code change representations, is usually an essential step to automate many software engineering tasks related to code changes, e.g., commit message generation and just-in-time defect prediction. Intuitively, the quality of code change representations is crucial for the effectiveness of automated approaches. Prior work on code changes usually designs and evaluates code change representation approaches for a specific task, and little work has investigated code change encoders that can be used and jointly trained on various tasks. To fill this gap, this work proposes a novel Code Change Representation learning approach named CCRep, which can learn to encode code changes as feature vectors for diverse downstream tasks. Specifically, CCRep regards a code change as the combination of its before-change and after-change code, leverages a pre-trained code model to obtain high-quality contextual embeddings of code, and uses a novel mechanism named query back to extract and encode the changed code fragments and make them explicitly interact with the whole code change. To evaluate CCRep and demonstrate its applicability to diverse code-change-related tasks, we apply it to three tasks: commit message generation, patch correctness assessment, and just-in-time defect prediction. Experimental results show that CCRep outperforms the state-of-the-art techniques on each task.
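To make the pipeline described above concrete, the following is a minimal sketch, not CCRep's actual implementation. The abstract does not specify the architecture, so every name here is an assumption introduced for illustration: `toy_embed` is a hash-based stand-in for the pre-trained code model (unlike a real model, it is not contextual), `changed_tokens` uses a token-level diff to locate changed fragments, and `represent_change` lets the changed tokens act as attention queries over all tokens of the before-change and after-change code, which is one plausible reading of the "query back" idea.

```python
import difflib
import hashlib
import math

DIM = 8  # toy embedding size (assumption, chosen only for illustration)


def toy_embed(token):
    """Hypothetical stand-in for a pre-trained code model.

    Maps a token to a deterministic pseudo-embedding in [-1, 1]^DIM.
    A real model would produce context-dependent embeddings.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [(b - 127.5) / 127.5 for b in digest[:DIM]]


def changed_tokens(before, after):
    """Return (removed, added) tokens from a token-level diff."""
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed.extend(before[i1:i2])
            added.extend(after[j1:j2])
    return removed, added


def attend(query, keys):
    """Scaled dot-product attention of one query over keys (values = keys)."""
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(DIM)
              for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * key[d] for w, key in zip(weights, keys)) / total
            for d in range(DIM)]


def mean(vectors):
    """Element-wise mean of a non-empty list of DIM-sized vectors."""
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(DIM)]


def represent_change(before, after):
    """Encode a code change as one feature vector.

    Changed tokens query the embeddings of the whole change
    (before-change plus after-change code); results are mean-pooled.
    """
    removed, added = changed_tokens(before, after)
    context = [toy_embed(t) for t in before + after]
    queries = [toy_embed(t) for t in removed + added]
    if not queries:
        return [0.0] * DIM  # no change: return a zero vector
    return mean([attend(q, context) for q in queries])


before = "int x = a + b ;".split()
after = "int x = a - b ;".split()
vector = represent_change(before, after)
```

Under these assumptions, `vector` is a fixed-size list of `DIM` floats that a downstream model (for example, a defect classifier or a message decoder) could consume. Mean pooling is used only for brevity; the abstract does not state how the changed fragments are aggregated.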