Automatic code review (ACR), which aims to reduce the cost of manual inspection, is an essential task in software engineering. Existing work predicts review results from source code fragments alone and does not exploit developers' comments. We therefore present the Multi-Modal Apache Automatic Code Review dataset (MACR) for the Multi-Modal ACR task; we expect its release to advance research in this field. Based on this dataset, we propose a Contrastive Learning based Multi-Modal Network (CLMN) for the Multi-Modal ACR task. Concretely, our model consists of a code encoding module and a text encoding module. For each module, we use dropout as a minimal form of data augmentation and then pre-train the module parameters with contrastive learning. Finally, we combine the two encoders and fine-tune CLMN to predict the Multi-Modal ACR results. Experimental results on the MACR dataset show that the proposed model outperforms state-of-the-art methods.
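The pre-training step described above, dropout as the only augmentation followed by a contrastive objective, can be illustrated with a minimal sketch. The abstract does not specify the loss or the encoder architecture, so this sketch assumes an InfoNCE-style loss with in-batch negatives and a hypothetical one-layer encoder; the names `dropout`, `encode`, `info_nce`, the temperature, and the dropout rate are illustrative choices, not the authors' implementation.

```python
import numpy as np


def dropout(x, rate, rng):
    """Inverted dropout: zero each entry with probability `rate`, rescale the rest."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


def encode(x, weights, rate, rng):
    """Hypothetical one-layer encoder: dropout on the input, then a tanh projection."""
    return np.tanh(dropout(x, rate, rng) @ weights)


def info_nce(view_a, view_b, temperature=0.05):
    """InfoNCE loss (assumed objective): row i of view_a should match row i of
    view_b and differ from every other row of view_b in the batch."""
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature                      # cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


rng = np.random.default_rng(0)
batch = rng.normal(size=(8, 32))                 # stand-in for 8 embedded inputs
weights = rng.normal(size=(32, 16)) / np.sqrt(32)

# Two forward passes over the same batch with different dropout masks
# yield the two "views" that form each positive pair.
view_a = encode(batch, weights, rate=0.1, rng=rng)
view_b = encode(batch, weights, rate=0.1, rng=rng)
loss = info_nce(view_a, view_b)
```

In a setup like the one the abstract outlines, a loss of this kind would presumably be minimized separately for the code encoder and the text encoder before the two are combined for fine-tuning; here the encoder weights are fixed and only the loss computation is shown.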