Context: Code Clone Detection (CCD) is a software engineering task used for plagiarism detection, code search, and code comprehension. Recently, deep learning-based models have achieved an F1 score (a metric used to assess classifiers) of $\sim$95\% on the CodeXGLUE benchmark. These models require a large amount of training data and are mainly fine-tuned on Java or C++ datasets. However, no previous study has evaluated the generalizability of these models when only a limited amount of annotated data is available. Objective: The main objective of this research is to assess the ability of CCD models, as well as few-shot learning algorithms, on unseen programming problems and new languages (i.e., problems/languages the model was not trained on). Method: We assess the generalizability of state-of-the-art CCD models in few-shot settings (i.e., only a few samples are available for fine-tuning) under three scenarios: i) unseen problems, ii) unseen languages, and iii) a combination of new languages and new problems. We choose three datasets, BigCloneBench, POJ-104, and CodeNet, and three languages, Java, C++, and Ruby. We then employ Model-Agnostic Meta-Learning (MAML), in which the model learns a meta-learner capable of extracting transferable knowledge from the training set, so that the model can be fine-tuned using only a few samples. Finally, we combine contrastive learning with MAML to further study whether it can improve the results of MAML.
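For readers unfamiliar with MAML's bi-level optimization, the following minimal sketch illustrates the inner-loop adaptation on a few support samples and the outer-loop meta-update on query samples. It is not the paper's implementation: the toy linear classifier, synthetic "code-pair" features, and hyperparameters are assumptions chosen purely for illustration.

```python
# Minimal MAML sketch (illustrative only, not the authors' implementation).
# A toy binary clone/not-clone classifier over random feature vectors that
# stand in for code-pair embeddings.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
dim, inner_lr, outer_lr, inner_steps = 32, 0.1, 0.01, 1

# Meta-parameters of a linear classifier: logits = x @ w + b
w = torch.zeros(dim, 2, requires_grad=True)
b = torch.zeros(2, requires_grad=True)
meta_opt = torch.optim.Adam([w, b], lr=outer_lr)

def forward(x, w, b):
    return x @ w + b

def sample_task(n=16):
    """Toy 'task': a few labelled pairs from one unseen problem/language."""
    x = torch.randn(2 * n, dim)
    y = torch.randint(0, 2, (2 * n,))
    return (x[:n], y[:n]), (x[n:], y[n:])  # support / query split

for step in range(100):
    meta_opt.zero_grad()
    (xs, ys), (xq, yq) = sample_task()
    # Inner loop: adapt a copy of the parameters on the support set.
    w_a, b_a = w, b
    for _ in range(inner_steps):
        loss = F.cross_entropy(forward(xs, w_a, b_a), ys)
        gw, gb = torch.autograd.grad(loss, (w_a, b_a), create_graph=True)
        w_a, b_a = w_a - inner_lr * gw, b_a - inner_lr * gb
    # Outer loop: evaluate the adapted parameters on the query set and
    # back-propagate through the adaptation step into the meta-parameters.
    F.cross_entropy(forward(xq, w_a, b_a), yq).backward()
    meta_opt.step()
```

In the few-shot CCD setting described above, each "task" would correspond to a set of clone pairs from an unseen problem or language, and the adapted parameters would be obtained from only a handful of annotated samples.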