Code clone detection based on semantic similarity is valuable in software engineering tasks such as software evolution and software reuse. Traditional code clone detection techniques focus on the syntactic similarity of code and pay less attention to its semantic similarity. As a result, candidate code that is semantically similar is overlooked. To address this issue, we propose a code clone detection method based on semantic similarity. Treating code as a series of interdependent events that occur continuously, we design a model, named EDAM, that encodes the semantic information of code based on event embedding and event dependency. EDAM uses event embedding to model the execution characteristics of program statements and the data dependence information among all statements. In this way, the semantic information of a program is embedded into a vector, and the vector is used to detect semantically similar code. Experimental results show that EDAM outperforms state-of-the-art open-source models for code clone detection.
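To make the idea of "statements as events linked by data dependence" concrete, the following is a minimal sketch in Python. It is not the EDAM model: EDAM learns event embeddings, whereas this sketch substitutes a simple count-based feature vector, and it handles only top-level Python statements. All names (`statement_events`, `embed_program`, `cosine_similarity`) and the feature scheme are assumptions made for illustration.

```python
import ast
import math
from collections import Counter


def statement_events(source):
    """Split a program into top-level statements ("events").

    Each event records its statement kind, the operators and calls it
    executes, and the variable names it defines and uses.
    """
    events = []
    for stmt in ast.parse(source).body:
        ops, defs, uses = [], set(), set()
        for node in ast.walk(stmt):
            if isinstance(node, (ast.operator, ast.cmpop, ast.Call)):
                ops.append(type(node).__name__)
            elif isinstance(node, ast.Name):
                target = defs if isinstance(node.ctx, ast.Store) else uses
                target.add(node.id)
        events.append((type(stmt).__name__, ops, defs, uses))
    return events


def embed_program(source):
    """Build a sparse feature vector from events and data dependencies.

    Hypothetical stand-in for a learned embedding: variable names are
    used only to find def-use edges and never become features.
    """
    vector = Counter()
    last_def = {}  # variable name -> index of the statement that last defined it
    events = statement_events(source)
    for j, (kind, ops, defs, uses) in enumerate(events):
        # Execution characteristics of the statement itself.
        vector["event:" + kind] += 1
        for op in ops:
            vector["event:" + op] += 1
        # Data dependence: this statement reads a value defined earlier.
        for var in uses:
            if var in last_def:
                producer_kind = events[last_def[var]][0]
                vector["dep:" + producer_kind + "->" + kind] += 1
        for var in defs:
            last_def[var] = j
    return vector


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    original = "a = 1\nb = a + 2\nc = b * a\n"
    renamed = "x = 1\ny = x + 2\nz = y * x\n"
    different = "x = 1\ny = 2\nz = x - y\n"
    print(cosine_similarity(embed_program(original), embed_program(renamed)))
    print(cosine_similarity(embed_program(original), embed_program(different)))
```

Because variable names serve only to locate def-use edges, the renamed copy yields the same feature vector as the original, while the third snippet, which uses different operators and has fewer dependencies, scores lower. A learned model would replace the counts with trained embeddings; the sketch only shows where statement-level and dependency-level information enter the vector.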