Semantic code search lets developers retrieve existing code snippets with natural-language queries, which can greatly improve programming efficiency. Smart contracts, programs that run on a blockchain, have a code reuse rate of more than 90%, so developers have a strong demand for semantic code search tools. However, existing code search models still leave a semantic gap between code and query, and they perform poorly on queries specific to smart contracts. In this paper, we propose a Multi-Modal Smart contract Code Search (MM-SCS) model. Specifically, we construct a Contract Elements Dependency Graph (CEDG) as an additional modality for MM-SCS to capture the data-flow and control-flow information of the code. To focus the model on key contextual information, we use a multi-head attention network to generate embeddings for code features. In addition, we use a fine-tuned pretrained model to keep the model effective when training data is scarce. We compare MM-SCS with four state-of-the-art models on a dataset of 470K (code, docstring) pairs collected from GitHub and Etherscan. Experimental results show that MM-SCS achieves a Mean Reciprocal Rank (MRR) of 0.572, outperforming UNIF, DeepCS, CARLCS-CNN, and TAB-CS by 34.2%, 59.3%, 36.8%, and 14.1%, respectively. Additionally, the search speed of MM-SCS is second only to UNIF, at 0.34 s/query.
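For readers unfamiliar with the reported metric, the following is a minimal sketch of how Mean Reciprocal Rank is commonly computed for a set of queries. It is not the authors' evaluation code: the function name `mean_reciprocal_rank`, the convention of passing `None` when the correct snippet is not retrieved, and the example ranks are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all queries.

    ranks[i] is the 1-based position of the correct snippet in the
    result list for query i, or None if it was not retrieved at all
    (assumed here to contribute 0 to the sum).
    """
    if not ranks:
        raise ValueError("ranks must contain at least one query")
    total = 0.0
    for rank in ranks:
        if rank is None:
            continue  # a miss adds nothing to the sum
        if rank < 1:
            raise ValueError("ranks are 1-based and must be >= 1")
        total += 1.0 / rank
    return total / len(ranks)


if __name__ == "__main__":
    # Hypothetical ranks for three queries: correct snippet found at
    # positions 1, 2 and 4.
    print(mean_reciprocal_rank([1, 2, 4]))
```

An MRR closer to 1 means the correct snippet tends to appear near the top of the result list.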