Code search is a technique widely used by developers during software development. Given a developer's query, it retrieves semantically similar implementations from a large code corpus. Existing techniques leverage deep learning models to construct embedding representations for code snippets and queries, respectively. Features such as abstract syntax trees and control flow graphs are commonly employed to represent the semantics of code snippets. However, code snippets that share the same structure in these features do not necessarily share the same semantics, and vice versa. In addition, these techniques use multiple different word mapping functions to map query words and code tokens to embedding representations, which causes the embeddings of the same word/token to diverge between queries and code snippets. We propose a novel context-aware code translation technique that translates code snippets into natural language descriptions (called translations). The code translation is conducted on machine instructions, where the context information is collected by simulating the execution of instructions. We further design a shared word mapping function that uses a single vocabulary to generate embeddings for both translations and queries. We evaluate the effectiveness of our technique, called TranCS, on the CodeSearchNet corpus with 1,000 queries. Experimental results show that TranCS significantly outperforms state-of-the-art techniques by 49.31% to 66.50% in terms of MRR (mean reciprocal rank).
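For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how MRR is conventionally computed for a set of queries. It is not the TranCS evaluation code; the function name `mean_reciprocal_rank`, its parameters, and the assumption that each query has exactly one ground-truth snippet are illustrative choices made here.

```python
from typing import Hashable, Sequence


def mean_reciprocal_rank(
    ranked_results: Sequence[Sequence[Hashable]],
    ground_truth: Sequence[Hashable],
) -> float:
    """Average of 1/rank of the correct snippet over all queries.

    ranked_results[i] is the ranked list of snippet ids returned for query i
    (best match first); ground_truth[i] is the id of the correct snippet.
    Assumption: one correct snippet per query; a query whose correct snippet
    is missing from the returned list contributes 0.
    """
    if len(ranked_results) != len(ground_truth):
        raise ValueError("one ground-truth id is required per query")
    if not ranked_results:
        return 0.0

    total = 0.0
    for results, target in zip(ranked_results, ground_truth):
        for rank, snippet_id in enumerate(results, start=1):
            if snippet_id == target:
                total += 1.0 / rank  # reciprocal rank of the first hit
                break
    return total / len(ranked_results)


# Hypothetical toy data: three queries with their ranked snippet ids.
example_rankings = [["a", "b"], ["x", "y", "z"], ["p"]]
example_truth = ["a", "z", "q"]
example_mrr = mean_reciprocal_rank(example_rankings, example_truth)
```

In this toy data the correct snippets appear at rank 1, at rank 3, and not at all, so the reciprocal ranks are 1, 1/3, and 0, and their mean is 4/9. A relative improvement such as the one reported above would be computed from two such MRR values, one per technique.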