With the recent explosion in the size and complexity of codebases and software projects, the need for efficient source code search engines has increased dramatically. Unfortunately, existing information retrieval-based methods fail to capture query semantics and perform well only when the query contains syntax-based keywords. Consequently, such methods perform poorly when given high-level natural language queries. In this paper, we review existing methods for building code search engines. We also outline open research directions and the obstacles that stand in the way of a universal source code search engine.