Application Programming Interfaces (APIs), which encapsulate the implementation of specific functions behind interfaces, greatly improve the efficiency of modern software development. As the number of APIs keeps growing, developers can hardly be familiar with all of them and usually need to search for appropriate APIs to use. Consequently, considerable effort has been devoted to improving API recommendation. However, it has become increasingly difficult to gauge the performance of new models because the task lacks both a uniform definition and a standardized benchmark. For example, some studies treat the task as a code completion problem, while others recommend relevant APIs given natural language queries. To reduce these challenges and better facilitate future research, in this paper we revisit the API recommendation task and aim to benchmark the existing approaches. Specifically, we group the approaches into two categories according to the task definition: query-based API recommendation and code-based API recommendation. We study 11 recently proposed approaches along with 4 widely used IDEs, and then build a benchmark named APIBench for these two categories of approaches. Based on APIBench, we distill actionable insights and open challenges for API recommendation. We also derive implications and directions for improving the performance of API recommendation, including data source selection, appropriate query reformulation, low-resource settings, and cross-domain adaptation.
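To make the two task definitions concrete, the following minimal Python sketch contrasts their inputs and outputs. Everything in it is a hypothetical assumption for illustration only: the toy API index, the stop-word list, the word-overlap ranking, and the bigram-count predictor are not any of the 11 studied approaches, nor the APIBench evaluation protocol. In the query-based setting, a natural language query goes in and a ranked list of APIs comes out; in the code-based setting, the API calls already present in the code context go in and the likely next APIs come out.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical toy API index (API name -> short description); illustrative only.
API_DOCS: Dict[str, str] = {
    "java.io.FileReader.read": "read characters from a file",
    "java.util.ArrayList.add": "append an element to a list",
    "java.lang.String.split": "split a string around matches of a regex",
}

# Hypothetical stop-word list used to ignore uninformative query words.
STOPWORDS = {"a", "an", "the", "to", "of", "how", "from"}

# Hypothetical toy corpus of API call sequences mined from code.
CORPUS: List[List[str]] = [
    ["FileReader.<init>", "BufferedReader.<init>", "BufferedReader.readLine"],
    ["FileReader.<init>", "BufferedReader.<init>", "BufferedReader.close"],
    ["FileReader.<init>", "FileReader.read"],
]


def tokens(text: str) -> set:
    """Lower-case whitespace tokens with stop words removed."""
    return set(text.lower().split()) - STOPWORDS


def query_based_recommend(query: str, k: int = 2) -> List[str]:
    """Query-based setting: natural language query in, ranked APIs out.

    Ranking by word overlap with the API description is a placeholder
    scoring function, not any approach evaluated in the paper.
    """
    q = tokens(query)
    scored = [(len(q & tokens(doc)), api) for api, doc in API_DOCS.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest overlap first, ties by name
    return [api for score, api in scored[:k] if score > 0]


def code_based_recommend(context: Sequence[str],
                         corpus: Sequence[Sequence[str]],
                         k: int = 2) -> List[str]:
    """Code-based setting: API calls already in the code in, likely next APIs out.

    Counts which API follows the last call in the context across the corpus
    (a simple bigram model), standing in for a real code-completion model.
    """
    if not context:
        return []
    last = context[-1]
    follow = Counter(
        seq[i + 1]
        for seq in corpus
        for i in range(len(seq) - 1)
        if seq[i] == last
    )
    return [api for api, _ in follow.most_common(k)]


if __name__ == "__main__":
    print(query_based_recommend("how to split a string"))
    print(code_based_recommend(["FileReader.<init>"], CORPUS))
```

The point of the sketch is the difference in signatures, not the scoring: the query-based function never sees code, and the code-based function never sees natural language, which is one reason results from the two settings cannot be compared directly without a benchmark that separates them.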