While having options can be liberating, too many options can lead to a sub-optimal choice, and software engineering is no exception. APIs have become essential to making software developers' lives easier: they help developers implement functionality faster and more efficiently. However, given the large number of open-source libraries available, choosing the right APIs is not a simple task. Previous studies on API recommendation use natural-language queries to identify which APIs suit a given task. However, these studies consider only a single source of input, i.e., GitHub or Stack Overflow, in isolation. No existing approach uses Stack Overflow to help generate better API sequence recommendations for queries obtained from GitHub. In this study, we therefore aim to provide a framework that improves API sequence recommendation by leveraging information from Stack Overflow. We propose PICASO, which uses a bi-encoder trained with contrastive learning and a cross-encoder trained as a classification model to find a Stack Overflow post that is semantically similar to a given annotation (i.e., code comment). PICASO then uses the title of the matched Stack Overflow post for query expansion, and uses the expanded queries to fine-tune CodeBERT, resulting in an API sequence generation model. Our experiments show that incorporating Stack Overflow information into CodeBERT improves the BLEU-4 score of API sequence generation by 10.8%.
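The retrieve, re-rank, and expand steps described in the abstract can be illustrated with a minimal sketch. This is not PICASO's implementation: the bag-of-words cosine similarity stands in for the contrastively trained bi-encoder, the Jaccard overlap stands in for the cross-encoder classifier, and the function names, the ` [SEP] ` separator, and the `threshold` value are all assumptions made for illustration. The abstract does not state how the annotation and the Stack Overflow title are combined.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a bi-encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(annotation, titles, k=2):
    """Stage 1: embed each text independently and keep the top-k titles."""
    query_vec = embed(annotation)
    ranked = sorted(titles, key=lambda t: cosine(query_vec, embed(t)), reverse=True)
    return ranked[:k]


def pair_score(annotation, title):
    """Toy stand-in for a cross-encoder: scores the (annotation, title) pair jointly."""
    a = set(annotation.lower().split())
    b = set(title.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def expand_query(annotation, titles, k=2, threshold=0.2, sep=" [SEP] "):
    """Stage 2 and 3: re-rank the candidates, then append the best title.

    The separator and threshold are hypothetical; if no candidate is
    similar enough, the annotation is returned unchanged.
    """
    candidates = retrieve(annotation, titles, k)
    if not candidates:
        return annotation
    best = max(candidates, key=lambda t: pair_score(annotation, t))
    if pair_score(annotation, best) < threshold:
        return annotation
    return annotation + sep + best
```

In a pipeline of this shape, the expanded queries, rather than the raw annotations, would serve as the input text when fine-tuning the sequence generation model. The two-stage design is a common trade-off: the independent embedding step is cheap enough to scan many posts, while the pairwise scorer is applied only to the short candidate list.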