Online technical forums (e.g., StackOverflow) are popular platforms where developers discuss technical problems such as how to use a specific Application Programming Interface (API), how to solve a programming task, or how to fix bugs in their code. These discussions often provide auxiliary knowledge about how to use software that is not covered by the official documentation. Automatically extracting such knowledge would support a range of downstream tasks such as API search and indexing. However, unlike official documentation written by experts, discussions in open forums are written by regular developers in short, informal text that includes spelling errors and abbreviations. Accurately recognizing API mentions in unstructured natural language documents and linking them to entries in an API repository poses three major challenges: (1) distinguishing API mentions from common words; (2) identifying API mentions that lack a fully qualified name; and (3) disambiguating API mentions that share similar method names but belong to different libraries. In this paper, to tackle these challenges, we propose ARCLIN, a tool that can effectively recognize and link APIs without using human annotations. Specifically, we first design an API recognizer that automatically extracts API mentions from natural language sentences using a Conditional Random Field (CRF) on top of a Bi-directional Long Short-Term Memory (Bi-LSTM) module; we then apply a context-aware scoring mechanism to compute the mention-entry similarity for each entry in an API repository. Compared to previous approaches based on heuristic rules, our proposed tool, which requires no manual inspection, outperforms them by 8% on Py-mention, a high-quality dataset that contains 558 mentions and 2,830 sentences from five popular Python libraries.
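To make the linking step concrete, the following is a minimal sketch of scoring a mention against every entry in an API repository using its sentence context. It is not the ARCLIN scoring mechanism, which the abstract does not specify: the tiny repository, the descriptions, and the functions `tokenize`, `score`, and `link` are all hypothetical, and simple token overlap stands in for whatever similarity the tool actually computes.

```python
import re

# Hypothetical miniature API repository (an assumption for illustration):
# fully qualified name -> short description.
API_REPOSITORY = {
    "pandas.read_csv": "read a comma-separated values file into a DataFrame",
    "numpy.loadtxt": "load data from a text file into an array",
    "pandas.DataFrame.mean": "return the mean of the values over the requested axis of a DataFrame",
    "numpy.mean": "compute the arithmetic mean of an array along the specified axis",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(mention, sentence, entry, description):
    """Toy mention-entry similarity: name match plus context overlap."""
    # The mention usually lacks a fully qualified name, so compare it
    # with the last component of the entry only.
    simple_name = entry.rsplit(".", 1)[-1]
    name_score = 1.0 if mention.lower() == simple_name.lower() else 0.0

    # Context = words of the sentence other than the mention itself.
    context = tokenize(sentence) - tokenize(mention)
    entry_tokens = tokenize(entry) | tokenize(description)
    overlap = len(context & entry_tokens) / max(len(context), 1)
    return name_score + overlap


def link(mention, sentence):
    """Return the repository entry with the highest score for the mention."""
    return max(
        API_REPOSITORY,
        key=lambda entry: score(mention, sentence, entry, API_REPOSITORY[entry]),
    )


if __name__ == "__main__":
    print(link("mean", "How do I call mean on a numpy array along one axis?"))
    print(link("mean", "I want the mean of each column over my pandas DataFrame"))
```

Both example sentences mention only `mean`, which matches two entries by name; the surrounding words ("numpy array" versus "pandas DataFrame") are what separate the candidates, which is the third challenge listed above. A realistic system would replace the token overlap with a learned or embedding-based similarity.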