APIs (Application Programming Interfaces) are reusable software libraries and building blocks of modern rapid software development. Previous research shows that programmers frequently share and search for reviews of APIs on mainstream software question and answer (Q&A) platforms such as Stack Overflow, which has motivated researchers to design tasks and approaches for processing API reviews automatically. Among these tasks, classifying API reviews into different aspects (e.g., performance or security), known as aspect-based API review classification, is of great importance. The current state-of-the-art (SOTA) solution to this task is based on a traditional machine learning algorithm. Inspired by the great success of pre-trained models on many software engineering tasks, this study fine-tunes six pre-trained models for the aspect-based API review classification task and compares them with the current SOTA solution on an API review benchmark collected by Uddin et al. The investigated models include four models pre-trained on natural language (BERT, RoBERTa, ALBERT and XLNet), BERTOverflow, which is pre-trained on a text corpus extracted from Stack Overflow posts, and CosSensBERT, which is designed to handle imbalanced data. The results show that all six fine-tuned models outperform the traditional machine learning-based tool; more specifically, the improvement in F1-score ranges from 21.0% to 30.2%. We also find that BERTOverflow, despite being pre-trained on a Stack Overflow corpus, does not outperform BERT. The results further suggest that CosSensBERT does not outperform BERT in terms of F1, but it is still worth considering because it performs better on MCC and AUC.
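The abstract reports that a model can trail on F1 yet lead on MCC. As an illustration of how that can happen on imbalanced data, here is a minimal Python sketch of both metrics computed from confusion-matrix counts. It assumes each aspect is scored as a binary (one-vs-rest) classification task; the classifiers "A" and "B" and all their counts are hypothetical and are not taken from the study.

```python
import math


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 of the positive class: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient; unlike F1, it also uses true negatives."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for one aspect: 100 positive and 900 negative reviews.
a = dict(tp=80, fp=40, fn=20, tn=860)  # hypothetical classifier A
b = dict(tp=90, fp=60, fn=10, tn=840)  # hypothetical classifier B

for name, c in (("A", a), ("B", b)):
    print(name,
          "F1=%.4f" % f1_score(c["tp"], c["fp"], c["fn"]),
          "MCC=%.4f" % mcc(c["tp"], c["fp"], c["fn"], c["tn"]))
```

With these made-up counts, A has the higher F1 while B has the higher MCC, because MCC also rewards correctly rejected negatives, which dominate an imbalanced aspect.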