Identifying feature requests and bug reports in user comments holds great potential for development teams. However, automated mining of information relevant to requirements engineering (RE) from social media and app stores is challenging because (1) about 70% of user comments contain noisy, irrelevant information, (2) the number of user comments grows daily, making manual analysis infeasible, and (3) user comments are written in different languages. Existing approaches build on traditional machine learning (ML) and deep learning (DL), but fail to detect feature requests and bug reports with the high recall and acceptable precision that this task requires. In this paper, we investigate the potential of transfer learning (TL) for the classification of user comments. Specifically, we train both monolingual and multilingual BERT models and compare their performance with state-of-the-art methods. We found that monolingual BERT models outperform existing baseline methods in the classification of English app reviews as well as English and Italian tweets. However, we also observed that applying heavyweight TL models does not necessarily lead to better performance. In fact, our multilingual BERT models perform worse than traditional ML methods.