An Empirical Study of the Effectiveness of an Ensemble of Stand-alone Sentiment Detection Tools for Software Engineering Datasets

Sentiment analysis in software engineering (SE) has shown promise for analyzing and supporting diverse development activities. We report the results of an empirical study that we conducted to determine the feasibility of developing an ensemble engine by combining the polarity labels of stand-alone SE-specific sentiment detectors. Our study has two phases. In the first phase, we pick five SE-specific sentiment detection tools from two recently published papers by Lin et al. [31, 32], who first reported negative results with stand-alone sentiment detectors and then proposed an improved SE-specific sentiment detector, POME [31]. We report the study results on 17,581 units (sentences/documents) coming from six currently available sentiment benchmarks for SE. We find that the existing tools can be complementary to each other in 85-95% of the cases, i.e., one is wrong, but another is right. However, a majority voting-based ensemble of those tools fails to improve the accuracy of sentiment detection. We develop Sentisead, a supervised tool that combines the polarity labels and bag of words as features. Sentisead improves the performance (F1-score) of the individual tools by between 4% (over Senti4SD [5]) and 100% (over POME [31]). In the second phase, we compare and improve the Sentisead infrastructure using Pre-trained Transformer Models (PTMs). We find that a Sentisead infrastructure with RoBERTa as the ensemble of the five stand-alone rule-based and shallow learning SE-specific tools from Lin et al. [31, 32] offers the best F1-score of 0.805 across the six datasets, while a stand-alone RoBERTa shows an F1-score of 0.801.
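The gap between tool complementarity and majority voting described above can be illustrated with a minimal sketch. It is not the authors' implementation: the five predictions, the gold label, and the rule that ties fall back to "neutral" are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(labels, tie_label="neutral"):
    """Return the most frequent label; fall back to tie_label on a tie.

    The tie-breaking rule is an assumption, not taken from the study.
    """
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return tie_label
    return counts[0][0]


def is_complementary(predictions, gold):
    """True when at least one tool is wrong and at least one other is right."""
    correct = [p == gold for p in predictions]
    return any(correct) and not all(correct)


# Hypothetical outputs of five stand-alone tools for a single unit.
unit_predictions = ["positive", "negative", "negative", "neutral", "negative"]
gold_label = "positive"  # hypothetical benchmark label

vote = majority_vote(unit_predictions)
complementary = is_complementary(unit_predictions, gold_label)
print(vote, complementary)
```

In this made-up case one tool agrees with the gold label, so the unit counts as complementary, yet the vote still follows the three tools that are wrong. A supervised ensemble that treats the polarity labels as features, as the abstract describes for Sentisead, is one way to learn when to trust the minority.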

Related content

TOOLS

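This new edition of the TOOLS conference series revives the tradition of 50 conferences held from 1989 to 2012. TOOLS originally stood for "Technology of Object-Oriented Languages and Systems" and later broadened to cover all innovative aspects of software technology. Many of today's most important software concepts were first introduced there. TOOLS 50+1, held in 2019 near Kazan, Russia, continued the series in the same spirit of innovation, with the same enthusiasm for everything related to software, the same combination of scientific soundness and industrial applicability, and the same openness to all trends and communities in the field. Official website: http://tools2019.innopolis.ru/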