Social aspects of software projects are becoming increasingly important in research and practice. Different approaches analyze the sentiment of a development team, ranging from simply asking the team to applying so-called sentiment analysis to text-based communication. These sentiment analysis tools are trained using pre-labeled data sets from different sources, including GitHub and Stack Overflow. In this paper, we investigate whether the labels of the statements in these data sets coincide with the perception of potential members of a software project team. Based on an international survey, we compare the median perception of 94 participants with the predefined labels of the data sets, as well as each individual participant's agreement with these labels. Our results point to three remarkable findings: (1) although the median values coincide with the predefined labels in 62.5% of the cases, we observe large differences between individual participants' ratings and the labels; (2) not a single participant fully agrees with the predefined labels; and (3) the data set whose labels were assigned based on guidelines performs better than the ad hoc labeled data set.