Software Engineering (SE) communities such as Stack Overflow have become unwelcoming, particularly through members' use of offensive language. Research has shown that offensive language drives users away from active engagement on these platforms. This work investigates this issue more broadly by examining the nature of offensive language in comments posted by users on four prominent SE platforms: GitHub, Gitter, Slack, and Stack Overflow (SO). It proposes an approach that adopts natural language processing and deep learning techniques to detect and classify offensive language in SE communities. Further, we propose a Conflict Reduction System (CRS), which identifies offence and then suggests changes that could be made to minimize it. Beyond showing the prevalence of offensive language, ranging from 0.07% to 0.43% across over 1 million comments from the four communities, our results show promise for the successful detection and classification of such language. The CRS has the potential to drastically reduce the manual moderation effort required to detect and reduce offence in SE communities.