Toxic conversations during software development interactions may have serious repercussions for a Free and Open Source Software (FOSS) development project. For example, victims of toxic conversations may become afraid to express themselves, become demotivated, and eventually leave the project. Automated filtering of toxic conversations may help a FOSS community maintain healthy interactions among its members. However, off-the-shelf toxicity detectors perform poorly on Software Engineering (SE) datasets, such as one curated from code review comments. To address this challenge, we present ToxiCR, a supervised learning-based toxicity identification tool for code review interactions. ToxiCR offers a choice among ten supervised learning algorithms, an option to select text vectorization techniques, eight preprocessing steps, and a large-scale labeled dataset of 19,571 code review comments. Two of those eight preprocessing steps are SE domain-specific. Through a rigorous evaluation of the models with various combinations of preprocessing steps and vectorization techniques, we have identified the best combination for our dataset, which achieves 95.8% accuracy and an 88.9% F1 score. ToxiCR significantly outperforms existing toxicity detectors on our dataset. We have made our dataset, pretrained models, evaluation results, and source code publicly available at: https://github.com/WSU-SEAL/ToxiCR
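To make the described pipeline concrete, the following is a minimal, hypothetical sketch of a supervised toxicity classifier for code review comments, following the overall shape the abstract describes (preprocessing, then text vectorization, then a supervised classifier). It is not ToxiCR's actual implementation: the strip_code_artifacts preprocessing step, the toy labeled comments, and the logistic regression model are all illustrative assumptions.

    # Hypothetical sketch of a preprocessing -> vectorization -> classifier
    # pipeline; NOT ToxiCR's implementation.
    import re

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, f1_score
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline

    def strip_code_artifacts(comment: str) -> str:
        """Toy SE-specific preprocessing: drop inline code spans and URLs."""
        comment = re.sub(r"`[^`]*`", " ", comment)       # inline code spans
        comment = re.sub(r"https?://\S+", " ", comment)  # URLs
        return comment

    # Placeholder data; ToxiCR's real dataset has 19,571 labeled comments.
    comments = ["this patch is garbage, did you even test it?",
                "nice catch, please also update the docs",
                "what an idiotic way to handle errors",
                "LGTM, just fix the `typo` in line 12"]
    labels = [1, 0, 1, 0]  # 1 = toxic, 0 = non-toxic

    texts = [strip_code_artifacts(c) for c in comments]
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.5, random_state=42, stratify=labels)

    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    pipeline.fit(X_train, y_train)

    preds = pipeline.predict(X_test)
    print("accuracy:", accuracy_score(y_test, preds))
    print("F1:", f1_score(y_test, preds, zero_division=0))

In ToxiCR itself, the preprocessing steps, vectorization techniques, and the choice among the ten supervised learning algorithms are configurable; this sketch only mirrors the general structure of such a pipeline.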