With the widespread use of toxic language online, platforms are increasingly using automated systems that leverage advances in natural language processing to automatically flag and remove toxic comments. However, most automated systems that detect and moderate toxic language do not provide feedback to their users, let alone offer those users an avenue of recourse to make actionable changes. We present RECAST, an interactive, open-source web tool that visualizes these models' toxicity predictions and suggests alternatives for flagged toxic language, giving users a new path of recourse when using these automated moderation tools. RECAST highlights the text responsible for a toxicity classification and allows users to interactively substitute potentially toxic phrases with neutral alternatives. We examined the effect of RECAST through two large-scale user evaluations and found that it was highly effective at helping users reduce toxicity as detected by the model. Users also gained a stronger understanding of the underlying toxicity criteria used by black-box models, enabling transparency and recourse. In addition, we found that when users focus on optimizing their language for these models rather than relying on their own judgment (the implied incentive and goal of deploying automated models), the models cease to be effective classifiers of toxicity relative to human annotations. This opens a discussion of how toxicity detection models work, how they should work, and their effect on the future of online discourse.