Understanding toxicity in user conversations is undoubtedly an important problem. Addressing "covert" or implicit cases of toxicity is particularly hard and requires context. Very few previous studies have analysed the influence of conversational context on human perception or on automated detection models. We dive deeper into both of these directions. We start by analysing existing contextual datasets and conclude that toxicity labelling by humans is, in general, influenced by the conversational structure, polarity and topic of the context. We then propose to bring these findings into computational detection models by introducing and evaluating (a) neural architectures for contextual toxicity detection that are aware of the conversational structure, and (b) data augmentation strategies that can help model contextual toxicity detection. Our results show the encouraging potential of neural architectures that are aware of the conversational structure. We also demonstrate that such models can benefit from synthetic data, especially in the social media domain.