While social media offers freedom of self-expression, abusive language carries a significant negative social impact. Driven by the importance of the issue, research on the automated detection of abusive language has grown and improved. However, these detection models rely heavily on strongly indicative keywords, such as slurs and profanity. This means that they can (1a) miss abuse that lacks such keywords, (1b) falsely flag non-abusive text that contains them, and (2) perform poorly on unseen data. Despite recognition of these problems, gaps and inconsistencies remain in the literature. In this study, we analyse in detail the impact of keywords from dataset construction to model behaviour, with a focus on how models make mistakes of types (1a) and (1b), and how (1a) and (1b) interact with (2). Through this analysis, we provide suggestions for future research to address all three problems.