Content warning: This work displays examples of explicit and/or strongly offensive language. A surge of anti-Asian xenophobia and prejudice during the COVID-19 pandemic has led many people to express these sentiments on social media. Identifying such posts is crucial for moderation and for understanding the nature of hate in online spaces. In this paper, we create and annotate a corpus of tweets to explore anti-Asian hate speech at a finer level of granularity. Our analysis reveals that this emergent form of hate speech often eludes established approaches. To address this challenge, we develop a model and an accompanying efficient training regimen that incorporates agreement between annotators. Our approach yields an improvement of up to 8.8% in macro F1 score over a strong established baseline, indicating its effectiveness even in settings where consensus among annotators is low. We demonstrate that we are able to identify hate speech that is systematically missed by established hate speech detectors.