*Content warning: This work displays examples of explicit and strongly offensive language. The COVID-19 pandemic has fueled a surge in anti-Asian xenophobia and prejudice. Many people have taken to social media to express these negative sentiments, which calls for reliable systems that detect hate speech against this often under-represented demographic. In this paper, we create and annotate a corpus of tweets using two experimental approaches to explore anti-Asian abusive and hate speech at a finer granularity. Using the dataset with the less biased annotation, we deploy multiple models and also examine the applicability of other relevant corpora to these multi-task classifications. In addition to demonstrating promising results, our experiments offer insights into the cultural and logistical factors that shape hate speech annotation for different demographics. Together, our analyses aim to contribute to the understanding of hate speech detection, particularly for low-resource groups.