Whose Opinions Matter? Perspective-aware Models to Identify Opinions of Hate Speech Victims in Abusive Language Detection

Social media platforms give users freedom of expression and a medium to exchange information and express diverse opinions. Unfortunately, this has also led to a growth in abusive content intended to discriminate against people and to target the most vulnerable communities, such as immigrants, LGBT people, Muslims, Jews and women. Because abusive language is subjective in nature, the annotation of abusive content such as hate speech (HS) may involve highly polarizing topics or events. We therefore need novel approaches to model the conflicting perspectives and opinions of people with different personal and demographic backgrounds. In this paper, we present an in-depth study that models polarized opinions coming from different communities, under the hypothesis that shared characteristics (ethnicity, social background, culture, etc.) can influence annotators' perspectives on a given phenomenon. We believe that this information lets us divide the annotators into groups that share similar perspectives. We can then create a separate gold standard for each group and use it to train state-of-the-art deep learning models, and we can employ an ensemble approach to combine the perspective-aware classifiers from the different groups into a single inclusive model. We also propose a novel resource: a multi-perspective English-language dataset annotated according to sub-categories relevant for characterising online abuse, namely hate speech, aggressiveness, offensiveness and stereotype. By training state-of-the-art deep learning models on this resource, we show how our approach improves the prediction performance of a state-of-the-art supervised classifier.
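The pipeline described above (group the annotators, build one gold standard per group, combine the group-level classifiers) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the majority-vote rule with ties resolved to the abusive class, and the "flag if any group flags" combination rule are all assumptions made for illustration, since the abstract does not specify them. The group-level classifiers themselves are omitted; the ensemble function only takes their per-item predictions as input.

```python
from collections import Counter, defaultdict


def group_gold_standards(annotations, annotator_group):
    """Build one gold standard per annotator group by majority vote.

    annotations: iterable of (item_id, annotator_id, label), label in {0, 1}.
    annotator_group: dict mapping annotator_id to a group name (hypothetical).
    Returns {group: {item_id: majority_label}}.
    """
    votes = defaultdict(lambda: defaultdict(list))
    for item_id, annotator_id, label in annotations:
        votes[annotator_group[annotator_id]][item_id].append(label)

    gold = {}
    for group, items in votes.items():
        gold[group] = {}
        for item_id, labels in items.items():
            counts = Counter(labels)
            # Assumption: ties resolve to the positive (abusive) class.
            gold[group][item_id] = 1 if counts[1] >= counts[0] else 0
    return gold


def inclusive_ensemble(group_predictions):
    """Combine per-group predictions into one label per item.

    group_predictions: {group: {item_id: 0 or 1}}.
    Assumption: an item is flagged if at least one group's classifier flags it.
    """
    items = set().union(*(preds.keys() for preds in group_predictions.values()))
    return {
        item: int(any(preds.get(item, 0) == 1 for preds in group_predictions.values()))
        for item in items
    }


# Toy data: groups A and B disagree on item t1 and agree on item t2.
annotations = [
    ("t1", "a1", 1), ("t1", "a2", 1), ("t1", "b1", 0), ("t1", "b2", 0),
    ("t2", "a1", 0), ("t2", "a2", 0), ("t2", "b1", 0), ("t2", "b2", 0),
]
annotator_group = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}

gold = group_gold_standards(annotations, annotator_group)
# The gold labels stand in for classifier predictions in this toy example.
combined = inclusive_ensemble(gold)
```

In this toy example a single pooled majority vote over all four annotators would be tied on `t1`, whereas the per-group gold standards keep group A's judgement visible and the combined label retains it.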
