Multi-Task Learning with Sentiment, Emotion, and Target Detection to Recognize Hate Speech and Offensive Language

Source: arXiv. Published at FIRE 2021 as a system description paper in the HASOC-FIRE shared task on hate speech and offensive language detection. The original publication can be found at http://ceur-ws.org/Vol-3159/T1-30.pdf

The recognition of hate speech and offensive language (HOF) is commonly formulated as a classification task that decides whether a text contains HOF. We investigate whether HOF detection can profit from taking into account the relationships between HOF and similar concepts: (a) HOF is related to sentiment analysis, because hate speech is typically a negative statement and expresses a negative opinion; (b) it is related to emotion analysis, as expressed hate points to the author experiencing (or pretending to experience) anger while the addressees experience (or are intended to experience) fear; and (c) one constituting element of HOF is the mention of a targeted person or group. On this basis, we hypothesize that HOF detection improves when it is modeled jointly with these concepts in a multi-task learning (MTL) setup. We base our experiments on existing data sets for each of these concepts (sentiment, emotion, target of HOF) and evaluate our models as a participant (as team IMS-SINAI) in the HASOC FIRE 2021 English Subtask 1A. Based on model-selection experiments in which we consider multiple available resources and submissions to the shared task, we find that the combination of the CrowdFlower emotion corpus, the SemEval 2016 Sentiment Corpus, and the OffensEval 2019 target detection data leads to an F1 = .79 in a multi-head multi-task learning model based on BERT, in comparison to .7895 for plain BERT. On the HASOC 2019 test data, this result is more substantial, with an increase of 2pp in F1 and a considerable increase in recall. Across both data sets (2019, 2021), recall is particularly increased for the HOF class (6pp for the 2019 data and 3pp for the 2021 data), showing that MTL with emotion, sentiment, and target identification is an appropriate approach for early warning systems that might be deployed on social media platforms.
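The multi-head idea described in the abstract (one shared encoder, one classification head per task) can be illustrated with a minimal sketch. This is not the authors' implementation: the shared layer below is a toy stand-in for BERT, the label counts in `TASKS` are assumed for illustration, and the names `MultiHeadModel` and `joint_loss` are hypothetical. The sketch only shows the forward pass and how a loss could be summed over whichever tasks an example is labeled for; it does not train anything.

```python
import math
import random

# Hypothetical task inventory: task name -> number of classes (assumed values).
TASKS = {"hof": 2, "sentiment": 3, "emotion": 4, "target": 3}


def make_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class MultiHeadModel:
    """Shared encoder with one linear classification head per task."""

    def __init__(self, input_dim, hidden_dim, tasks, seed=0):
        rng = random.Random(seed)
        # Toy stand-in for the shared BERT encoder (assumption).
        self.shared = make_matrix(hidden_dim, input_dim, rng)
        # One task-specific head on top of the shared representation.
        self.heads = {
            name: make_matrix(n_classes, hidden_dim, rng)
            for name, n_classes in tasks.items()
        }

    def encode(self, features):
        """Shared representation used by every head."""
        return [math.tanh(h) for h in matvec(self.shared, features)]

    def predict(self, features):
        """Class probabilities for every task from one shared encoding."""
        hidden = self.encode(features)
        return {
            name: softmax(matvec(head, hidden))
            for name, head in self.heads.items()
        }


def joint_loss(probs, labels):
    """Sum of cross-entropy terms over the tasks this example is labeled for."""
    return sum(-math.log(probs[task][gold]) for task, gold in labels.items())


model = MultiHeadModel(input_dim=8, hidden_dim=4, tasks=TASKS)
probs = model.predict([0.5] * 8)
# An example from an emotion corpus only carries an emotion label.
loss = joint_loss(probs, {"emotion": 2})
```

In a realistic setup the shared layer would be a pretrained BERT encoder and all weights would be updated by backpropagation. Because each auxiliary corpus is annotated for only one concept, a loss restricted to the labeled tasks, as in `joint_loss`, is one plausible way to combine the data sets.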
