The detection of offensive, hateful and profane language has become a critical challenge, since many social network users are exposed to cyberbullying on a daily basis. In this paper, we analyze how combining different textual features affects the detection of hateful or offensive posts on Twitter. We provide a detailed experimental evaluation to understand the impact of each building block in a neural network architecture. The proposed architecture is evaluated on English Subtask 1A (Identifying Hate, Offensive and Profane Content from the Post) of the HASOC-2021 dataset, under the team name TIB-VA. We compare different variants of contextual word embeddings combined with character-level embeddings and an encoding of collected hate terms.