Detecting hate speech online has become an important task, as offensive language such as hurtful, obscene, and insulting content can harm marginalized people or groups. This paper presents the TU Berlin team's experiments and results on Subtasks 1A and 1B of the 2021 shared task on Hate Speech and Offensive Content Identification in Indo-European Languages. We evaluate the performance of different natural language processing models on the respective subtasks throughout the competition. We tested recurrent neural network models at the word and character levels, as well as transfer learning approaches based on BERT, on the dataset provided by the competition. Among the models tested, the transfer learning-based models achieved the best results in both subtasks.