Through anonymity and accessibility, social media platforms have facilitated the proliferation of hate speech, prompting increased research into automatic methods for identifying such texts. This paper explores the classification of sexism in text using a variety of deep neural network architectures, such as Long Short-Term Memory networks (LSTMs) and Convolutional Neural Networks (CNNs). These networks are used in conjunction with transfer learning in the form of Bidirectional Encoder Representations from Transformers (BERT) and DistilBERT models, along with data augmentation, to perform binary and multi-class sexism classification on the dataset of tweets and gabs from the sEXism Identification in Social neTworks (EXIST) task at IberLEF 2021. The models perform comparably to those submitted to the competition, with the best performance achieved using BERT and a multi-filter CNN model. Data augmentation further improves the results for the multi-class classification task. This paper also explores the errors made by the models and discusses the difficulty of automatically classifying sexism, which stems from the subjectivity of the labels and the complexity of the natural language used on social media.