Sexism has become an increasingly serious problem on social networks in recent years. The first shared task on sEXism Identification in Social neTworks (EXIST) at IberLEF 2021 is an international competition in the field of Natural Language Processing (NLP) that aims to automatically identify sexism in social media content by applying machine learning methods. Sexism detection is formulated both as a coarse-grained (binary) classification problem and as a fine-grained classification task that distinguishes multiple types of sexist content (e.g., dominance, stereotyping, and objectification). This paper presents the contribution of the AIT_FHSTP team to the EXIST2021 benchmark for both tasks. To solve the tasks we applied two multilingual transformer models, one based on multilingual BERT and one based on XLM-R. Our approach uses two different strategies to adapt the transformers to the detection of sexist content: first, unsupervised pre-training with additional data, and second, supervised fine-tuning with additional and augmented data. For both tasks, our best model is XLM-R with unsupervised pre-training on the EXIST data plus additional corpora, followed by fine-tuning on the provided dataset. The best run for the binary classification (task 1) achieves a macro F1-score of 0.7752 and ranks 5th in the benchmark; for the multiclass classification (task 2) our best submission ranks 6th with a macro F1-score of 0.5589.
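To illustrate the second adaptation strategy mentioned above (supervised fine-tuning of XLM-R on the labelled EXIST data for the binary task), the following is a minimal sketch using the Hugging Face Transformers and Datasets libraries. The file names, column names, and hyperparameters are illustrative placeholders and do not reflect the authors' exact setup.

```python
# Illustrative sketch (not the authors' exact code): fine-tuning XLM-R for
# binary sexism detection (task 1) with Hugging Face Transformers.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import load_dataset

model_name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Hypothetical CSV files with columns "text" (tweet/post) and
# "label" (0 = non-sexist, 1 = sexist).
dataset = load_dataset("csv", data_files={"train": "exist_train.csv",
                                          "validation": "exist_dev.csv"})

def tokenize(batch):
    # Tokenize and pad each multilingual (English/Spanish) input text.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="xlmr-exist-task1",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    evaluation_strategy="epoch",
)

trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"],
                  eval_dataset=dataset["validation"])
trainer.train()
```

The same pattern would apply to the fine-grained task (task 2) by raising num_labels to the number of sexism categories; the unsupervised pre-training step described in the abstract would instead continue masked-language-model training on unlabelled in-domain data before this fine-tuning stage.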