This paper describes our participation in SemEval-2023 Task 10, whose goal is the detection of sexism in social media. We explore some of the most popular transformer models, such as BERT, DistilBERT, RoBERTa, and XLNet. We also study different data augmentation techniques to enlarge the training dataset. During the development phase, our best results were obtained by using RoBERTa and data augmentation for tasks B and C. However, the use of synthetic data does not improve the results for task C. We participated in all three subtasks. Our approach still has much room for improvement, especially in the two fine-grained classification tasks. All our code is available at https://github.com/isegura/hulat_edos.