Online misogyny is a growing concern for Arab women, who experience gender-based online abuse on a daily basis. Automatic misogyny detection systems can help curb anti-women toxic content in Arabic, but their development is hindered by the lack of Arabic misogyny benchmark datasets. In this paper, we introduce an Arabic Levantine Twitter dataset for Misogynistic language (LeT-Mi), the first benchmark dataset for Arabic misogyny. We provide a detailed review of the dataset creation and annotation phases, and we assess the consistency of the annotations using inter-rater agreement measures. Moreover, LeT-Mi was used as an evaluation dataset in binary, multi-class, and target classification tasks conducted with several state-of-the-art machine learning systems, along with a Multi-Task Learning (MTL) configuration. The results indicate that the performance of these systems is consistent with state-of-the-art results for languages other than Arabic, and that employing MTL improved performance on the misogyny and target classification tasks.