Let-Mi: An Arabic Levantine Twitter Dataset for Misogynistic Language - Zhuanzhi (专知) Papers


Performer · Datasets · state-of-the-art · Multi-task learning · Twitter

March 18, 2021

Let-Mi: An Arabic Levantine Twitter Dataset for Misogynistic Language

Hala Mulki, Bilal Ghanem

From arXiv; 10 pages, 2 figures; WANLP 2021, co-located with EACL 2021

Online misogyny is a growing concern for Arab women, who experience gender-based online abuse on a daily basis. Automatic misogyny detection systems can assist in prohibiting anti-women toxic content in Arabic, but developing such systems is hindered by the lack of Arabic misogyny benchmark datasets. In this paper, we introduce an Arabic Levantine Twitter dataset for Misogynistic language (Let-Mi), intended as the first benchmark dataset for Arabic misogyny. We further provide a detailed review of the dataset creation and annotation phases. The consistency of the annotations was assessed using inter-rater agreement measures. Moreover, Let-Mi was used as an evaluation dataset in binary, multi-class, and target classification tasks conducted with several state-of-the-art machine learning systems, along with a Multi-Task Learning (MTL) configuration. The results indicate that the performance achieved by these systems is consistent with state-of-the-art results for languages other than Arabic, while employing MTL improved performance on the misogyny/target classification tasks.
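The abstract mentions inter-rater agreement without saying which measure was used. As a rough illustration only, the sketch below computes Cohen's kappa, one common agreement measure for two annotators; the choice of kappa, the function name `cohen_kappa`, and the toy labels are all assumptions for this example and are not taken from the paper or the dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Illustrative helper (hypothetical, not from the Let-Mi paper).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each class, the product of the two annotators'
    # marginal label frequencies, summed over classes.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy labels, invented for illustration; not drawn from Let-Mi.
annotator_1 = ["misogyny", "none", "none", "misogyny", "none", "misogyny"]
annotator_2 = ["misogyny", "none", "misogyny", "misogyny", "none", "none"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, so a value of 0 means chance-level agreement and 1 means perfect agreement. With more than two annotators, a measure such as Fleiss' kappa or Krippendorff's alpha would be the usual substitute.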
