TTEET! 突尼斯用于检测毒性言语的数据集 (TEET! Tunisian Dataset for Toxic Speech Detection) - 专知论文

会员服务 ·

0

XLM-R · 数据集 · Performer · 学成 · 词表 ·

2021 年 10 月 11 日

TEET! Tunisian Dataset for Toxic Speech Detection

翻译：TTEET! 突尼斯用于检测毒性言语的数据集

Slim Gharbi,Heger Arfaoui,Hatem Haddad,Mayssa Kchaou

The complete freedom of expression in social media has its costs especially in spreading harmful and abusive content that may induce people to act accordingly. Therefore, the need of detecting automatically such a content becomes an urgent task that will help and enhance the efficiency in limiting this toxic spread. Compared to other Arabic dialects which are mostly based on MSA, the Tunisian dialect is a combination of many other languages like MSA, Tamazight, Italian and French. Because of its rich language, dealing with NLP problems can be challenging due to the lack of large annotated datasets. In this paper we are introducing a new annotated dataset composed of approximately 10k of comments. We provide an in-depth exploration of its vocabulary through feature engineering approaches as well as the results of the classification performance of machine learning classifiers like NB and SVM and deep learning models such as ARBERT, MARBERT and XLM-R.

翻译：社交媒体完全的言论自由成本高昂,特别是传播有害和滥用内容,可能促使人们采取相应行动。因此,自动发现这类内容成为一项紧迫任务,有助于提高限制这种有毒传播的效率。与其他主要基于特派任务生活津贴的阿拉伯语方言相比,突尼斯方言是许多其他语言的结合,如MIS、Tamazight、意大利语和法语。由于语言丰富,处理NLP问题可能具有挑战性,因为缺乏大量附加说明的数据集。本文我们引入了一套新的附加说明的数据集,由大约10公里的评论组成。我们通过地物工程方法以及诸如NB和SVM等机器学习分类师的分类工作成果以及ABERT、MARBERT和XLM-R等深层次学习模式,深入探索其词汇。

1

相关内容

XLM-R

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

专知会员服务

54+阅读 · 2021年1月20日

【文献综述】Text Detection and Recognition in the Wild: A Review 自然文本检测与识别

【文献综述】Text Detection and Recognition in the Wild: A Review 自然文本检测与识别

专知会员服务

46+阅读 · 2020年6月11日

【论文推荐】深度学习中的异常实例检测:综述，Anomalous Instance Detection in Deep Learning: A Survey

【论文推荐】深度学习中的异常实例检测:综述，Anomalous Instance Detection in Deep Learning: A Survey

专知会员服务

97+阅读 · 2020年3月17日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【东大-UCSB】虚假新闻检测的自然语言处理研究综述，A Survey on Natural Language Processing for Fake News Detection

【东大-UCSB】虚假新闻检测的自然语言处理研究综述，A Survey on Natural Language Processing for Fake News Detection

专知会员服务

79+阅读 · 2020年2月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

计算机类 | PLDI 2020等国际会议信息6条

计算机类 | PLDI 2020等国际会议信息6条

Call4Papers

3+阅读 · 2019年7月8日

计算机 | 国际会议信息5条

计算机 | 国际会议信息5条

Call4Papers

3+阅读 · 2019年7月3日

计算机 | 中低难度国际会议信息8条

计算机 | 中低难度国际会议信息8条

Call4Papers

9+阅读 · 2019年6月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

计算机 | CCF推荐期刊专刊信息5条

计算机 | CCF推荐期刊专刊信息5条

Call4Papers

3+阅读 · 2019年4月10日

计算机类 | ISCC 2019等国际会议信息9条

计算机类 | ISCC 2019等国际会议信息9条

Call4Papers

5+阅读 · 2018年12月25日

计算机类 | LICS 2019等国际会议信息7条

计算机类 | LICS 2019等国际会议信息7条

Call4Papers

3+阅读 · 2018年12月17日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

【计算机类】期刊专刊/国际会议截稿信息6条

【计算机类】期刊专刊/国际会议截稿信息6条

Call4Papers

3+阅读 · 2017年10月13日

【推荐】深度学习目标检测概览

【推荐】深度学习目标检测概览

机器学习研究会

10+阅读 · 2017年9月1日

HS-BAN: A Benchmark Dataset of Social Media Comments for Hate Speech Detection in Bangla

Arxiv

0+阅读 · 2021年12月3日

Deep Learning for UAV-based Object Detection and Tracking: A Survey

Arxiv

64+阅读 · 2021年10月25日

Few-shot Scene-adaptive Anomaly Detection

Few-shot Scene-adaptive Anomaly Detection

Arxiv

8+阅读 · 2020年7月15日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

Clustered Object Detection in Aerial Images

Clustered Object Detection in Aerial Images

Arxiv

5+阅读 · 2019年8月27日

Multi-Task Self-Supervised Learning for Disfluency Detection

Arxiv

5+阅读 · 2019年8月15日

Object Detection in 20 Years: A Survey

Object Detection in 20 Years: A Survey

Arxiv

48+阅读 · 2019年5月13日

Zero-Shot Object Detection

Zero-Shot Object Detection

Arxiv

9+阅读 · 2018年7月27日

Few-Example Object Detection with Model Communication

Arxiv

7+阅读 · 2018年2月14日

DOTA: A Large-scale Dataset for Object Detection in Aerial Images

Arxiv

19+阅读 · 2018年1月27日

VIP会员

文章信息

相关主题

相关VIP内容

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

专知会员服务

54+阅读 · 2021年1月20日

【文献综述】Text Detection and Recognition in the Wild: A Review 自然文本检测与识别

【文献综述】Text Detection and Recognition in the Wild: A Review 自然文本检测与识别

专知会员服务

46+阅读 · 2020年6月11日

【论文推荐】深度学习中的异常实例检测:综述，Anomalous Instance Detection in Deep Learning: A Survey

【论文推荐】深度学习中的异常实例检测:综述，Anomalous Instance Detection in Deep Learning: A Survey

专知会员服务

97+阅读 · 2020年3月17日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【东大-UCSB】虚假新闻检测的自然语言处理研究综述，A Survey on Natural Language Processing for Fake News Detection

【东大-UCSB】虚假新闻检测的自然语言处理研究综述，A Survey on Natural Language Processing for Fake News Detection

专知会员服务

79+阅读 · 2020年2月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

小规模训练指南：打造世界级大语言模型的关键方法

无人机编队飞行：复杂环境中作战的策略、挑战与应用

大模型APP，AI时代第一个爆款

从数据中心视角出发的高效大语言模型训练综述

相关资讯

计算机类 | PLDI 2020等国际会议信息6条

计算机类 | PLDI 2020等国际会议信息6条

Call4Papers

3+阅读 · 2019年7月8日

计算机 | 国际会议信息5条

计算机 | 国际会议信息5条

Call4Papers

3+阅读 · 2019年7月3日

计算机 | 中低难度国际会议信息8条

计算机 | 中低难度国际会议信息8条

Call4Papers

9+阅读 · 2019年6月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

计算机 | CCF推荐期刊专刊信息5条

计算机 | CCF推荐期刊专刊信息5条

Call4Papers

3+阅读 · 2019年4月10日

计算机类 | ISCC 2019等国际会议信息9条

计算机类 | ISCC 2019等国际会议信息9条

Call4Papers

5+阅读 · 2018年12月25日

计算机类 | LICS 2019等国际会议信息7条

计算机类 | LICS 2019等国际会议信息7条

Call4Papers

3+阅读 · 2018年12月17日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

【计算机类】期刊专刊/国际会议截稿信息6条

【计算机类】期刊专刊/国际会议截稿信息6条

Call4Papers

3+阅读 · 2017年10月13日

【推荐】深度学习目标检测概览

【推荐】深度学习目标检测概览

机器学习研究会

10+阅读 · 2017年9月1日

相关论文

HS-BAN: A Benchmark Dataset of Social Media Comments for Hate Speech Detection in Bangla

Arxiv

0+阅读 · 2021年12月3日

Deep Learning for UAV-based Object Detection and Tracking: A Survey

Arxiv

64+阅读 · 2021年10月25日

Few-shot Scene-adaptive Anomaly Detection

Few-shot Scene-adaptive Anomaly Detection

Arxiv

8+阅读 · 2020年7月15日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

Clustered Object Detection in Aerial Images

Clustered Object Detection in Aerial Images

Arxiv

5+阅读 · 2019年8月27日

Multi-Task Self-Supervised Learning for Disfluency Detection

Arxiv

5+阅读 · 2019年8月15日

Object Detection in 20 Years: A Survey

Object Detection in 20 Years: A Survey

Arxiv

48+阅读 · 2019年5月13日

Zero-Shot Object Detection

Zero-Shot Object Detection

Arxiv

9+阅读 · 2018年7月27日

Few-Example Object Detection with Model Communication

Arxiv

7+阅读 · 2018年2月14日

DOTA: A Large-scale Dataset for Object Detection in Aerial Images

Arxiv

19+阅读 · 2018年1月27日

微信扫码咨询专知VIP会员