HS-BAN:孟加拉国境内仇恨言论检测社会媒体评论基准数据集 (HS-BAN: A Benchmark Dataset of Social Media Comments for Hate Speech Detection in Bangla) - 专知论文

会员服务 ·

0

INFORMS · 数据集 · 可约的 · 词向量表示 · Performer ·

2021 年 12 月 3 日

HS-BAN: A Benchmark Dataset of Social Media Comments for Hate Speech Detection in Bangla

翻译：HS-BAN:孟加拉国境内仇恨言论检测社会媒体评论基准数据集

Nauros Romim,Mosahed Ahmed,Md Saiful Islam,Arnab Sen Sharma,Hriteshwar Talukder,Mohammad Ruhul Amin

from arxiv, Submitted to ICON 21 (Rejected)

In this paper, we present HS-BAN, a binary class hate speech (HS) dataset in Bangla language consisting of more than 50,000 labeled comments, including 40.17% hate and rest are non hate speech. While preparing the dataset a strict and detailed annotation guideline was followed to reduce human annotation bias. The HS dataset was also preprocessed linguistically to extract different types of slang currently people write using symbols, acronyms, or alternative spellings. These slang words were further categorized into traditional and non-traditional slang lists and included in the results of this paper. We explored traditional linguistic features and neural network-based methods to develop a benchmark system for hate speech detection for the Bangla language. Our experimental results show that existing word embedding models trained with informal texts perform better than those trained with formal text. Our benchmark shows that a Bi-LSTM model on top of the FastText informal word embedding achieved 86.78% F1-score. We will make the dataset available for public use.

翻译：在本文中,我们以孟加拉语展示了HS-BAN, 一种二元级仇恨言论(HS-BAN)数据集,由50 000多条贴标签的评论组成,包括40.17%的仇恨和休息是非仇恨言论。在准备数据集时,遵循了严格和详细的批注准则,以减少人类笔记偏见。HS数据集还预先在语言上进行了处理,以提取目前人们使用符号、缩略语或替代拼写方式书写的不同类型的 sang 。这些词被进一步归类为传统和非传统的标签清单,并列入本文件的结果。我们探讨了传统语言特征和神经网络型网络型方法,以制定孟加拉语仇恨言论检测基准系统。我们的实验结果显示,经过非正式文本培训的现有词嵌入模式比经过正式文本培训的要好。我们的基准显示,在快速图文非正式词嵌入上的一个Bi-LSTM模型已经达到86.78% F1核心。我们将将数据集提供给公众使用。

0

相关内容

INFORMS

《计算机信息》杂志发表高质量的论文，扩大了运筹学和计算的范围，寻求有关理论、方法、实验、系统和应用方面的原创研究论文、新颖的调查和教程论文，以及描述新的和有用的软件工具的论文。官网链接：https://pubsonline.informs.org/journal/ijoc

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【论文】持续学习的图神经网络用于检测社交媒体的假新闻，Graph Neural Networks with Continual Learning for Fake News Detection from Social Media

【论文】持续学习的图神经网络用于检测社交媒体的假新闻，Graph Neural Networks with Continual Learning for Fake News Detection from Social Media

专知会员服务

41+阅读 · 2020年7月14日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【视频描述综述论文】Video Description: A Survey of Methods, Datasets, and Evaluation Metrics

【视频描述综述论文】Video Description: A Survey of Methods, Datasets, and Evaluation Metrics

专知会员服务

65+阅读 · 2020年5月12日

【北卡罗莱纳州立大学】单场景视频异常检测综述，A Survey of Single-Scene Video Anomaly Detection

【北卡罗莱纳州立大学】单场景视频异常检测综述，A Survey of Single-Scene Video Anomaly Detection

专知会员服务

31+阅读 · 2020年4月13日

【东大-UCSB】虚假新闻检测的自然语言处理研究综述，A Survey on Natural Language Processing for Fake News Detection

【东大-UCSB】虚假新闻检测的自然语言处理研究综述，A Survey on Natural Language Processing for Fake News Detection

专知会员服务

79+阅读 · 2020年2月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

Yoshua Bengio，使算法知道“为什么”

Yoshua Bengio，使算法知道“为什么”

专知会员服务

8+阅读 · 2019年10月10日

计算机 | 入门级EI会议ICVRIS 2019诚邀稿件

计算机 | 入门级EI会议ICVRIS 2019诚邀稿件

Call4Papers

10+阅读 · 2019年6月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新八篇情感分析相关论文—注意力网络、多模态情感分析、情感分析局限性、跨语言情感分类、多语言情感分析

【论文推荐】最新八篇情感分析相关论文—注意力网络、多模态情感分析、情感分析局限性、跨语言情感分类、多语言情感分析

专知

52+阅读 · 2018年6月28日

【论文推荐】最新六篇知识图谱相关论文—事件演化图、神经词义消歧、增强神经网络、Mem2Seq、用户偏好传播、概率嵌入

【论文推荐】最新六篇知识图谱相关论文—事件演化图、神经词义消歧、增强神经网络、Mem2Seq、用户偏好传播、概率嵌入

专知

19+阅读 · 2018年6月14日

推荐｜深度强化学习聊天机器人（附论文）！

推荐｜深度强化学习聊天机器人（附论文）！

全球人工智能

4+阅读 · 2018年1月30日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

【推荐】MXNet深度情感分析实战

【推荐】MXNet深度情感分析实战

机器学习研究会

16+阅读 · 2017年10月4日

Benchmarking Deep Models for Salient Object Detection

Arxiv

0+阅读 · 2022年2月7日

Multilingual Hate Speech and Offensive Content Detection using Modified Cross-entropy Loss

Arxiv

0+阅读 · 2022年2月5日

A simple language-agnostic yet very strong baseline system for hate speech and offensive content identification

Arxiv

0+阅读 · 2022年2月5日

Controlled Analyses of Social Biases in Wikipedia Bios

Arxiv

0+阅读 · 2022年2月5日

A Benchmark Corpus for the Detection of Automatically Generated Text in Academic Publications

Arxiv

0+阅读 · 2022年2月4日

Cross-Platform Difference in Facebook and Text Messages Language Use: Illustrated by Depression Diagnosis

Arxiv

0+阅读 · 2022年2月3日

The RoyalFlush System of Speech Recognition for M2MeT Challenge

Arxiv

0+阅读 · 2022年2月3日

Mining Dual Emotion for Fake News Detection

Arxiv

13+阅读 · 2020年10月19日

Deep Learning for Deepfakes Creation and Detection

Deep Learning for Deepfakes Creation and Detection

Arxiv

6+阅读 · 2019年9月25日

Stylistic Variation in Social Media Part-of-Speech Tagging

Arxiv

4+阅读 · 2018年4月19日

VIP会员

文章信息

相关主题

词向量表示

相关VIP内容

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【论文】持续学习的图神经网络用于检测社交媒体的假新闻，Graph Neural Networks with Continual Learning for Fake News Detection from Social Media

【论文】持续学习的图神经网络用于检测社交媒体的假新闻，Graph Neural Networks with Continual Learning for Fake News Detection from Social Media

专知会员服务

41+阅读 · 2020年7月14日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【视频描述综述论文】Video Description: A Survey of Methods, Datasets, and Evaluation Metrics

【视频描述综述论文】Video Description: A Survey of Methods, Datasets, and Evaluation Metrics

专知会员服务

65+阅读 · 2020年5月12日

【北卡罗莱纳州立大学】单场景视频异常检测综述，A Survey of Single-Scene Video Anomaly Detection

【北卡罗莱纳州立大学】单场景视频异常检测综述，A Survey of Single-Scene Video Anomaly Detection

专知会员服务

31+阅读 · 2020年4月13日

【东大-UCSB】虚假新闻检测的自然语言处理研究综述，A Survey on Natural Language Processing for Fake News Detection

【东大-UCSB】虚假新闻检测的自然语言处理研究综述，A Survey on Natural Language Processing for Fake News Detection

专知会员服务

79+阅读 · 2020年2月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

Yoshua Bengio，使算法知道“为什么”

Yoshua Bengio，使算法知道“为什么”

专知会员服务

8+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

小规模训练指南：打造世界级大语言模型的关键方法

无人机编队飞行：复杂环境中作战的策略、挑战与应用

大模型APP，AI时代第一个爆款

从数据中心视角出发的高效大语言模型训练综述

相关资讯

计算机 | 入门级EI会议ICVRIS 2019诚邀稿件

计算机 | 入门级EI会议ICVRIS 2019诚邀稿件

Call4Papers

10+阅读 · 2019年6月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新八篇情感分析相关论文—注意力网络、多模态情感分析、情感分析局限性、跨语言情感分类、多语言情感分析

【论文推荐】最新八篇情感分析相关论文—注意力网络、多模态情感分析、情感分析局限性、跨语言情感分类、多语言情感分析

专知

52+阅读 · 2018年6月28日

【论文推荐】最新六篇知识图谱相关论文—事件演化图、神经词义消歧、增强神经网络、Mem2Seq、用户偏好传播、概率嵌入

【论文推荐】最新六篇知识图谱相关论文—事件演化图、神经词义消歧、增强神经网络、Mem2Seq、用户偏好传播、概率嵌入

专知

19+阅读 · 2018年6月14日

推荐｜深度强化学习聊天机器人（附论文）！

推荐｜深度强化学习聊天机器人（附论文）！

全球人工智能

4+阅读 · 2018年1月30日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

【推荐】MXNet深度情感分析实战

【推荐】MXNet深度情感分析实战

机器学习研究会

16+阅读 · 2017年10月4日

相关论文

Benchmarking Deep Models for Salient Object Detection

Arxiv

0+阅读 · 2022年2月7日

Multilingual Hate Speech and Offensive Content Detection using Modified Cross-entropy Loss

Arxiv

0+阅读 · 2022年2月5日

A simple language-agnostic yet very strong baseline system for hate speech and offensive content identification

Arxiv

0+阅读 · 2022年2月5日

Controlled Analyses of Social Biases in Wikipedia Bios

Arxiv

0+阅读 · 2022年2月5日

A Benchmark Corpus for the Detection of Automatically Generated Text in Academic Publications

Arxiv

0+阅读 · 2022年2月4日

Cross-Platform Difference in Facebook and Text Messages Language Use: Illustrated by Depression Diagnosis

Arxiv

0+阅读 · 2022年2月3日

The RoyalFlush System of Speech Recognition for M2MeT Challenge

Arxiv

0+阅读 · 2022年2月3日

Mining Dual Emotion for Fake News Detection

Arxiv

13+阅读 · 2020年10月19日

Deep Learning for Deepfakes Creation and Detection

Deep Learning for Deepfakes Creation and Detection

Arxiv

6+阅读 · 2019年9月25日

Stylistic Variation in Social Media Part-of-Speech Tagging

Arxiv

4+阅读 · 2018年4月19日

微信扫码咨询专知VIP会员