K-MHaS: A Multi-label Hate Speech Detection Dataset in Korean Online News Comment - Zhuanzhi (专知) Papers


Online · Dataset · Language Modeling · Decomposition · Benchmark

September 30, 2022

K-MHaS: A Multi-label Hate Speech Detection Dataset in Korean Online News Comment


Jean Lee, Taejun Lim, Heejun Lee, Bogeun Jo, Yangsok Kim, Heegeun Yoon, Soyeon Caren Han

From arXiv; accepted at COLING 2022

Online hate speech detection has become an important issue due to the growth of online content, but resources in languages other than English are extremely limited. We introduce K-MHaS, a new multi-label dataset for hate speech detection that effectively handles Korean language patterns. The dataset consists of 109k utterances from news comments and provides a multi-label classification using 1 to 4 labels, and handles subjectivity and intersectionality. We evaluate strong baseline experiments on K-MHaS using Korean-BERT-based language models with six different metrics. KR-BERT with a sub-character tokenizer outperforms others, recognizing decomposed characters in each hate speech class.
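The abstract credits a sub-character tokenizer with recognizing decomposed characters. As a rough illustration of what decomposition means for Korean text, the sketch below splits precomposed Hangul syllables into their component jamo using standard Unicode normalization and the standard Hangul syllable arithmetic. It is not KR-BERT's tokenizer, and the function names `decompose_hangul` and `jamo_indices` are hypothetical names chosen for this example.

```python
import unicodedata
from typing import Tuple

HANGUL_BASE = 0xAC00      # first precomposed syllable, U+AC00
SYLLABLE_COUNT = 11172    # 19 initials * 21 medials * 28 finals
MEDIAL_FINAL_COUNT = 588  # 21 medials * 28 finals
FINAL_COUNT = 28          # 27 final consonants plus "no final"


def decompose_hangul(text: str) -> str:
    """Split precomposed Hangul syllables into conjoining jamo (Unicode NFD)."""
    return unicodedata.normalize("NFD", text)


def jamo_indices(syllable: str) -> Tuple[int, int, int]:
    """Return (initial, medial, final) indices for one precomposed syllable.

    A final index of 0 means the syllable has no final consonant.
    """
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < SYLLABLE_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    initial = offset // MEDIAL_FINAL_COUNT
    medial = (offset % MEDIAL_FINAL_COUNT) // FINAL_COUNT
    final = offset % FINAL_COUNT
    return initial, medial, final


word = "한국"
decomposed = decompose_hangul(word)
print(len(word), len(decomposed))  # 2 syllables become 6 jamo
print(jamo_indices("한"))           # (18, 0, 4)
```

Decomposing before subword tokenization lets a vocabulary cover syllables that share jamo, which is one plausible reason such a tokenizer might cope better with the altered spellings common in online comments; the abstract itself does not spell out the mechanism.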


