The performance of hate speech detection models depends on the datasets on which they are trained. Existing datasets are mostly built with a limited number of instances or hate domains (the topics that define hate), which hinders large-scale analysis and transfer learning across domains. In this study, we construct large-scale tweet datasets for hate speech detection in English and in a low-resource language, Turkish, each consisting of 100k human-labeled tweets. Our datasets are designed to have an equal number of tweets distributed over five hate domains. Experimental results supported by statistical tests show that Transformer-based language models outperform conventional bag-of-words and neural models by at least 5% in English and 10% in Turkish for large-scale hate speech detection. The performance also scales to smaller training sizes: 98% of the performance in English and 97% in Turkish is recovered when only 20% of the training instances are used. We further examine the generalization ability of cross-domain transfer among hate domains, and show that, on average, 96% of a target domain's performance in English and 92% in Turkish is recovered by models trained on other domains. The gender and religion domains generalize best to other domains, while sports generalizes worst.