Hate speech detection using static BERT embeddings - 专知 (Zhuanzhi) Papers

2021 年 6 月 29 日

Gaurav Rajput, Narinder Singh Punn, Sanjay Kumar Sonbhadra, Sonali Agarwal

With the increasing popularity of social media platforms, hate speech is emerging as a major concern: abusive speech that targets specific group characteristics, such as gender, religion, or ethnicity, to spread violence. Earlier, people used to deliver hate speech verbally, but with the expansion of technology some people now deliberately use social media platforms to spread hate by posting, sharing, commenting, etc. Whether it is the Christchurch mosque shootings or hate crimes against Asians in the West, it has been observed that the convicts were strongly influenced by hate text present online. Even though AI systems are in place to flag such text, one of the key challenges is to reduce the false positive rate (marking non-hate as hate), so that these systems can detect hate speech without undermining freedom of expression. In this paper, we use the ETHOS hate speech detection dataset and analyze the performance of a hate speech detection classifier when the word embeddings (fastText (FT), GloVe (GV), or FT + GV) are replaced by or integrated with static BERT embeddings (BE). Extensive experimental trials show that the neural network performed better with static BE than with FT, GV, or FT + GV as word embeddings. In comparison to fine-tuned BERT, one metric that improved significantly is specificity.
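
The abstract singles out the false positive rate and specificity as the metrics of interest. As a minimal sketch of how these two quantities relate, the Python snippet below computes both from binary labels, assuming 1 means hate and 0 means non-hate. The function name `specificity_and_fpr` and the toy labels are illustrative assumptions; they are not taken from the paper or from the ETHOS dataset.

```python
def specificity_and_fpr(y_true, y_pred):
    """Return (specificity, false_positive_rate) for binary labels.

    Assumed label convention (not from the paper): 1 = hate, 0 = non-hate.
    specificity         = TN / (TN + FP)
    false positive rate = FP / (TN + FP) = 1 - specificity
    """
    # True negatives: non-hate text correctly left unflagged.
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    # False positives: non-hate text wrongly flagged as hate.
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    negatives = tn + fp
    if negatives == 0:
        raise ValueError("no non-hate examples; specificity is undefined")
    return tn / negatives, fp / negatives


# Toy labels, made up purely for illustration.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 0]

spec, fpr = specificity_and_fpr(y_true, y_pred)
print(f"specificity={spec:.2f}  false positive rate={fpr:.2f}")
# prints: specificity=0.80  false positive rate=0.20
```

Because the two values always sum to 1, raising specificity is the same as lowering the false positive rate, which is why an improvement in specificity bears directly on the freedom-of-expression concern raised above.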
