Improving Automatic Hate Speech Detection with Multiword Expression Features - 专知 Papers


Macro-F1 · Baseline · Integration · Branch · Word embeddings

June 1, 2021


Nicolas Zampieri, Irina Illina, Dominique Fohr

From arXiv; in Proceedings of NLDB 2021

The task of automatically detecting hate speech in social media is gaining more and more attention. Given the enormous volume of content posted daily, human monitoring of hate speech is unfeasible. In this work, we propose new word-level features for automatic hate speech detection (HSD): multiword expressions (MWEs). MWEs are lexical units greater than a word that have idiomatic and compositional meanings. We propose to integrate MWE features in a deep neural network-based HSD framework. Our baseline HSD system relies on Universal Sentence Encoder (USE). To incorporate MWE features, we create a three-branch deep neural network: one branch for USE, one for MWE categories, and one for MWE embeddings. We conduct experiments on two hate speech tweet corpora with different MWE categories and with two types of MWE embeddings, word2vec and BERT. Our experiments demonstrate that the proposed HSD system with MWE features significantly outperforms the baseline system in terms of macro-F1.
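The three-branch design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, the single dense layer per branch, the random untrained weights, the two-class output, and all names (`three_branch_forward`, `make_params`, `USE_DIM`, and so on) are assumptions made for illustration. The abstract states only that there is one branch for USE, one for MWE categories, and one for MWE embeddings. The inputs here are toy vectors standing in for a USE sentence embedding, per-category MWE counts, and a pooled word2vec or BERT embedding of the MWEs found in a tweet.

```python
import math
import random

# Toy sizes, all hypothetical; a real USE embedding is much larger.
USE_DIM, N_CATEGORIES, EMB_DIM, HIDDEN, N_CLASSES = 8, 4, 6, 5, 2

def dense(x, weights, bias):
    """Affine layer followed by ReLU; one output per weight row."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def init(rng, n_out, n_in):
    """Small random weights and zero biases (untrained, illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def make_params(rng):
    """One parameter set per branch plus the output layer."""
    return {
        "use": init(rng, HIDDEN, USE_DIM),
        "cat": init(rng, HIDDEN, N_CATEGORIES),
        "emb": init(rng, HIDDEN, EMB_DIM),
        "out": init(rng, N_CLASSES, 3 * HIDDEN),
    }

def three_branch_forward(use_vec, mwe_categories, mwe_embedding, params):
    """Run each branch separately, concatenate, then classify."""
    h_use = dense(use_vec, *params["use"])         # branch 1: sentence embedding
    h_cat = dense(mwe_categories, *params["cat"])  # branch 2: MWE category features
    h_emb = dense(mwe_embedding, *params["emb"])   # branch 3: MWE embedding
    fused = h_use + h_cat + h_emb                  # list concatenation of branch outputs
    weights, bias = params["out"]
    logits = [sum(w * v for w, v in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

if __name__ == "__main__":
    rng = random.Random(0)
    params = make_params(rng)
    use_vec = [rng.uniform(-1, 1) for _ in range(USE_DIM)]
    mwe_categories = [1.0, 0.0, 2.0, 0.0]          # counts per hypothetical category
    mwe_embedding = [rng.uniform(-1, 1) for _ in range(EMB_DIM)]
    print(three_branch_forward(use_vec, mwe_categories, mwe_embedding, params))
```

Concatenating the branch outputs before the final layer lets the classifier weigh sentence-level and MWE-level evidence jointly. A trained system would learn these weights from labeled tweets and would likely use richer layers in each branch.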


