Detecting online hate is a complex task, and low-performing detection models have harmful consequences when used for sensitive applications such as content moderation. Emoji-based hate is a key emerging challenge for online hate detection. We present HatemojiCheck, a test suite of 3,930 short-form statements that allows us to evaluate how detection models perform on hateful language expressed with emoji. Using the test suite, we expose weaknesses in existing hate detection models. To address these weaknesses, we create the HatemojiTrain dataset using an innovative human-and-model-in-the-loop approach. Models trained on these 5,912 adversarial examples perform substantially better at detecting emoji-based hate, while retaining strong performance on text-only hate. Both HatemojiCheck and HatemojiTrain are made publicly available.