Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-based Hate - Zhuanzhi Papers

31 August 2021

Hannah Rose Kirk, Bertram Vidgen, Paul Röttger, Tristan Thrush, Scott A. Hale

Detecting online hate is a complex task, and low-performing models have harmful consequences when used for sensitive applications such as content moderation. Emoji-based hate is a key emerging challenge for automated detection. We present HatemojiCheck, a test suite of 3,930 short-form statements that allows us to evaluate performance on hateful language expressed with emoji. Using the test suite, we expose weaknesses in existing hate detection models. To address these weaknesses, we create the HatemojiTrain dataset using a human-and-model-in-the-loop approach. Models trained on these 5,912 adversarial examples perform substantially better at detecting emoji-based hate, while retaining strong performance on text-only hate. Both HatemojiCheck and HatemojiTrain are made publicly available.
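One way to expose the kind of weakness a test suite like HatemojiCheck reveals is to score a model separately on each group of test cases, not only overall. The sketch below assumes each test case carries a gold label (1 = hateful, 0 = not hateful) and a category tag; `SuiteCase`, `score_test_suite`, `keyword_model`, and the category names are hypothetical illustrations, not HatemojiCheck's released format or the authors' evaluation code. The input strings are placeholders rather than real test cases.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class SuiteCase:
    """One test-suite statement (hypothetical schema, not HatemojiCheck's)."""
    text: str
    label: int      # assumed encoding: 1 = hateful, 0 = not hateful
    category: str   # hypothetical tag grouping related test cases


def score_test_suite(
    cases: Iterable[SuiteCase], predict: Callable[[str], int]
) -> dict[str, float]:
    """Accuracy per category, plus an 'overall' entry across all cases."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for case in cases:
        hit = int(predict(case.text) == case.label)
        # Count every case both in its own category and in the overall score.
        for key in (case.category, "overall"):
            correct[key] += hit
            total[key] += 1
    return {key: correct[key] / total[key] for key in total}


# Placeholder inputs: "SLUR" marks a hateful word, "[EMOJI]" marks where an
# emoji would appear (in the second case it stands in for the hateful word).
cases = [
    SuiteCase("They are SLUR", 1, "text_only"),
    SuiteCase("They are [EMOJI]", 1, "emoji_substitution"),
    SuiteCase("I love [EMOJI]", 0, "emoji_non_hateful"),
]


def keyword_model(text: str) -> int:
    """Hypothetical text-only baseline: flags only the literal word SLUR."""
    return int("SLUR" in text)


scores = score_test_suite(cases, keyword_model)
print(scores)
```

Here the keyword baseline is perfect on the text-only and non-hateful cases but scores 0.0 on the emoji-substitution case, a failure that a single overall accuracy of about 0.67 would partly hide.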
