《数据标签标签者之间差异的起源和价值:仇恨言论注释中个人差异的个案研究》 (The Origin and Value of Disagreement Among Data Labelers: A Case Study of the Individual Difference in Hate Speech Annotation) - 专知论文

会员服务 ·

0

标注 · 缩放 · 数据标签 · CASE · Processing（编程语言） ·

2021 年 12 月 7 日

The Origin and Value of Disagreement Among Data Labelers: A Case Study of the Individual Difference in Hate Speech Annotation

翻译：《数据标签标签者之间差异的起源和价值:仇恨言论注释中个人差异的个案研究》

Yisi Sang,Jeffrey Stanton

Human annotated data is the cornerstone of today's artificial intelligence efforts, yet data labeling processes can be complicated and expensive, especially when human labelers disagree with each other. The current work practice is to use majority-voted labels to overrule the disagreement. However, in the subjective data labeling tasks such as hate speech annotation, disagreement among individual labelers can be difficult to resolve. In this paper, we explored why such disagreements occur using a mixed-method approach - including interviews with experts, concept mapping exercises, and self-reporting items - to develop a multidimensional scale for distilling the process of how annotators label a hate speech corpus. We tested this scale with 170 annotators in a hate speech annotation task. Results showed that our scale can reveal facets of individual differences among annotators (e.g., age, personality, etc.), and these facets' relationships to an annotator's final label decision of an instance. We suggest that this work contributes to the understanding of how humans annotate data. The proposed scale can potentially improve the value of the currently discarded minority-vote labels.

翻译：人类附加说明的数据是当今人工智能工作的基石,但数据标签过程可能复杂而昂贵,特别是当人类标签者彼此意见不一致时。目前的工作实践是使用多数人投票的标签推翻分歧。然而,在主观数据标签任务中,如仇恨言论注释,个别标签者之间的分歧可能难以解决。在本文件中,我们探讨了为什么使用混合方法,包括专家访谈、概念绘图练习和自我报告项目等方法出现这种分歧,以形成一个多层面的规模,用于蒸馏说明说明说明说明说明说明煽动仇恨言论者如何贴标签的过程。我们用170个注解员在仇恨言论注解任务中测试了这一规模。结果显示,我们的规模可以揭示说明说明说明者(例如年龄、个性等)之间个人差异的方方面面,以及这些方面与一个标注者的最后标签决定之间的关系。我们建议,这项工作有助于了解人类如何作注的数据。拟议的规模有可能提高目前被丢弃的少数群体投票标签的价值。

0

相关内容

视频隐私保护技术综述

视频隐私保护技术综述

专知会员服务

35+阅读 · 2022年1月19日

【ACL2021】Weight Distillation：神经网络权重知识迁移方法

专知会员服务

21+阅读 · 2021年8月17日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

深度神经网络模型的个体差异，Individual differences among deep neural network models

深度神经网络模型的个体差异，Individual differences among deep neural network models

专知会员服务

10+阅读 · 2020年1月11日

【ECML-PKDD 2019】基于种子样本的Web数据抽取（Web Data Extraction with Seed Samples）

【ECML-PKDD 2019】基于种子样本的Web数据抽取（Web Data Extraction with Seed Samples）

专知会员服务

8+阅读 · 2019年12月3日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

已删除

将门创投

5+阅读 · 2019年8月19日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

LibRec 精选：推荐系统的常用数据集

LibRec 精选：推荐系统的常用数据集

LibRec智能推荐

17+阅读 · 2019年2月15日

人工智能 | SCI期刊专刊信息3条

人工智能 | SCI期刊专刊信息3条

Call4Papers

5+阅读 · 2019年1月10日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

【论文推荐】最新五篇度量学习相关论文—无标签、三维姿态估计、主动度量学习、深度度量学习、层次度量学习与匹配

【论文推荐】最新五篇度量学习相关论文—无标签、三维姿态估计、主动度量学习、深度度量学习、层次度量学习与匹配

专知

20+阅读 · 2018年4月5日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【计算机类】期刊专刊/国际会议截稿信息6条

【计算机类】期刊专刊/国际会议截稿信息6条

Call4Papers

3+阅读 · 2017年10月13日

最佳实践：深度学习用于自然语言处理（三）

最佳实践：深度学习用于自然语言处理（三）

待字闺中

3+阅读 · 2017年8月20日

The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge

Arxiv

0+阅读 · 2022年2月10日

Analyzing Medical Data with Process Mining: a COVID-19 Case Study

Arxiv

0+阅读 · 2022年2月8日

Regularizing End-to-End Speech Translation with Triangular Decomposition Agreement

Arxiv

7+阅读 · 2021年12月21日

Unbalanced minibatch Optimal Transport; applications to Domain Adaptation

Arxiv

3+阅读 · 2021年3月5日

The Importance of Modeling Data Missingness in Algorithmic Fairness: A Causal Perspective

Arxiv

5+阅读 · 2020年12月21日

Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction

Arxiv

9+阅读 · 2020年6月15日

End-to-End Multi-speaker Speech Recognition with Transformer

Arxiv

8+阅读 · 2020年2月13日

Improved Speech Enhancement with the Wave-U-Net

Arxiv

8+阅读 · 2018年11月27日

Multi-Head Attention with Disagreement Regularization

Arxiv

9+阅读 · 2018年10月24日

Can LSTM Learn to Capture Agreement? The Case of Basque

Arxiv

3+阅读 · 2018年9月11日

VIP会员

文章信息

相关主题

Processing（编程语言）

相关VIP内容

视频隐私保护技术综述

视频隐私保护技术综述

专知会员服务

35+阅读 · 2022年1月19日

【ACL2021】Weight Distillation：神经网络权重知识迁移方法

专知会员服务

21+阅读 · 2021年8月17日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

深度神经网络模型的个体差异，Individual differences among deep neural network models

深度神经网络模型的个体差异，Individual differences among deep neural network models

专知会员服务

10+阅读 · 2020年1月11日

【ECML-PKDD 2019】基于种子样本的Web数据抽取（Web Data Extraction with Seed Samples）

【ECML-PKDD 2019】基于种子样本的Web数据抽取（Web Data Extraction with Seed Samples）

专知会员服务

8+阅读 · 2019年12月3日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【ICCV2025教程】基础模型遇见具身智能体

军事机器学习设计：关于开发自动化任务摘要系统的梯次化设计科学研究 | 2025最新93页

扩散模型中的缓存方法综述：迈向高效的多模态生成

【ICCV2025教程】《迈向视觉语言模型的全面推理》

相关资讯

已删除

将门创投

5+阅读 · 2019年8月19日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

LibRec 精选：推荐系统的常用数据集

LibRec 精选：推荐系统的常用数据集

LibRec智能推荐

17+阅读 · 2019年2月15日

人工智能 | SCI期刊专刊信息3条

人工智能 | SCI期刊专刊信息3条

Call4Papers

5+阅读 · 2019年1月10日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

【论文推荐】最新五篇度量学习相关论文—无标签、三维姿态估计、主动度量学习、深度度量学习、层次度量学习与匹配

【论文推荐】最新五篇度量学习相关论文—无标签、三维姿态估计、主动度量学习、深度度量学习、层次度量学习与匹配

专知

20+阅读 · 2018年4月5日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【计算机类】期刊专刊/国际会议截稿信息6条

【计算机类】期刊专刊/国际会议截稿信息6条

Call4Papers

3+阅读 · 2017年10月13日

最佳实践：深度学习用于自然语言处理（三）

最佳实践：深度学习用于自然语言处理（三）

待字闺中

3+阅读 · 2017年8月20日

相关论文

The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge

Arxiv

0+阅读 · 2022年2月10日

Analyzing Medical Data with Process Mining: a COVID-19 Case Study

Arxiv

0+阅读 · 2022年2月8日

Regularizing End-to-End Speech Translation with Triangular Decomposition Agreement

Arxiv

7+阅读 · 2021年12月21日

Unbalanced minibatch Optimal Transport; applications to Domain Adaptation

Arxiv

3+阅读 · 2021年3月5日

The Importance of Modeling Data Missingness in Algorithmic Fairness: A Causal Perspective

Arxiv

5+阅读 · 2020年12月21日

Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction

Arxiv

9+阅读 · 2020年6月15日

End-to-End Multi-speaker Speech Recognition with Transformer

Arxiv

8+阅读 · 2020年2月13日

Improved Speech Enhancement with the Wave-U-Net

Arxiv

8+阅读 · 2018年11月27日

Multi-Head Attention with Disagreement Regularization

Arxiv

9+阅读 · 2018年10月24日

Can LSTM Learn to Capture Agreement? The Case of Basque

Arxiv

3+阅读 · 2018年9月11日

微信扫码咨询专知VIP会员