Toward Annotator Group Bias in Crowdsourcing - Zhuanzhi Papers



October 8, 2021

Toward Annotator Group Bias in Crowdsourcing


Haochen Liu, Joseph Thekinen, Sinem Mollaoglu, Da Tang, Ji Yang, Youlong Cheng, Hui Liu, Jiliang Tang

From arXiv, 10 pages

Crowdsourcing has emerged as a popular approach for collecting annotated data to train supervised machine learning models. However, annotator bias can lead to defective annotations. Though a few works have investigated individual annotator bias, group effects among annotators have been largely overlooked. In this work, we reveal that annotators within the same demographic group tend to show consistent group bias in annotation tasks, and we therefore conduct an initial study on annotator group bias. We first empirically verify the existence of annotator group bias in various real-world crowdsourcing datasets. Then, we develop a novel probabilistic graphical framework, GroupAnno, to capture annotator group bias with a new extended Expectation Maximization (EM) training algorithm. We conduct experiments on both synthetic and real-world datasets. Experimental results demonstrate the effectiveness of our model in modeling annotator group bias in label aggregation and model learning over competitive baselines.
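To make the idea of group-level bias in EM label aggregation concrete, here is a minimal sketch. It is not GroupAnno (the abstract does not give its equations); it is a simplified Dawid-Skene-style EM, assuming that every annotator in the same demographic group shares one confusion matrix. The function name `group_dawid_skene`, the `annotator_group` mapping, and the `smoothing` constant are hypothetical choices for illustration.

```python
import numpy as np


def group_dawid_skene(annotations, n_items, n_classes, annotator_group,
                      n_groups, n_iter=50, smoothing=0.01):
    """Dawid-Skene-style EM where annotators in the same group share one
    confusion matrix (a simplified stand-in, not the paper's GroupAnno).

    annotations: list of (item_id, annotator_id, label) integer triples.
    annotator_group: hypothetical dict, annotator_id -> group id in [0, n_groups).
    Returns (posterior post[i, k], class prior, conf[g, l, k]), where
    conf[g, l, k] = P(observed label l | true class k, annotator in group g).
    """
    items = np.array([a[0] for a in annotations])
    groups = np.array([annotator_group[a[1]] for a in annotations])
    labels = np.array([a[2] for a in annotations])

    # Initialise the posterior over true labels with a smoothed soft majority vote.
    post = np.full((n_items, n_classes), smoothing)
    np.add.at(post, (items, labels), 1.0)
    post /= post.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class prior from the current posteriors.
        prior = post.sum(axis=0) + smoothing
        prior /= prior.sum()

        # M-step: per-group confusion counts; each annotation adds its item's
        # posterior mass to the row (group, observed label).
        counts = np.full((n_groups, n_classes, n_classes), smoothing)
        np.add.at(counts, (groups, labels), post[items])
        conf = counts / counts.sum(axis=1, keepdims=True)  # normalise over l

        # E-step: log P(z_i = k) plus the log-likelihood of each label on item i.
        log_post = np.tile(np.log(prior), (n_items, 1))
        np.add.at(log_post, items, np.log(conf[groups, labels]))
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)

    return post, prior, conf


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    annotator_group = {0: 0, 1: 0, 2: 1}  # annotators 0 and 1 form group 0
    data = [(i, a, t) for i, t in enumerate(truth) for a in (0, 1)]
    data += [(i, 2, 1) for i in range(len(truth))]  # group-1 annotator always says 1
    post, prior, conf = group_dawid_skene(data, 6, 2, annotator_group, 2)
    print(post.argmax(axis=1), conf[1, 1, 0])
```

Sharing one confusion matrix per group pools evidence across all annotators in that group, which is what lets a consistent group bias be estimated even when each individual annotator labels only a few items. In the toy run, the entry `conf[1, 1, 0]` should come out close to 1, meaning the model has learned that group 1 tends to label true class-0 items as class 1.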

