相信众人:贝叶斯推论改善众包的数据标签 (Confident in the Crowd: Bayesian Inference to Improve Data Labelling in Crowdsourcing) - 专知论文

会员服务 ·

0

贝叶斯推断 · 标注 · 数据标签 · 置信度 · 推断 ·

2021 年 5 月 28 日

Confident in the Crowd: Bayesian Inference to Improve Data Labelling in Crowdsourcing

翻译：相信众人:贝叶斯推论改善众包的数据标签

Pierce Burke,Richard Klein

from arxiv, 6 pages, 4 figures

With the increased interest in machine learning and big data problems, the need for large amounts of labelled data has also grown. However, it is often infeasible to get experts to label all of this data, which leads many practitioners to crowdsourcing solutions. In this paper, we present new techniques to improve the quality of the labels while attempting to reduce the cost. The naive approach to assigning labels is to adopt a majority vote method, however, in the context of data labelling, this is not always ideal as data labellers are not equally reliable. One might, instead, give higher priority to certain labellers through some kind of weighted vote based on past performance. This paper investigates the use of more sophisticated methods, such as Bayesian inference, to measure the performance of the labellers as well as the confidence of each label. The methods we propose follow an iterative improvement algorithm which attempts to use the least amount of workers necessary to achieve the desired confidence in the inferred label. This paper explores simulated binary classification problems with simulated workers and questions to test the proposed methods. Our methods outperform the standard voting methods in both cost and accuracy while maintaining higher reliability when there is disagreement within the crowd.

翻译：随着对机器学习和大数据问题的日益关注,对大量贴标签数据的需求也增加了。然而,往往不宜让专家给所有这些数据贴上标签,这导致许多从业者寻找众包解决方案。在本文中,我们介绍了提高标签质量的新技术,同时试图降低成本。在数据标签方面,分配标签的天真的方法是采用多数投票方法,然而,在数据标签方面,这并不总是理想的,因为数据标签员并不同样可靠。人们可能会通过基于过去表现的某种加权投票,给予某些标签员更高的优先地位。本文调查了如何使用更先进的方法,如Bayesian推论,以衡量标签员的性能以及每个标签的可信度。我们建议的方法是采用迭代式改进算法,试图使用最低数量的工人来获得对推断标签的信任。本文探讨了模拟工人的二进制分类问题和测试拟议方法的问题。我们的方法在成本和准确度上都超过了标准投票方法,同时在人群内部出现分歧时保持更高的可靠性。

0

相关内容

贝叶斯推断

贝叶斯推断

贝叶斯推断（BAYESIAN INFERENCE）是一种应用于不确定性条件下的决策的统计方法。贝叶斯推断的显著特征是，为了得到一个统计结论能够利用先验信息和样本信息。

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

【经典书】机器学习黑客秘笈(Machine Learning for Hackers)，322页pdf

专知会员服务

46+阅读 · 2021年2月8日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

112+阅读 · 2020年5月15日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【论文推荐】最新6篇图像分割相关论文—隐马尔可夫随机场、级联三维全卷积、信号处理、全卷积网络、多源域适应、循环分割

【论文推荐】最新6篇图像分割相关论文—隐马尔可夫随机场、级联三维全卷积、信号处理、全卷积网络、多源域适应、循环分割

专知

9+阅读 · 2018年3月21日

人工智能 | 国际会议截稿信息9条

人工智能 | 国际会议截稿信息9条

Call4Papers

4+阅读 · 2018年3月13日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Evaluating methods for Lasso selective inference in biomedical research by a comparative simulation study

Evaluating methods for Lasso selective inference in biomedical research by a comparative simulation study

Arxiv

0+阅读 · 2021年7月20日

Independent finite approximations for Bayesian nonparametric inference

Arxiv

0+阅读 · 2021年7月19日

Likelihood-Free Frequentist Inference: Bridging Classical Statistics and Machine Learning in Simulation and Uncertainty Quantification

Arxiv

0+阅读 · 2021年7月19日

Grab the Reins of Crowds: Estimating the Effects of Crowd Movement Guidance Using Causal Inference

Arxiv

0+阅读 · 2021年7月19日

Bayesian Crowdsourcing with Constraints

Arxiv

0+阅读 · 2021年7月16日

Data vs classifiers, who wins?

Arxiv

0+阅读 · 2021年7月16日

Data Poisoning Attacks and Defenses to Crowdsourcing Systems

Arxiv

8+阅读 · 2021年2月18日

Semi-Supervised Learning with Variational Bayesian Inference and Maximum Uncertainty Regularization

Arxiv

4+阅读 · 2020年12月3日

Exploiting Synthetically Generated Data with Semi-Supervised Learning for Small and Imbalanced Datasets

Arxiv

3+阅读 · 2019年3月24日

Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees

Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees

Arxiv

3+阅读 · 2018年8月2日

VIP会员

文章信息

相关主题

贝叶斯推断

相关VIP内容

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

【经典书】机器学习黑客秘笈(Machine Learning for Hackers)，322页pdf

专知会员服务

46+阅读 · 2021年2月8日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

112+阅读 · 2020年5月15日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

小规模训练指南：打造世界级大语言模型的关键方法

无人机编队飞行：复杂环境中作战的策略、挑战与应用

大模型APP，AI时代第一个爆款

从数据中心视角出发的高效大语言模型训练综述

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【论文推荐】最新6篇图像分割相关论文—隐马尔可夫随机场、级联三维全卷积、信号处理、全卷积网络、多源域适应、循环分割

【论文推荐】最新6篇图像分割相关论文—隐马尔可夫随机场、级联三维全卷积、信号处理、全卷积网络、多源域适应、循环分割

专知

9+阅读 · 2018年3月21日

人工智能 | 国际会议截稿信息9条

人工智能 | 国际会议截稿信息9条

Call4Papers

4+阅读 · 2018年3月13日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Evaluating methods for Lasso selective inference in biomedical research by a comparative simulation study

Evaluating methods for Lasso selective inference in biomedical research by a comparative simulation study

Arxiv

0+阅读 · 2021年7月20日

Independent finite approximations for Bayesian nonparametric inference

Arxiv

0+阅读 · 2021年7月19日

Likelihood-Free Frequentist Inference: Bridging Classical Statistics and Machine Learning in Simulation and Uncertainty Quantification

Arxiv

0+阅读 · 2021年7月19日

Grab the Reins of Crowds: Estimating the Effects of Crowd Movement Guidance Using Causal Inference

Arxiv

0+阅读 · 2021年7月19日

Bayesian Crowdsourcing with Constraints

Arxiv

0+阅读 · 2021年7月16日

Data vs classifiers, who wins?

Arxiv

0+阅读 · 2021年7月16日

Data Poisoning Attacks and Defenses to Crowdsourcing Systems

Arxiv

8+阅读 · 2021年2月18日

Semi-Supervised Learning with Variational Bayesian Inference and Maximum Uncertainty Regularization

Arxiv

4+阅读 · 2020年12月3日

Exploiting Synthetically Generated Data with Semi-Supervised Learning for Small and Imbalanced Datasets

Arxiv

3+阅读 · 2019年3月24日

Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees

Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees

Arxiv

3+阅读 · 2018年8月2日

微信扫码咨询专知VIP会员