Crowdsourcing systems enable us to collect large-scale datasets, but inherently suffer from noisy labels provided by low-paid workers. We address the inference and learning problems that arise when using such noisy crowdsourced datasets. Because crowdsourced labels are inherently sparse, it is critical to exploit both a probabilistic model to capture the worker prior and a neural network to extract task features, despite the practical risks of a wrong prior and overfitted features. We hence establish a neural-powered Bayesian framework, from which we devise deepMF and deepBP using different variational approximation methods, mean field (MF) and belief propagation (BP), respectively. This framework provides a unified view of existing methods, which are special cases of deepMF with different priors. In addition, our empirical study suggests that deepBP is a new approach that is more robust against wrong priors, feature overfitting, and extreme workers, thanks to BP being a more sophisticated approximation than MF.