A Light-weight, Effective and Efficient Model for Label Aggregation in Crowdsourcing

Because crowdsourced labels are noisy, label aggregation (LA) has become a standard post-processing step for them. LA methods estimate true labels from crowdsourced labels by modeling worker quality. Most existing LA methods are iterative: they traverse all the crowdsourced labels multiple times, jointly updating the true labels and worker qualities until convergence. Consequently, these methods have high space and time complexity. In this paper, we treat LA as a dynamic system and model it as a dynamic Bayesian network. From this dynamic model we derive two light-weight algorithms, LA\textsuperscript{onepass} and LA\textsuperscript{twopass}, which estimate worker qualities and true labels effectively and efficiently by traversing all the labels at most twice. Because of their dynamic nature, the proposed algorithms can also estimate true labels online without revisiting historical data. We theoretically prove the convergence of the proposed algorithms and bound the error of the estimated worker qualities. We also analyze their space and time complexities and show that they are equivalent to those of majority voting. Experiments on 20 real-world datasets demonstrate that the proposed algorithms aggregate labels effectively and efficiently in both offline and online settings, even though they traverse all the labels at most twice.
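
To make the single-traversal idea concrete, the following is a minimal sketch of an online aggregator. It is not LA\textsuperscript{onepass} as derived in the paper, whose update rules are not given in this abstract. It assumes a simple one-coin worker model with pseudo-count priors and a quality-weighted vote. The function name `one_pass_aggregate` and the parameters `prior_correct` and `prior_wrong` are hypothetical.

```python
import math
from collections import defaultdict


def one_pass_aggregate(tasks, num_classes, prior_correct=2.0, prior_wrong=1.0):
    """Aggregate labels in a single traversal (illustrative sketch only).

    tasks: iterable of (task_id, [(worker_id, label), ...]) in arrival order,
           with labels encoded as integers in range(num_classes).
    Assumption: one-coin worker model; a worker's accuracy is estimated from
    pseudo-counts of agreement with previously estimated labels.
    """
    correct = defaultdict(lambda: prior_correct)  # agreement pseudo-counts
    wrong = defaultdict(lambda: prior_wrong)      # disagreement pseudo-counts
    estimates = {}

    for task_id, votes in tasks:
        # Weighted vote using the worker qualities known so far.
        scores = [0.0] * num_classes
        for worker, label in votes:
            acc = correct[worker] / (correct[worker] + wrong[worker])
            # Log-odds weight, assuming errors are uniform over other classes.
            scores[label] += math.log(acc * (num_classes - 1) / (1.0 - acc))
        best = max(range(num_classes), key=scores.__getitem__)
        estimates[task_id] = best

        # Update worker qualities once; this task is never revisited.
        for worker, label in votes:
            if label == best:
                correct[worker] += 1.0
            else:
                wrong[worker] += 1.0

    quality = {w: correct[w] / (correct[w] + wrong[w]) for w in correct}
    return estimates, quality


if __name__ == "__main__":
    stream = [
        ("t1", [("a", 1), ("b", 1), ("c", 0)]),
        ("t2", [("a", 0), ("b", 0), ("c", 1)]),
        ("t3", [("a", 1), ("c", 0)]),  # a tie under plain majority voting
    ]
    labels, quality = one_pass_aggregate(stream, num_classes=2)
    print(labels, quality)
```

Like majority voting, the sketch reads each label once and stores only per-worker counters, which is the kind of cost profile the abstract claims for the proposed algorithms. Unlike majority voting, the learned worker weights let it break the tie on the third task in favor of the worker who has agreed with earlier estimates.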
