Crowdsourcing is a relatively economical and efficient solution for collecting annotations from the crowd through online platforms. However, answers collected from workers with varying expertise can be noisy and unreliable, so the quality of the annotated data must be further maintained. Various solutions have been proposed to obtain high-quality annotations, but they all assume that a worker's label quality is stable over time, i.e., always at the same level whenever the worker performs tasks. In practice, workers' attention levels change over time, and ignoring this change can compromise the reliability of the annotations. In this paper, we focus on a novel and realistic crowdsourcing scenario involving attention-aware annotations. We propose a new probabilistic model that takes workers' attention into account to estimate label quality. Expectation propagation is adopted for efficient Bayesian inference of our model, and a generalized Expectation-Maximization algorithm is derived to jointly estimate the ground truth of all tasks and the label quality of each individual crowd worker under attention. In addition, the number of tasks best suited to a worker is estimated according to changes in attention. Experiments against related methods on three real-world datasets and one semi-simulated dataset demonstrate that our method quantifies the relationship between workers' attention and label quality on the given tasks and improves the aggregated labels.
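To make the abstract's pipeline concrete, the following is a minimal sketch of an attention-aware EM aggregation loop, not the paper's actual model: it substitutes a simple exponential attention decay for the paper's learned attention dynamics and plain EM for expectation propagation, and the `decay` parameter and blending scheme are illustrative assumptions only.

```python
import numpy as np

def attention_aware_em(labels, n_classes, decay=0.01, n_iters=50):
    """Toy EM aggregation where each worker's reliability is blended toward
    chance as a hypothetical attention factor decays over task order.

    labels: int array of shape (n_workers, n_tasks); -1 marks a skipped task.
    decay:  assumed per-task attention decay rate (illustrative, not the paper's).
    """
    n_workers, n_tasks = labels.shape
    # Assumed attention model: exponential decay with the number of tasks
    # a worker has already completed (a stand-in for the paper's model).
    attention = np.exp(-decay * np.arange(n_tasks))

    post = np.full((n_tasks, n_classes), 1.0 / n_classes)  # posterior over truths
    reliability = np.full(n_workers, 0.8)  # initial per-worker accuracy guess

    for _ in range(n_iters):
        # E-step: posterior over each task's true label given current reliabilities.
        log_post = np.zeros((n_tasks, n_classes))
        for w in range(n_workers):
            for t in range(n_tasks):
                y = labels[w, t]
                if y < 0:
                    continue
                # Effective accuracy fades toward chance (1/K) as attention drops.
                acc = attention[t] * reliability[w] + (1 - attention[t]) / n_classes
                for c in range(n_classes):
                    p = acc if y == c else (1 - acc) / (n_classes - 1)
                    log_post[t, c] += np.log(p)
        post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)

        # M-step: re-estimate each worker's base reliability, weighting each
        # answer by how attentive the worker presumably was when giving it.
        for w in range(n_workers):
            num, den = 0.0, 0.0
            for t in range(n_tasks):
                y = labels[w, t]
                if y < 0:
                    continue
                num += attention[t] * post[t, y]
                den += attention[t]
            if den > 0:
                reliability[w] = np.clip(num / den, 1e-3, 1 - 1e-3)

    return post.argmax(axis=1), reliability
```

Under this toy decay model, the point at which `attention[t] * reliability[w]` falls near chance gives a rough analogue of the paper's estimate of how many tasks a worker should be assigned before label quality degrades.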