Crowdsourcing is widely used to obtain labeled datasets for supervised learning efficiently and at low cost by drawing on large numbers of human workers. However, one of the technical challenges in obtaining high-quality results from crowdsourcing is dealing with the variability and bias that arise because humans perform the work, and various studies have addressed this issue by integrating redundantly collected responses to improve quality. In this study, we focus on observation bias in crowdsourcing. The frequency with which workers respond and the complexity of tasks both vary, and these variations may affect the aggregation results when they are correlated with the quality of the responses. We propose statistical aggregation methods for crowdsourcing responses that are combined with a method for removing observational data bias used in causal inference. Through experiments on both synthetic and real datasets, with and without artificially injected spam and colluding workers, we verify that the proposed method improves both aggregation accuracy in the presence of strong observation biases and robustness to spam and colluding workers.
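The abstract does not say which bias removal method is used. As a rough illustration of the general idea only, the sketch below weights each vote by an inverse propensity, a common bias correction in causal inference. The function name `ipw_majority_vote`, the `(worker, task, label)` input format, and the propensity estimate (the fraction of tasks a worker answered) are all assumptions made for this example, not the authors' method.

```python
from collections import Counter, defaultdict


def ipw_majority_vote(responses, num_tasks):
    """Aggregate labels with inverse-propensity-weighted voting.

    responses: list of (worker, task, label) tuples (hypothetical format).
    num_tasks: total number of tasks in the dataset.
    """
    # Assumption: a worker's observation propensity is estimated as the
    # fraction of all tasks that the worker answered.
    answered = Counter(worker for worker, _, _ in responses)
    propensity = {w: n / num_tasks for w, n in answered.items()}

    # Accumulate inverse-propensity-weighted votes per task and label.
    scores = defaultdict(lambda: defaultdict(float))
    for worker, task, label in responses:
        scores[task][label] += 1.0 / propensity[worker]

    # For each task, pick the label with the largest weighted score.
    return {task: max(votes, key=votes.get) for task, votes in scores.items()}


# Workers A and B answer every task; worker C answers only task 0.
responses = [
    ("A", 0, 1), ("B", 0, 1), ("C", 0, 0),
    ("A", 1, 1), ("B", 1, 1),
    ("A", 2, 0), ("B", 2, 0),
]
result = ipw_majority_vote(responses, num_tasks=3)
```

In this toy input, worker C's single vote carries about three times the weight of a vote from A or B, so the weighted result for task 0 differs from a plain majority vote. Whether such reweighting helps in practice depends on how response frequency correlates with response quality, which is the question the study examines.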