We consider the problem of cost-optimal utilization of a crowdsourcing platform for binary, unsupervised classification of a collection of items, given a prescribed error threshold. Workers on the crowdsourcing platform are assumed to be divided into multiple classes, based on their skill, experience, and/or past performance. We model each worker class via an unknown confusion matrix, and a (known) price to be paid per label prediction. For this setting, we propose algorithms for acquiring label predictions from workers, and for inferring the true labels of items. We prove that if the number of (unlabeled) items available is large enough, our algorithms satisfy the prescribed error thresholds, incurring a cost that is near-optimal. Finally, we validate our algorithms, and some heuristics inspired by them, through an extensive case study.
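To make the setting concrete, the following is a minimal illustrative sketch, not the algorithms proposed in this work. It assumes two hypothetical worker classes with made-up prices and accuracies, and, for simplicity, it treats the confusion matrices as known when inferring labels (in the setting above they are unknown and would have to be estimated). All names (`WorkerClass`, `query`, `infer`, `simulate`) are introduced here purely for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerClass:
    """One worker class: a price per label and a 2x2 confusion matrix."""
    name: str
    price: float       # known price paid per label prediction
    acc_on_0: float    # P(predict 0 | true label 0); assumed strictly in (0, 1)
    acc_on_1: float    # P(predict 1 | true label 1); assumed strictly in (0, 1)


def query(worker, true_label, rng):
    """Draw one noisy label prediction from a worker of the given class."""
    acc = worker.acc_on_1 if true_label == 1 else worker.acc_on_0
    return true_label if rng.random() < acc else 1 - true_label


def infer(votes):
    """Log-likelihood-ratio vote under a uniform prior.

    votes is a list of (WorkerClass, prediction) pairs. This step uses the
    confusion matrices directly, which is an assumption of this sketch.
    """
    llr = 0.0
    for w, pred in votes:
        if pred == 1:
            llr += math.log(w.acc_on_1 / (1.0 - w.acc_on_0))
        else:
            llr += math.log((1.0 - w.acc_on_1) / w.acc_on_0)
    return 1 if llr > 0 else 0


def simulate(plan, n_items, seed=0):
    """plan is a list of (WorkerClass, labels_per_item) pairs.

    Returns (empirical error rate, cost per item).
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_items):
        truth = rng.randint(0, 1)
        votes = [(w, query(w, truth, rng)) for w, k in plan for _ in range(k)]
        errors += int(infer(votes) != truth)
    cost_per_item = sum(w.price * k for w, k in plan)
    return errors / n_items, cost_per_item


# Hypothetical worker classes; the numbers are illustrative only.
novice = WorkerClass("novice", price=1.0, acc_on_0=0.6, acc_on_1=0.6)
expert = WorkerClass("expert", price=4.0, acc_on_0=0.95, acc_on_1=0.95)
```

Comparing `simulate([(novice, 5)], 2000)` with `simulate([(expert, 1)], 2000)` shows the trade-off the problem statement refers to: different mixes of worker classes reach different error rates at different costs, and the goal is to meet the error threshold as cheaply as possible.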