Crowdsourcing has emerged as a powerful paradigm for efficiently labeling large datasets and performing various learning tasks by leveraging crowds of human annotators. When additional information about the data is available, semi-supervised crowdsourcing approaches that enhance the aggregation of labels from human annotators are well motivated. This work deals with semi-supervised crowdsourced classification under two regimes of semi-supervision: a) label constraints, which provide ground-truth labels for a subset of the data; and b) potentially easier-to-obtain instance-level constraints, which indicate relationships between pairs of data. Bayesian algorithms based on variational inference are developed for each regime, and their quantifiably improved performance relative to unsupervised crowdsourcing is validated analytically and empirically on several crowdsourcing datasets.