We introduce and define a new crowdsourcing scenario, open set crowdsourcing, in which we know only the general theme of an unfamiliar crowdsourcing project but not its label space, that is, the set of possible labels. This is still a task annotation problem, but unfamiliarity with the tasks and the label space hampers the modelling of tasks and workers, as well as truth inference. We propose an intuitive solution, OSCrowd. First, OSCrowd integrates datasets related to the crowdsourcing theme into a large source domain, enabling partial transfer learning to approximate the label space of these tasks. Next, it assigns a weight to each source domain based on category correlation. It then uses multiple-source open set transfer learning to model the crowd tasks and assign possible annotations. The label space and annotations produced by transfer learning are used to guide and standardize the crowd workers' annotations. We validate OSCrowd in an online scenario and show that it solves the open set crowdsourcing problem and works better than related crowdsourcing solutions.
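The source-weighting and multi-source annotation steps can be illustrated with a small sketch. This is not OSCrowd's published algorithm. The correlation score is a hypothetical stand-in: cosine similarity between a source's class-frequency vector and the target's averaged predicted class distribution. The weighted vote, the rejection threshold, and the function names `source_weights`, `combine_predictions` and `inferred_label_space` are likewise hypothetical. The sketch also assumes that every source classifier's outputs have already been mapped onto one shared label list.

```python
import numpy as np


def source_weights(source_class_freqs, target_pred_dist):
    """Hypothetical category-correlation weights: cosine similarity between each
    source's class-frequency vector and the target's averaged predicted class
    distribution, clipped at zero and normalized to sum to one."""
    t = target_pred_dist / np.linalg.norm(target_pred_dist)
    scores = np.array([max(float((f / np.linalg.norm(f)) @ t), 0.0)
                       for f in source_class_freqs])
    if scores.sum() == 0:
        # No correlated source: fall back to uniform weights.
        return np.full(len(scores), 1.0 / len(scores))
    return scores / scores.sum()


def combine_predictions(source_probs, weights, labels, threshold=0.5):
    """Weighted vote over sources. source_probs has shape
    (n_sources, n_tasks, n_labels). Tasks whose top combined probability
    falls below `threshold` are labelled 'unknown' (open-set rejection)."""
    combined = np.tensordot(weights, source_probs, axes=1)  # (n_tasks, n_labels)
    preds = []
    for row in combined:
        k = int(np.argmax(row))
        preds.append(labels[k] if row[k] >= threshold else "unknown")
    return preds


def inferred_label_space(preds):
    """Candidate label space: every label assigned to at least one task."""
    return sorted({p for p in preds if p != "unknown"})


# Toy example: two source domains, two target tasks, shared label list.
labels = ["cat", "dog", "car"]
freqs = [np.array([0.5, 0.5, 0.0]), np.array([0.0, 0.0, 1.0])]
target_dist = np.array([0.45, 0.45, 0.10])
w = source_weights(freqs, target_dist)
probs = np.array([
    [[0.9, 0.1, 0.0], [0.4, 0.4, 0.2]],  # source 1, tasks 0 and 1
    [[0.1, 0.1, 0.8], [0.3, 0.3, 0.4]],  # source 2, tasks 0 and 1
])
preds = combine_predictions(probs, w, labels)
print(preds, inferred_label_space(preds))
```

In this toy run the first source, whose categories correlate with the target distribution, gets most of the weight. Task 0 is labelled `cat`, and task 1 is rejected as `unknown` because no label reaches the threshold. In a scheme like this, the accepted labels would form the candidate label space offered to workers, and rejected tasks would be left for them to annotate.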