Towards Mitigating the Problem of Insufficient and Ambiguous Supervision in Online Crowdsourcing Annotation

In real-world crowdsourcing annotation systems, the supervision information we obtain may be both insufficient and ambiguous, owing to differences in annotators' knowledge and cultural backgrounds and to the high cost of acquiring annotations. To mitigate the negative impact, in this paper we investigate a more general and broadly applicable learning problem, i.e., \emph{semi-supervised partial label learning}, and propose a novel method based on pseudo-labeling and contrastive learning. The key design principle of our method is to use unlabeled data to facilitate partial label disambiguation while assigning reliable pseudo-labels to weakly supervised examples. Specifically, our method learns from the ambiguous labeling information via a partial cross-entropy loss. Meanwhile, high-accuracy pseudo-labels are generated for both partial-label and unlabeled examples through confidence-based thresholding, and contrastive learning is performed in a hybrid unsupervised and supervised manner to obtain more discriminative representations, with the share of supervision increasing in a curriculum fashion. The two main components work together as a whole and reinforce each other. In experiments, our method consistently outperforms all compared methods by a significant margin and establishes the first state-of-the-art results for semi-supervised partial label learning on image benchmarks.
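To make the two loss-side ingredients named above concrete, the following is a minimal sketch, not the paper's implementation. It assumes one common form of partial cross-entropy (the negative log of the total predicted probability on the candidate label set) and a simple confidence threshold for pseudo-labeling. The function names, the default threshold of 0.95, and the choice to restrict a partial-label example's prediction to its candidate set are all illustrative assumptions; the abstract does not specify them.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def partial_cross_entropy(logits, candidates):
    """Assumed form: negative log of the probability mass on the candidate set."""
    probs = softmax(logits)
    mass = sum(probs[k] for k in candidates)
    return -math.log(mass)


def pseudo_label(logits, candidates=None, threshold=0.95):
    """Return the most confident class if it clears the threshold, else None.

    For a partial-label example (candidates given), the prediction is
    restricted to the candidate set and renormalised (an assumption).
    For an unlabeled example (candidates=None), all classes are eligible.
    """
    probs = softmax(logits)
    if candidates is not None:
        mass = sum(probs[k] for k in candidates)
        probs = [probs[k] / mass if k in candidates else 0.0
                 for k in range(len(probs))]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None
```

Under these assumptions, an example whose candidate set covers every class contributes (almost) zero loss, since it carries no supervision, and an example with a single candidate reduces to ordinary cross-entropy. Examples for which `pseudo_label` returns `None` would simply be left out of the supervised part of the contrastive objective until the model becomes more confident.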
