Samples with ground-truth labels are not always available in many domains. While learning from crowdsourced labels has been explored, existing models can still fail when annotations are sparse, unreliable, or divergent. Co-teaching methods have shown promising improvements on computer-vision problems with noisy labels by training two classifiers, each on the other's confident samples in every batch. Inspired by this idea of separating confident from uncertain samples during training, we extend it to the crowdsourcing problem. Our model, CrowdTeacher, builds on the observation that perturbing the input space can improve a classifier's robustness to noisy labels. Treating crowdsourced annotations as a source of label noise, we perturb samples according to the certainty of their aggregated annotations. The perturbed samples are fed to a Co-teaching algorithm tuned to also accommodate smaller tabular datasets. We showcase the boost in predictive power attained by CrowdTeacher on both synthetic and real datasets across various label-density settings. Our experiments show that the proposed approach outperforms baselines that model individual annotators and then combine their outputs, methods that simultaneously learn a classifier and infer true labels, and the Co-teaching algorithm applied to labels aggregated by common truth-inference methods.
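The two core mechanisms described above can be illustrated with a minimal sketch: (1) aggregating crowdsourced votes into a label and a per-sample certainty, then adding Gaussian noise scaled by the annotators' disagreement, and (2) the Co-teaching small-loss exchange, where each classifier selects its low-loss (confident) samples to train its peer. The function names, the majority-vote aggregation, and the noise schedule here are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def aggregate_and_perturb(X, annotations, noise_scale=0.1, rng=None):
    """Majority-vote aggregation plus certainty-scaled Gaussian perturbation.

    X           : (n_samples, n_features) feature matrix.
    annotations : (n_samples, n_annotators) binary votes; -1 marks a missing vote.
    Hypothetical helper sketching the idea of perturbing uncertain samples more.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    labels, certainty = [], []
    for votes in annotations:
        v = votes[votes >= 0]                      # drop missing votes
        pos = (v == 1).mean() if len(v) else 0.5   # fraction voting positive
        labels.append(int(pos >= 0.5))
        certainty.append(max(pos, 1.0 - pos))      # agreement level in [0.5, 1]
    certainty = np.asarray(certainty)
    # Inject more noise where annotators disagree (low certainty).
    noise = rng.normal(size=X.shape) * noise_scale * (1.0 - certainty)[:, None]
    return X + noise, np.asarray(labels), certainty

def small_loss_exchange(loss_a, loss_b, keep_ratio=0.8):
    """Co-teaching step: each network passes its small-loss samples to the peer.

    Returns index arrays: samples network A should train on (chosen by B),
    and samples network B should train on (chosen by A).
    """
    k = int(len(loss_a) * keep_ratio)
    idx_for_b = np.argsort(loss_a)[:k]  # A's confident samples, used to train B
    idx_for_a = np.argsort(loss_b)[:k]  # B's confident samples, used to train A
    return idx_for_a, idx_for_b
```

Note that a unanimously labeled sample has certainty 1 and is left unperturbed, while a maximally contested one (certainty 0.5) receives the largest noise; the `keep_ratio` in the exchange plays the role of the noise-rate-dependent selection rate in standard Co-teaching.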