In real-world data labeling applications, annotators often provide imperfect labels. It is thus common to employ multiple annotators to label data with some overlap between their examples. We study active learning in such settings, aiming to train an accurate classifier by collecting a dataset with the fewest total annotations. Here we propose ActiveLab, a practical method to decide what to label next that works with any classifier model and can be used in pool-based batch active learning with one or multiple annotators. ActiveLab automatically estimates when it is more informative to re-label examples vs. labeling entirely new ones. This is a key aspect of producing high quality labels and trained models within a limited annotation budget. In experiments on image and tabular data, ActiveLab reliably trains more accurate classifiers with far fewer annotations than a wide variety of popular active learning methods.
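The abstract does not specify ActiveLab's scoring rule, but as context for pool-based batch active learning, below is a minimal sketch of the standard baseline it is compared against: uncertainty sampling, which selects the unlabeled pool examples whose predicted class probabilities are least confident. All names here (`select_batch`, the toy `probs` array) are illustrative, not from the paper.

```python
import numpy as np

def select_batch(probs, batch_size):
    """Pick the pool examples whose predicted class probabilities are
    least confident (a standard uncertainty-sampling heuristic)."""
    uncertainty = 1.0 - probs.max(axis=1)  # low max-probability => uncertain
    return np.argsort(-uncertainty)[:batch_size]

# Toy pool of 5 examples with predicted probabilities over 2 classes.
probs = np.array([
    [0.95, 0.05],
    [0.55, 0.45],
    [0.80, 0.20],
    [0.51, 0.49],
    [0.99, 0.01],
])
print(select_batch(probs, batch_size=2))  # indices of the 2 most uncertain examples
```

ActiveLab extends this kind of per-example scoring so that already-labeled examples can also receive a score, letting the same batch-selection step choose between re-labeling an existing example and labeling a new one.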