Crowdsourcing platforms are often used to collect datasets for training machine learning models, despite producing noisier labels than expert annotation. Two strategies are commonly used to manage the impact of this noise. The first aggregates redundant annotations for each example, but at the expense of labeling substantially fewer examples. The second spends the entire annotation budget labeling as many examples as possible and then applies denoising algorithms to implicitly clean the dataset. We find a middle ground and propose an approach that reserves a fraction of the annotation budget to explicitly relabel the examples most likely to be mislabeled. In particular, we allocate a large portion of the labeling budget to an initial dataset used to train a model. The model is then used to identify the examples that appear most likely to be incorrect, which we relabel with the remaining budget. Experiments across three model variations and four natural language processing tasks show that, given the same finite annotation budget, our approach outperforms or matches both label aggregation and advanced denoising methods designed to handle noisy labels.
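The budget-split procedure described above can be summarized in a few lines. The sketch below is a minimal illustration, not the paper's implementation: `crowd_label` and `relabel_example` are hypothetical stand-ins for calls to a crowdsourcing platform, the 80/20 split (`initial_fraction`) is an assumed default rather than a value from the paper, and a TF-IDF logistic-regression classifier substitutes for whatever model is actually trained. It flags examples whose current label receives low predicted probability and spends the held-back budget relabeling them.

```python
# Minimal sketch of the budget-split relabeling idea (assumptions noted above).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def budget_split_annotation(texts, crowd_label, relabel_example,
                            budget, initial_fraction=0.8):
    """Spend most of the budget on single crowd labels, train a model,
    then relabel the examples the model flags as most likely mislabeled."""
    n_initial = int(budget * initial_fraction)  # labels bought up front
    n_relabel = budget - n_initial              # labels held back for cleanup

    # Step 1: label as many distinct examples as the initial budget allows.
    pool = texts[:n_initial]
    labels = [crowd_label(t) for t in pool]

    # Step 2: train a model on the (noisy) initial dataset.
    vec = TfidfVectorizer()
    X = vec.fit_transform(pool)
    clf = LogisticRegression(max_iter=1000).fit(X, labels)

    # Step 3: rank examples by how strongly the model disagrees with their
    # current label, i.e. low predicted probability of that label.
    proba = clf.predict_proba(X)
    label_to_col = {c: i for i, c in enumerate(clf.classes_)}
    confidence = proba[np.arange(len(pool)),
                       [label_to_col[y] for y in labels]]
    suspects = np.argsort(confidence)[:n_relabel]

    # Step 4: spend the remaining budget relabeling the flagged examples.
    for i in suspects:
        labels[i] = relabel_example(pool[i])

    return pool, labels
```

In this sketch the ranking criterion is simply the model's confidence in the current label; any other error-detection score could be swapped in without changing the overall budget split.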