Human data labeling is an important and expensive task at the heart of supervised learning systems. Hierarchies help humans understand and organize concepts. We ask whether and how concept hierarchies can inform the design of annotation interfaces to improve labeling quality and efficiency. We study this question through annotation of vaccine misinformation, where the labeling task is difficult and highly subjective. We investigate six user interface designs for crowdsourcing hierarchical labels by collecting over 18,000 individual annotations. Under a fixed budget, integrating hierarchies into the design improves crowd workers' F1 scores. We attribute this to (1) grouping similar concepts, which improves F1 scores by +0.16 over random groupings; (2) strong relative performance on high-difficulty examples (a relative F1 score difference of +0.40); and (3) filtering out obvious negatives, which increases precision by +0.07. Ultimately, labeling schemes that integrate the hierarchy outperform those that do not, achieving a mean F1 of 0.70.
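For readers unfamiliar with the reported metrics, the following is a minimal sketch of how precision, recall, and F1 can be computed for multi-label annotations. It assumes micro-averaging over (item, label) pairs, which may differ from the averaging used in the study; the function name, item IDs, and label names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set, Tuple


def precision_recall_f1(
    gold: Dict[str, Set[str]],
    predicted: Dict[str, Set[str]],
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over (item, label) pairs.

    `gold` and `predicted` map an item ID to the set of labels assigned to it.
    Items missing from `predicted` are treated as having no predicted labels.
    """
    tp = fp = fn = 0
    for item_id, gold_labels in gold.items():
        pred_labels = predicted.get(item_id, set())
        tp += len(gold_labels & pred_labels)   # labels correctly assigned
        fp += len(pred_labels - gold_labels)   # labels assigned but not in gold
        fn += len(gold_labels - pred_labels)   # gold labels that were missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Hypothetical example data: three items with illustrative label names.
gold = {
    "t1": {"safety", "side_effects"},
    "t2": {"conspiracy"},
    "t3": set(),  # an obvious negative: no misinformation label applies
}
predicted = {
    "t1": {"safety"},
    "t2": {"conspiracy", "safety"},
    "t3": set(),
}

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under this definition, removing spurious labels lowers the false-positive count and so raises precision, while F1 only rises if recall is not sacrificed in the process.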