Annotation studies often require annotators to familiarize themselves with the task, its annotation scheme, and the data domain. This can be overwhelming at first, mentally taxing, and can induce errors into the resulting annotations, especially in citizen science or crowdsourcing scenarios where domain expertise is not required and only annotation guidelines are provided. To alleviate these issues, we propose annotation curricula, a novel approach to implicitly train annotators. Our goal is to gradually introduce annotators to the task by ordering the instances to be annotated according to a learning curriculum. To do so, we first formalize annotation curricula for sentence- and paragraph-level annotation tasks, define an ordering strategy, and identify well-performing heuristics and interactively trained models on three existing English datasets. We then conduct a user study with 40 voluntary participants who are asked to identify the most fitting misconception for English tweets about the Covid-19 pandemic. Our results show that using a simple heuristic to order instances can already significantly reduce the total annotation time while preserving high annotation quality. Annotation curricula can thus provide a novel way to improve data collection. To facilitate future research, we further share our code and data, comprising 2,400 annotations.
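To make the idea of a heuristic ordering concrete, the following Python sketch sorts an annotation pool from easiest to hardest instance. It assumes token count as a proxy for annotation difficulty; this heuristic and all names (difficulty, order_by_curriculum) are illustrative assumptions, not the exact strategies evaluated in the study.

    def difficulty(text: str) -> float:
        """Hypothetical difficulty proxy: longer texts are assumed
        harder to annotate."""
        return float(len(text.split()))

    def order_by_curriculum(instances: list[str]) -> list[str]:
        """Order instances from easiest to hardest so annotators are
        gradually introduced to the task."""
        return sorted(instances, key=difficulty)

    if __name__ == "__main__":
        # Toy annotation pool; annotators would see these in curriculum order.
        pool = [
            "A long tweet with several clauses that an annotator must read carefully before labeling.",
            "Masks do not cause oxygen deficiency.",
            "Vaccines work.",
        ]
        for text in order_by_curriculum(pool):
            print(text)

In this sketch the curriculum is computed once up front; an interactively trained model could instead re-rank the remaining pool after each annotation.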