Disagreement in natural language annotation has mostly been studied from the perspective of biases introduced by annotators and annotation frameworks. Here, we propose to analyze another source of bias: task design bias, which has a particularly strong impact on crowdsourced linguistic annotation, where natural language is used to elicit interpretations from lay annotators. For this purpose, we look at implicit discourse relation annotation, a task that has repeatedly been shown to be difficult due to the relations' ambiguity. We compare the annotations of 1,200 discourse relations obtained using two distinct annotation tasks and quantify the biases of both methods across four different domains. Both methods are natural language annotation tasks designed for crowdsourcing. We show that the task design can push annotators towards certain relations and that some discourse relation senses can be better elicited with one or the other annotation approach. We also conclude that this type of bias should be taken into account when training and testing models.