Article prediction is a task that has long defied accurate linguistic description. As such, it is ideally suited to evaluating models on their ability to emulate native-speaker intuition. To this end, we compare the performance of native English speakers and pre-trained models on article prediction set up as a three-way choice (a/an, the, zero). Our experiments show that BERT outperforms humans on this task across all articles. In particular, BERT is far superior to humans at detecting the zero article, possibly because we insert zero-article markers using rules that the deep neural model can easily pick up. More interestingly, we find that BERT tends to agree more with annotators than with the corpus when inter-annotator agreement is high, but switches to agreeing more with the corpus as inter-annotator agreement drops. We contend that this alignment with annotators, despite BERT being trained on the corpus, suggests that BERT is not memorising article use but capturing a high-level generalisation of article use akin to human intuition.
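To make the three-way setup concrete, the following is a minimal sketch of how article prediction can be framed with a pre-trained masked language model. It assumes the HuggingFace transformers API; the function name, the candidate-scoring scheme, and the treatment of the zero article as the leftover probability mass are illustrative assumptions, not the paper's exact implementation.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def predict_article(left_context: str, right_context: str) -> str:
    """Return 'a/an', 'the', or 'zero' for the article slot between the two contexts."""
    text = f"{left_context} {tokenizer.mask_token} {right_context}"
    inputs = tokenizer(text, return_tensors="pt")
    # Locate the single masked position in the input.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    probs = torch.softmax(logits, dim=-1)

    # Collapse 'a' and 'an' into a single indefinite class.
    p_indef = probs[tokenizer.convert_tokens_to_ids("a")] + \
              probs[tokenizer.convert_tokens_to_ids("an")]
    p_the = probs[tokenizer.convert_tokens_to_ids("the")]
    # Treat the remaining probability mass as a proxy for the zero article
    # (an illustrative choice, not necessarily the paper's rule).
    p_zero = 1.0 - p_indef - p_the

    scores = {"a/an": p_indef.item(), "the": p_the.item(), "zero": p_zero.item()}
    return max(scores, key=scores.get)

print(predict_article("She adopted", "cat from the shelter."))  # expected: 'a/an'
```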