We show that large pre-trained language models are extremely capable of identifying label errors in datasets: simply verifying data points in descending order of out-of-distribution loss significantly outperforms more complex mechanisms for detecting label errors on natural language datasets. We contribute a novel method to produce highly realistic, human-originated label noise from crowdsourced data, and demonstrate the effectiveness of this method on TweetNLP, providing an otherwise difficult-to-obtain measure of realistic recall.
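To make the ranking step concrete, here is a minimal sketch of the verification ordering, assuming the out-of-distribution loss is the per-example cross-entropy of a classifier that never saw the example during fine-tuning (e.g., a model trained on the other folds of a cross-validation split); the function name `rank_by_ood_loss` and its signature are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def rank_by_ood_loss(model, tokenizer, texts, labels, device="cpu"):
    """Order dataset indices by out-of-distribution loss, highest first.

    `model` is assumed to be a sequence-classification model fine-tuned
    WITHOUT seeing `texts` (e.g., on the other cross-validation folds),
    so each per-example cross-entropy is an out-of-distribution loss.
    """
    model.eval()
    losses = []
    with torch.no_grad():
        for text, label in zip(texts, labels):
            inputs = tokenizer(text, return_tensors="pt", truncation=True).to(device)
            logits = model(**inputs).logits  # shape: [1, num_labels]
            loss = F.cross_entropy(logits, torch.tensor([label], device=device))
            losses.append(loss.item())
    # Verify examples in descending order of loss: the most suspicious
    # labels surface first.
    return sorted(range(len(texts)), key=lambda i: losses[i], reverse=True)
```

Under this ordering, an annotator who verifies only the top of the list concentrates effort on the data points most likely to be mislabeled.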