State-of-the-art pre-trained language models have been shown to memorise facts and perform well with limited amounts of training data. To gain a better understanding of how these models learn, we study their generalisation and memorisation capabilities in noisy and low-resource scenarios. We find that the training of these models is almost unaffected by label noise and that it is possible to reach near-optimal performance even on extremely noisy datasets. However, we also find that they largely fail when tested on low-resource tasks such as few-shot learning and rare entity recognition. To mitigate these limitations, we propose a novel architecture based on BERT and prototypical networks that improves performance in low-resource named entity recognition tasks.
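The following is a minimal, illustrative sketch (not the authors' implementation) of the idea behind combining a BERT encoder with a prototypical-network head for low-resource NER: class prototypes are computed as the mean BERT embedding of the support tokens for each entity label, and query tokens are classified by their distance to the nearest prototype. All function names are hypothetical, and word-piece/label alignment is deliberately simplified.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
encoder = AutoModel.from_pretrained("bert-base-cased")

def token_embeddings(sentences):
    """Contextual embeddings for every non-padding token in a batch."""
    batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state   # (B, T, H)
    mask = batch["attention_mask"].bool()              # (B, T)
    return hidden[mask]                                # (N_tokens, H)

def prototypes(support_embs, support_labels, num_classes):
    """One prototype per entity class: the mean of its support-token embeddings."""
    return torch.stack([
        support_embs[support_labels == c].mean(dim=0)
        for c in range(num_classes)
    ])                                                  # (C, H)

def classify(query_embs, protos):
    """Assign each query token to the class of the nearest prototype."""
    dists = torch.cdist(query_embs, protos)            # (N_query, C)
    return dists.argmin(dim=1)
```

Because classification only requires computing a handful of class means and distances, such a head can adapt to new entity types from a few labelled examples without retraining the encoder, which is what makes prototypical approaches attractive in the low-resource setting described above.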