Current state-of-the-art approaches for named entity recognition (NER) with BERT-style transformers typically follow one of two strategies: (1) the first fine-tunes the transformer itself on the NER task and adds only a simple linear layer for word-level predictions; (2) the second uses the transformer only to provide features to a standard LSTM-CRF sequence labeling architecture and thus performs no fine-tuning. In this paper, we perform a comparative analysis of both approaches in a variety of settings currently considered in the literature. In particular, we evaluate how well they work when document-level features are leveraged. Our evaluation on the classic CoNLL benchmark datasets for 4 languages shows that document-level features significantly improve NER quality and that fine-tuning generally outperforms the feature-based approach. We present recommendations for parameters as well as several new state-of-the-art numbers. Our approach is integrated into the Flair framework to facilitate reproduction of our experiments.
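To make the two setups concrete, the sketch below shows how they could be configured in Flair. This is a minimal, illustrative example and not the paper's exact training script: the model name (`xlm-roberta-large`), output paths, and hyperparameter values are assumptions, and the API shown (`TransformerWordEmbeddings`, `SequenceTagger`, `ModelTrainer`) corresponds to recent Flair releases, so details may differ by version.

```python
# Illustrative sketch of both NER setups in Flair (model names, paths, and
# hyperparameters are assumptions, not the paper's exact configuration).
from flair.datasets import CONLL_03
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# CoNLL-03 must be available locally due to licensing.
corpus = CONLL_03()
tag_dictionary = corpus.make_label_dictionary(label_type="ner")

# (1) Fine-tuning setup: the transformer is updated on the NER task,
#     document-level context is enabled, and predictions come from a
#     simple linear layer (no LSTM, no CRF).
fine_tune_embeddings = TransformerWordEmbeddings(
    model="xlm-roberta-large",   # assumed model choice
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
    use_context=True,            # document-level features
)
fine_tune_tagger = SequenceTagger(
    hidden_size=256,
    embeddings=fine_tune_embeddings,
    tag_dictionary=tag_dictionary,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)
ModelTrainer(fine_tune_tagger, corpus).fine_tune(
    "resources/taggers/ner-fine-tuned",   # illustrative output path
    learning_rate=5.0e-6,
    mini_batch_size=4,
)

# (2) Feature-based setup: the transformer stays frozen and only provides
#     features to a standard LSTM-CRF sequence labeler.
feature_embeddings = TransformerWordEmbeddings(
    model="xlm-roberta-large",
    layers="all",                # pool all layers as features (illustrative)
    fine_tune=False,
    use_context=True,
)
lstm_crf_tagger = SequenceTagger(
    hidden_size=256,
    embeddings=feature_embeddings,
    tag_dictionary=tag_dictionary,
    tag_type="ner",
    use_crf=True,
    use_rnn=True,
)
ModelTrainer(lstm_crf_tagger, corpus).train(
    "resources/taggers/ner-feature-based",
    learning_rate=0.1,
    mini_batch_size=32,
)
```

The key switches are `fine_tune` (whether transformer weights are updated), `use_context` (whether surrounding document context is used when embedding a sentence), and `use_rnn`/`use_crf` (linear prediction head versus LSTM-CRF decoding).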