Named Entity Recognition systems achieve remarkable performance on domains such as English news. It is natural to ask: what are these models actually learning to achieve this? Are they merely memorizing the names themselves, or are they capable of interpreting the text and inferring the correct entity type from the linguistic context? We examine these questions by contrasting the performance of several variants of LSTM-CRF architectures for named entity recognition, some of which are provided only with representations of the context as features. We also perform similar experiments for BERT. We find that context representations do contribute to system performance, but that the main factor driving high performance is learning the name tokens themselves. We enlist human annotators to evaluate the feasibility of inferring entity types from the context alone and find that, although humans are likewise unable to infer the entity type for the majority of the errors made by the context-only system, there is still some room for improvement. A system should be able to recognize any name in a predictive context correctly, and our experiments indicate that current systems may be further improved by adding such a capability.
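To make the "context-only" setup concrete, the sketch below illustrates one way such an input could be constructed: tokens belonging to a gold name span are hidden from the encoder, leaving only the surrounding context as evidence for the entity type. The function name `mask_name_tokens` and the `[UNK]` placeholder are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch (assumed, not the paper's code) of building a
# "context-only" input: every token inside a gold name span is replaced
# with a placeholder, so the model must rely on the context words alone.

def mask_name_tokens(tokens, bio_tags, placeholder="[UNK]"):
    """Replace tokens that are part of a gold name span (non-'O' tags)
    with a placeholder, keeping the context words intact."""
    return [
        placeholder if tag != "O" else tok
        for tok, tag in zip(tokens, bio_tags)
    ]

tokens = ["Yesterday", ",", "Acme", "Corp", "reported", "record", "profits", "."]
tags   = ["O", "O", "B-ORG", "I-ORG", "O", "O", "O", "O"]

print(mask_name_tokens(tokens, tags))
# ['Yesterday', ',', '[UNK]', '[UNK]', 'reported', 'record', 'profits', '.']
```

Under this setup, a model that still predicts ORG for the masked span must be doing so from cues such as "reported record profits", which is exactly the contextual inference the paper probes.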