Named entity recognition (NER) is the task of detecting and classifying entity spans in text. When entity spans overlap with one another, the problem is called nested NER. Span-based methods have been widely used to tackle nested NER. Most of these methods produce an $n \times n$ score matrix, where $n$ is the length of the sentence and each entry corresponds to a span. However, previous work ignores the spatial relations in the score matrix. In this paper, we propose using a Convolutional Neural Network (CNN) to model these spatial relations. Despite its simplicity, our model surpasses several recently proposed methods that use the same pre-trained encoders in experiments on three commonly used nested NER datasets. Further analysis shows that using a CNN helps the model find nested entities more accurately. We also found that different papers use different sentence tokenizations for the three nested NER datasets, which affects the comparison. We therefore release a pre-processing script to facilitate future comparisons.
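To make the idea of spatial relations in the score matrix concrete, the following is a minimal sketch, not the model from the paper. It builds an $n \times n$ span score matrix and passes it through one 2D convolution, so that each span's score is mixed with those of neighbouring spans. The function names, the dot-product scoring, and the averaging kernel are all assumptions made for illustration; a real model would use learned span representations and learned kernels.

```python
# Illustrative sketch only: the scoring function and kernel are placeholders,
# not the architecture described in the paper.

def span_score_matrix(token_vecs):
    """Return an n x n matrix; entry [i][j] scores the span from token i to token j.

    Only entries with i <= j are valid spans; the rest stay at 0.0.
    The dot product is a stand-in for a learned span scorer.
    """
    n = len(token_vecs)
    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            scores[i][j] = sum(a * b for a, b in zip(token_vecs[i], token_vecs[j]))
    return scores


def conv2d_same(matrix, kernel):
    """Apply a square kernel with zero padding, so the output keeps the input size."""
    n = len(matrix)
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for di in range(k):
                for dj in range(k):
                    r, c = i + di - pad, j + dj - pad
                    if 0 <= r < n and 0 <= c < n:
                        total += kernel[di][dj] * matrix[r][c]
            out[i][j] = total
    return out


# Three toy token vectors (hypothetical encoder outputs).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = span_score_matrix(tokens)

# A 3x3 averaging kernel: each span score is blended with its neighbours,
# i.e. spans that share or nearly share a start or end position.
mean_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]
refined = conv2d_same(scores, mean_kernel)
```

In the matrix, the neighbours of entry $(i, j)$ are spans whose boundaries differ by one token, which is why a convolution over it can capture interactions between overlapping and nested spans.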