Named Entity Recognition (NER) is the task of locating and classifying entities in text. However, the unlabeled entity problem in NER datasets seriously hinders further improvement of NER performance. This paper proposes SCL-RAI to address this problem. First, we use span-based contrastive learning to decrease the distance between span representations that share a label while increasing the distance between those with different labels, which reduces ambiguity among entities and makes the model more robust to unlabeled entities. Then, we propose retrieval-augmented inference to mitigate the decision boundary shifting problem. Our method outperforms the previous state-of-the-art (SOTA) method by 4.21% and 8.64% F1-score on two real-world datasets.
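The abstract names the two components but does not give their formulas. The following is a rough illustration only, not SCL-RAI's actual implementation. It assumes (1) a supervised-contrastive (SupCon-style) loss over span representations, where spans sharing a label act as positives for each other, and (2) retrieval-augmented inference realized as a kNN-style interpolation between the model's label distribution and a label distribution built from the nearest training spans. The function names, the temperature, `k`, and the mixing weight `lam` are all hypothetical choices.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def span_contrastive_loss(reps, labels, temperature=0.1):
    """Assumed SupCon-style loss: pull same-label spans together, push others apart."""
    z = [normalize(r) for r in reps]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        # Log-sum-exp over all other spans (stabilized by subtracting the max logit).
        logits = [dot(z[i], z[k]) / temperature for k in range(n) if k != i]
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Average negative log-probability of picking each positive.
        loss_i = -sum(dot(z[i], z[p]) / temperature - log_denom for p in positives)
        total += loss_i / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def knn_label_distribution(query, datastore, label_set, k=3, temperature=0.1):
    """Label distribution from the k training spans most similar to the query."""
    q = normalize(query)
    scored = sorted(((dot(q, normalize(rep)), lab) for rep, lab in datastore), reverse=True)[:k]
    top = scored[0][0]
    weights = [math.exp((s - top) / temperature) for s, _ in scored]
    z = sum(weights)
    dist = {lab: 0.0 for lab in label_set}
    for w, (_, lab) in zip(weights, scored):
        dist[lab] += w / z
    return dist


def retrieval_augmented_predict(model_probs, query, datastore, lam=0.5, k=3):
    """Hypothetical inference step: mix model and retrieved label distributions."""
    knn = knn_label_distribution(query, datastore, model_probs.keys(), k=k)
    mixed = {lab: lam * model_probs[lab] + (1 - lam) * knn[lab] for lab in model_probs}
    return max(mixed, key=mixed.get), mixed
```

Here the datastore is a list of `(span_representation, label)` pairs taken from training data. Because the retrieved neighbours vote with labels seen during training, the interpolation can pull a prediction back toward the training-time decision boundary even when the classifier's own output has drifted; whether this matches the paper's mechanism for the decision boundary shifting problem is an assumption.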