Named entity recognition (NER) is one of the core tasks of information extraction. Word ambiguity and word abbreviations are major causes of low named entity recognition rates. In this paper, we propose WCL-BBCD (Word Contrastive Learning with BERT-BiLSTM-CRF-DBpedia), a novel named entity recognition model that incorporates the idea of contrastive learning. The model first trains on sentence pairs from the text. It computes the cosine similarity between words in each sentence pair and uses these similarities to fine-tune the BERT model employed for the named entity recognition task, which alleviates word ambiguity. The fine-tuned BERT model is then combined with a BiLSTM-CRF model to perform named entity recognition. Finally, the recognition results are corrected using prior knowledge such as knowledge graphs, which alleviates the low recognition rate caused by word abbreviations. Experimental results show that our model outperforms comparable methods on the CoNLL-2003 English dataset and the OntoNotes V5 English dataset.
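The similarity step can be illustrated with a rough sketch. The code below computes the cosine similarity between word vectors from a sentence pair and turns it into an InfoNCE-style contrastive loss that pulls a word's representation toward a positive counterpart and pushes it away from negatives. The loss form, the temperature value, and the function names (`pairwise_similarity`, `word_contrastive_loss`) are assumptions made for illustration, not the paper's exact formulation. In the actual model the vectors would be BERT's contextual embeddings rather than the toy lists used here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_similarity(sent_a, sent_b):
    """Similarity matrix: entry [i][j] compares word i of sentence A with word j of sentence B."""
    return [[cosine_similarity(u, v) for v in sent_b] for u in sent_a]


def word_contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """Hypothetical InfoNCE-style loss:
    -log( exp(s_pos / t) / (exp(s_pos / t) + sum_k exp(s_neg_k / t)) ),
    where s is cosine similarity and t is the temperature (an assumed value)."""
    pos = math.exp(cosine_similarity(anchor, positive) / temperature)
    neg = sum(math.exp(cosine_similarity(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


if __name__ == "__main__":
    # Toy 2-d "embeddings" standing in for BERT vectors of words in a sentence pair.
    sentence_a = [[1.0, 0.0], [0.0, 1.0]]
    sentence_b = [[0.9, 0.1], [0.1, 0.9]]
    print(pairwise_similarity(sentence_a, sentence_b))
    # Loss is low when the anchor is close to its positive and far from the negative.
    print(word_contrastive_loss([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0]]))
```

Minimizing this loss during fine-tuning encourages the encoder to give consistent representations to words that should match across the pair, which is one plausible way the similarity signal could reduce ambiguity. The temperature controls how sharply the loss separates positives from negatives.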