Although named entity recognition (NER) helps us extract various domain-specific entities from text (e.g., artists in the music domain), it is costly to create a large amount of training data or a structured knowledge base to perform accurate NER in the target domain. Here, we propose self-adaptive NER, where the model retrieves external knowledge from unstructured text to learn the usage of entities that have not been learned well. To retrieve useful knowledge for NER, we design an effective two-stage model that retrieves unstructured knowledge using uncertain entities as queries. Our model first predicts the entities in the input and then finds the entities for which its predictions are not confident. It then retrieves knowledge using these uncertain entities as queries and concatenates the retrieved text to the original input to revise its predictions. Experiments on CrossNER datasets demonstrate that our model outperforms the strong NERBERT baseline by 2.45 points on average.
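The two-stage pipeline above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `tag_entities`, `retrieve`, the toy corpus, and the confidence threshold `CONF_THRESHOLD` are all hypothetical stand-ins for a trained tagger and a real retriever over unstructured text.

```python
CONF_THRESHOLD = 0.7  # hypothetical confidence cutoff for "uncertain" entities

def tag_entities(text):
    """Stand-in first-stage tagger: returns (entity, label, confidence).

    A real system would run a BERT-style token classifier here.
    """
    toy_predictions = {
        "Nirvana": ("band", 0.95),
        "Bleach": ("album", 0.40),  # low confidence -> uncertain entity
    }
    return [(ent, lab, conf) for ent, (lab, conf) in toy_predictions.items()
            if ent in text]

def retrieve(query):
    """Stand-in retriever over unstructured text (e.g., Wikipedia passages)."""
    corpus = {"Bleach": "Bleach is the debut studio album by the band Nirvana."}
    return corpus.get(query, "")

def self_adaptive_ner(text):
    # Stage 1: predict entities, then keep those with low-confidence predictions.
    preds = tag_entities(text)
    uncertain = [ent for ent, _, conf in preds if conf < CONF_THRESHOLD]
    # Stage 2: retrieve knowledge with the uncertain entities as queries and
    # concatenate it to the original input before re-running the tagger.
    context = " ".join(retrieve(ent) for ent in uncertain)
    augmented = text + " [SEP] " + context if context else text
    return augmented

print(self_adaptive_ner("Nirvana released Bleach in 1989."))
```

In a full system the augmented input would be passed through the tagger a second time so the retrieved context can revise the uncertain predictions; the sketch stops at constructing that augmented input.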