The MultiCoNER shared task aims to detect semantically ambiguous and complex named entities in short, low-context settings across multiple languages. The lack of context makes recognizing ambiguous named entities challenging. To alleviate this issue, our team DAMO-NLP proposes a knowledge-based system: we build a multilingual knowledge base from Wikipedia that supplies related context information to the named entity recognition (NER) model. Given an input sentence, our system retrieves related contexts from the knowledge base. The original input sentence is then augmented with this context information, which allows the model to capture significantly better contextualized token representations. Our system wins 10 out of 13 tracks in the MultiCoNER shared task.
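The retrieve-then-augment step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy knowledge base, the token-overlap scoring (a simple stand-in, not the retrieval method used by the system), the `[SEP]` separator string, and the helper names `retrieve` and `augment`.

```python
from collections import Counter

# Assumed separator between the input sentence and each retrieved context.
SEP = "[SEP]"

# Hypothetical toy knowledge base; the system described above uses Wikipedia.
TOY_KB = [
    "Inception is a 2010 science fiction film directed by Christopher Nolan.",
    "Paris is the capital of France.",
    "The Beatles were an English rock band formed in Liverpool.",
]

_PUNCT = ".,;:!?()\"'"


def tokenize(text):
    """Lowercase whitespace tokenization with surrounding punctuation stripped."""
    tokens = (t.strip(_PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]


def overlap_score(query, passage):
    """Count tokens shared by query and passage (multiset intersection)."""
    return sum((Counter(tokenize(query)) & Counter(tokenize(passage))).values())


def retrieve(sentence, knowledge_base, top_k=2):
    """Return up to top_k passages with non-zero overlap, best first."""
    scored = [
        (overlap_score(sentence, passage), index, passage)
        for index, passage in enumerate(knowledge_base)
    ]
    scored = [item for item in scored if item[0] > 0]
    # Sort by descending score; ties keep knowledge-base order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [passage for _, _, passage in scored[:top_k]]


def augment(sentence, contexts):
    """Append retrieved contexts to the original sentence."""
    return f" {SEP} ".join([sentence, *contexts])


if __name__ == "__main__":
    query = "who directed inception"
    print(augment(query, retrieve(query, TOY_KB)))
```

In a setup like this, the NER model would encode the augmented string, while labels would be predicted only for the tokens of the original sentence; the appended contexts serve purely as extra evidence for the encoder.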