Effective Few-Shot Named Entity Linking by Meta-Learning

Entity linking aims to link ambiguous mentions to their corresponding entities in a knowledge base, and it is fundamental to many downstream applications such as knowledge base completion, question answering, and information extraction. Although much effort has been devoted to this task, most studies assume that large-scale labeled data is available. When labeled data is scarce in a specific domain because annotation is labor-intensive, the performance of existing algorithms declines sharply. In this paper, we address few-shot entity linking, which requires only a minimal amount of in-domain labeled data and is therefore more practical in real settings. Specifically, we first propose a novel weak supervision strategy that generates non-trivial synthetic entity-mention pairs based on mention rewriting. Because the quality of the synthetic data has a critical impact on model training, we further design a meta-learning mechanism that automatically assigns a different weight to each synthetic entity-mention pair. In this way, we can exploit rich and valuable semantic information to derive a well-trained entity linking model under the few-shot setting. Experiments on real-world datasets show that the proposed method substantially improves the state-of-the-art few-shot entity linking model and achieves strong performance when only a small amount of labeled data is available. We also demonstrate the model's transferability.
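
The abstract does not spell out how the meta-learning mechanism computes its weights, so the following is only a minimal sketch of one common way to weight noisy training pairs, assuming a gradient-alignment rule in the style of learning-to-reweight methods: a synthetic pair receives a positive weight only if its gradient points in the same direction as the gradient on the small labeled seed set. The logistic scorer, the feature vectors, and the names `per_example_grads`, `meta_weights`, and `train` are all hypothetical stand-ins for illustration, not the paper's model or API.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_example_grads(theta, X, y):
    """Per-row gradient of the logistic loss: (sigmoid(x . theta) - y) * x."""
    return (sigmoid(X @ theta) - y)[:, None] * X

def meta_weights(theta, X_syn, y_syn, X_seed, y_seed):
    """Weight each synthetic pair by how well its gradient agrees with
    the mean gradient on the few-shot seed set (assumed weighting rule)."""
    g_syn = per_example_grads(theta, X_syn, y_syn)                  # (n_syn, d)
    g_seed = per_example_grads(theta, X_seed, y_seed).mean(axis=0)  # (d,)
    raw = np.maximum(0.0, g_syn @ g_seed)  # drop pairs that disagree with the seed set
    total = raw.sum()
    return raw / total if total > 0 else raw  # normalize when any weight is positive

def train(X_syn, y_syn, X_seed, y_seed, steps=200, lr=0.5):
    """Gradient descent on the synthetic pairs, re-weighted at every step."""
    theta = np.zeros(X_syn.shape[1])
    for _ in range(steps):
        w = meta_weights(theta, X_syn, y_syn, X_seed, y_seed)
        grads = per_example_grads(theta, X_syn, y_syn)
        theta -= lr * (w[:, None] * grads).sum(axis=0)
    return theta
```

In this sketch each row of `X_syn` would hold features of one synthetic mention-entity pair and `y_syn` its possibly noisy match label, while `X_seed` and `y_seed` hold the few trusted in-domain pairs. A synthetic pair whose label contradicts the seed set produces an opposing gradient and is clipped to weight zero, which is the intuition behind down-weighting low-quality synthetic data.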
