We propose a simple and practical method for named entity linking (NEL) based on representing each entity by multiple embeddings. To explore this method and examine its dependence on parameters, we measure its performance on Namesakes, a highly challenging dataset of ambiguously named entities. Our observations suggest that the minimum number of mentions required to create a knowledge base (KB) entity is very important for NEL performance. The number of embeddings is less important and can be kept small, at 10 or fewer. We show that our representations of KB entities can be adjusted using only KB data, and that this adjustment can improve NEL performance. We also compare the NEL performance of embeddings obtained by tuning a language model on diverse news texts with that of embeddings obtained by tuning on more uniform texts from the public datasets XSum and CNN / Daily Mail. We find that tuning on diverse news provides better embeddings.
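To make the idea of multi-embedding entity representation concrete, the following is a minimal sketch, not the implementation evaluated here. It assumes that mention embeddings are already available as plain vectors, that a KB entity is created only when it has at least `min_mentions` mentions, that at most `max_embeddings` vectors are kept per entity (simply the first ones seen), and that a mention is linked to the entity whose closest embedding has the highest cosine similarity above a cutoff. The function names, the selection rule, and all default values are illustrative assumptions.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_kb(mentions, min_mentions=3, max_embeddings=10):
    """Group (entity_id, embedding) pairs into a KB of multi-embedding entities.

    Entities with fewer than min_mentions mentions are not created.
    Keeping the first max_embeddings vectors is a naive placeholder
    for whatever selection strategy is actually used (assumption).
    """
    grouped = defaultdict(list)
    for entity_id, embedding in mentions:
        grouped[entity_id].append(embedding)
    return {
        entity_id: embeddings[:max_embeddings]
        for entity_id, embeddings in grouped.items()
        if len(embeddings) >= min_mentions
    }


def link(mention_embedding, kb, threshold=0.5):
    """Return the KB entity whose best-matching embedding is most similar
    to the mention, or None if no similarity exceeds the threshold."""
    best_id, best_score = None, threshold
    for entity_id, embeddings in kb.items():
        score = max(cosine(mention_embedding, e) for e in embeddings)
        if score > best_score:
            best_id, best_score = entity_id, score
    return best_id
```

In this sketch, `min_mentions` and `max_embeddings` play the roles of the two parameters discussed above: the first controls which entities enter the KB at all, the second caps the size of each entity's representation. Returning `None` stands in for the case where a mention matches no KB entity.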