In entity linking, mentions of named entities in raw text are disambiguated against a knowledge base (KB). This work focuses on linking to unseen KBs that have no training data and whose schema is unknown during training. Our approach relies on methods that flexibly convert entities from arbitrary KBs with several attribute-value pairs into flat strings, which we use in conjunction with state-of-the-art models for zero-shot linking. To improve the generalization of our model, we use two regularization schemes, one based on shuffling entity attributes and the other on handling unseen attributes. Experiments on English datasets in which models are trained on the CoNLL dataset and tested on the TAC-KBP 2010 dataset show that our models outperform baseline models by over 12 points of accuracy. Unlike prior work, our approach also allows for seamlessly combining multiple training datasets. We test this ability by adding both a completely different dataset (Wikia) and increasing amounts of training data from the TAC-KBP 2010 training set. Our models perform favorably across the board.
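To make the flattening and regularization concrete, here is a minimal sketch of one way to serialize an entity's attribute-value pairs into a flat string with attribute shuffling and attribute dropout. The abstract does not specify the exact serialization format, so the `[ATTR]`/`[VAL]` markers, the function name `flatten_entity`, and the parameter `attr_drop_prob` are illustrative assumptions, not the paper's actual implementation.

```python
import random

def flatten_entity(entity, shuffle=True, attr_drop_prob=0.0, rng=random):
    """Serialize an entity's attribute-value pairs into a flat string.

    entity: dict mapping attribute names to string values, e.g.
        {"name": "Chicago", "type": "city"}.
    shuffle: randomly permute attribute order at training time so the
        model does not memorize one fixed schema order (assumed to
        correspond to the attribute-shuffling regularizer).
    attr_drop_prob: probability of dropping each non-name attribute,
        simulating test-time KBs whose schemas differ from training
        (assumed to correspond to handling unseen attributes).
    """
    items = list(entity.items())
    if shuffle:
        rng.shuffle(items)
    kept = [(k, v) for k, v in items
            if k == "name" or rng.random() >= attr_drop_prob]
    # One possible serialization: special markers between keys and values.
    return " ".join(f"[ATTR] {k} [VAL] {v}" for k, v in kept)

entity = {"name": "Chicago", "type": "city", "country": "USA"}
print(flatten_entity(entity, shuffle=True, attr_drop_prob=0.3))
```

The resulting flat string can then be fed to a standard zero-shot linking encoder, since it no longer depends on any fixed KB schema.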