Cross-lingual entity linking (XEL) is the task of finding referents in a target-language knowledge base (KB) for mentions extracted from source-language texts. The first step of (X)EL is candidate generation, which retrieves a list of plausible candidate entities from the target-language KB for each mention. Approaches based on resources from Wikipedia have proven successful in the realm of relatively high-resource languages (HRL), but these do not extend well to low-resource languages (LRL) with few, if any, Wikipedia pages. Recently, transfer learning methods have been shown to reduce the demand for resources in the LRL by utilizing resources in closely related languages, but their performance still lags far behind that of high-resource counterparts. In this paper, we first assess the problems faced by current entity candidate generation methods in low-resource XEL, then propose three improvements that (1) reduce the disconnect between entity mentions and KB entries, and (2) improve the robustness of the model to low-resource scenarios. The methods are simple but effective: we evaluate our approach on seven XEL datasets and find that it yields an average gain of 16.9% in Top-30 gold candidate recall compared to state-of-the-art baselines. Our improved model also yields an average gain of 7.9% in in-KB accuracy of end-to-end XEL.