Injecting external domain-specific knowledge (e.g., from UMLS) into pretrained language models (LMs) advances their ability to handle specialised in-domain tasks such as biomedical entity linking (BEL). However, such abundant expert knowledge is available only for a handful of languages (e.g., English). In this work, we propose a novel cross-lingual biomedical entity linking task (XL-BEL) and establish an XL-BEL benchmark spanning 10 typologically diverse languages. Using it, we first investigate how standard knowledge-agnostic as well as knowledge-enhanced monolingual and multilingual LMs perform beyond the standard monolingual English BEL task; the results reveal large gaps relative to English performance. We then address the challenge of transferring domain-specific knowledge from resource-rich to resource-poor languages. To this end, we propose and evaluate a series of cross-lingual transfer methods for the XL-BEL task, and demonstrate that general-domain bitext helps propagate the available English knowledge to languages with little to no in-domain data. Remarkably, our proposed domain-specific transfer methods yield consistent gains across all target languages, sometimes of up to 20 Precision@1 points, without any in-domain knowledge in the target language and without any in-domain parallel data.
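Since results are reported in Precision@1, the following minimal Python sketch illustrates how this metric is typically computed for entity linking: a mention counts as correct only if the top-ranked candidate concept ID matches the gold one. The function name, variable names, and CUI strings are illustrative and not taken from the paper or its benchmark.

```python
def precision_at_1(ranked_predictions, gold_cuis):
    """Fraction of mentions whose top-ranked candidate CUI matches the gold CUI.

    ranked_predictions: list of candidate-CUI lists, best candidate first.
    gold_cuis: list of gold CUIs, aligned with ranked_predictions.
    """
    assert len(ranked_predictions) == len(gold_cuis)
    hits = sum(
        1
        for candidates, gold in zip(ranked_predictions, gold_cuis)
        if candidates and candidates[0] == gold
    )
    return hits / len(gold_cuis)


# Illustrative example: two of three mentions are linked correctly.
preds = [
    ["C0011849", "C0011860"],  # top candidate correct
    ["C0020538"],              # top candidate correct
    ["C0027051", "C0155626"],  # top candidate wrong
]
gold = ["C0011849", "C0020538", "C0155626"]
print(precision_at_1(preds, gold))  # -> 0.666...
```

A "20 Precision@1 point" gain, as reported in the abstract, would correspond to e.g. moving from 0.40 to 0.60 on this scale (expressed as percentages, 40 to 60).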