We present the task of multilingual linking of events to a knowledge base. We automatically compile a large-scale dataset for this task, comprising 1.8M mentions across 44 languages that refer to over 10.9K events from Wikidata. We propose two variants of the event linking task: 1) multilingual, where event descriptions are in the same language as the mention, and 2) crosslingual, where all event descriptions are in English. On both tasks, we compare multiple event linking systems, including BM25+ (Lv and Zhai, 2011) and multilingual adaptations of the biencoder and crossencoder architectures from BLINK (Wu et al., 2020). In our experiments on the two task variants, we find that both the biencoder and crossencoder models significantly outperform the BM25+ baseline. Our results also indicate that the crosslingual task is generally more challenging than the multilingual task. To test the out-of-domain generalization of the proposed linking systems, we additionally create a Wikinews-based evaluation set. We present a qualitative analysis highlighting various aspects captured by the proposed dataset, including the need for temporal reasoning over context and for handling diverse event descriptions across languages.
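As a rough illustration of what a lexical baseline such as BM25+ does in this setting, the sketch below ranks candidate event descriptions against a mention's context using the BM25+ scoring function of Lv and Zhai (2011), with IDF computed as log((N + 1) / df). It is a minimal sketch under stated assumptions, not the paper's setup: the function names `bm25_plus_scores` and `rank_events` are hypothetical, tokenization is plain lowercasing plus whitespace splitting, query term frequency is ignored, the parameters k1=1.2, b=0.75, delta=1.0 are common defaults rather than the paper's values, and the toy event descriptions are made up.

```python
import math
from collections import Counter


def bm25_plus_scores(query_tokens, docs_tokens, k1=1.2, b=0.75, delta=1.0):
    """Score each tokenized document against a tokenized query with BM25+.

    Sketch only: query term frequency is ignored, and the default
    parameters are common choices, not values taken from the paper.
    """
    if not docs_tokens:
        return []
    n_docs = len(docs_tokens)
    # Average document length; fall back to 1.0 if every document is empty.
    avgdl = sum(len(d) for d in docs_tokens) / n_docs or 1.0
    # Document frequency: how many documents contain each term.
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        # BM25 length normalization for this document.
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for term in set(query_tokens):
            if tf[term] == 0:
                continue  # the sum runs only over terms shared by query and document
            idf = math.log((n_docs + 1) / df[term])
            # delta lower-bounds the contribution of every matching term,
            # so long documents are not over-penalized (the "+" in BM25+).
            score += idf * ((k1 + 1) * tf[term] / (tf[term] + norm) + delta)
        scores.append(score)
    return scores


def rank_events(mention_context, event_descriptions, **kwargs):
    """Return indices of candidate event descriptions, best match first."""
    # Hypothetical preprocessing: lowercase plus whitespace tokenization.
    query = mention_context.lower().split()
    docs = [d.lower().split() for d in event_descriptions]
    scores = bm25_plus_scores(query, docs, **kwargs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy, made-up event descriptions for illustration only.
    events = [
        "2016 Summer Olympics held in Rio de Janeiro",
        "2012 Summer Olympics held in London",
        "2014 FIFA World Cup held in Brazil",
    ]
    context = "athletes arrived in Rio de Janeiro for the Olympics in 2016"
    print(rank_events(context, events))  # prints [0, 1, 2]
```

A scorer like this rewards only exact token overlap between the mention context and an event description, whereas the biencoder and crossencoder models learn dense representations of both.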