Event Detection (ED) is the task of identifying and classifying trigger words of event mentions in text. Despite considerable research effort on English text in recent years, ED in other languages remains far less explored. For non-English languages, important research questions include how well existing ED models perform on different languages, how challenging ED is in those languages, and how well ED knowledge and annotation transfer across languages. Answering these questions requires multilingual ED datasets that provide consistent event annotation across languages. Some multilingual ED datasets exist; however, they tend to cover only a handful of languages, mainly popular ones, and leave many languages uncovered. In addition, the current datasets are often small and not publicly accessible. To overcome these shortcomings, we introduce MINION, a new large-scale multilingual dataset for ED that consistently annotates events for 8 different languages, 5 of which have not been supported by existing multilingual datasets. We also perform extensive experiments and analysis to demonstrate the challenges and transferability of ED across languages in MINION, which together call for more research effort in this area.