Most existing event extraction (EE) methods extract event arguments only within the scope of a single sentence. Such sentence-level methods struggle with the rapidly growing volume of documents in emerging application domains such as finance, legislation, and health, where event arguments are often scattered across different sentences and multiple event mentions frequently co-exist in the same document. To address these challenges, we propose a novel end-to-end model, Doc2EDAG, which generates an entity-based directed acyclic graph to perform document-level EE (DEE) effectively. Moreover, we reformalize the DEE task with a no-trigger-words design to ease document-level event labeling. To demonstrate the effectiveness of Doc2EDAG, we build a large-scale real-world dataset of Chinese financial announcements that exhibits the challenges mentioned above. Extensive experiments with comprehensive analyses show that Doc2EDAG outperforms state-of-the-art methods. Data and code can be found at https://github.com/dolphin-zs/Doc2EDAG.
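As a rough illustration of what an entity-based graph over event records can look like, the sketch below merges several records of one event type into a prefix-sharing structure and reads them back as root-to-leaf paths. It is a minimal sketch under stated assumptions, not the Doc2EDAG model: the role names in `ROLES`, the helper functions `build_edag` and `decode_paths`, and the nested-dict representation (a tree, which is a special case of a directed acyclic graph) are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical role order for a single event type (an assumption for this sketch).
ROLES = ["Pledger", "PledgedShares", "Pledgee"]

Record = Dict[str, Optional[str]]


def build_edag(records: List[Record]) -> dict:
    """Merge event records into a prefix-sharing graph stored as nested dicts."""
    root: dict = {}
    for record in records:
        node = root
        for role in ROLES:
            # A missing argument is stored as None so every path has the same length.
            node = node.setdefault(record.get(role), {})
    return root


def decode_paths(node: dict, prefix: Tuple = ()) -> List[Record]:
    """Enumerate root-to-leaf paths; each complete path is one event record."""
    if len(prefix) == len(ROLES):
        return [dict(zip(ROLES, prefix))]
    decoded: List[Record] = []
    for entity, child in node.items():
        decoded.extend(decode_paths(child, prefix + (entity,)))
    return decoded


records = [
    {"Pledger": "A", "PledgedShares": "100", "Pledgee": "B"},
    {"Pledger": "A", "PledgedShares": "200", "Pledgee": None},
]
graph = build_edag(records)
recovered = decode_paths(graph)
```

Here the two records share the `Pledger` node and branch at `PledgedShares`, so co-existing events in one document map to distinct paths, and no trigger word is needed to tell them apart.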