Most previous studies aim at extracting events from a single sentence, while document-level event extraction remains under-explored. In this paper, we focus on extracting event arguments from an entire document, which faces two critical problems: a) the long-distance dependency between a trigger and its arguments across sentences; b) context in the document that distracts from the event. To address these issues, we propose a Two-Stream Abstract Meaning Representation enhanced extraction model (TSAR). TSAR encodes the document from different perspectives with a two-stream encoding module, which uses both local and global information and lowers the impact of distracting context. In addition, TSAR introduces an AMR-guided interaction module that captures both intra-sentential and inter-sentential features, based on locally and globally constructed AMR semantic graphs. An auxiliary boundary loss is introduced to explicitly enhance the boundary information of text spans. Extensive experiments show that TSAR outperforms the previous state of the art by a large margin, with gains of 2.54 F1 and 5.13 F1 on the public RAMS and WikiEvents datasets respectively, demonstrating its strength in cross-sentence argument extraction. We release our code at https://github.com/PKUnlp-icler/TSAR.
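To make the idea of an auxiliary boundary loss concrete, the following is a minimal sketch and not the released TSAR implementation. It assumes the loss is a token-level binary cross-entropy over two indicators (whether a token starts a gold argument span, and whether it ends one); the abstract does not specify the formulation. The names `boundary_labels`, `binary_cross_entropy`, and `boundary_loss` are hypothetical, and the probabilities are assumed to come from some upstream encoder.

```python
import math


def boundary_labels(num_tokens, spans):
    """Build start/end indicator labels from gold spans.

    spans: list of (start, end) token indices, both inclusive (assumed convention).
    """
    starts = [0.0] * num_tokens
    ends = [0.0] * num_tokens
    for start, end in spans:
        starts[start] = 1.0
        ends[end] = 1.0
    return starts, ends


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(probs)


def boundary_loss(start_probs, end_probs, spans):
    """Sum of the start-boundary and end-boundary losses (assumed formulation)."""
    starts, ends = boundary_labels(len(start_probs), spans)
    return binary_cross_entropy(start_probs, starts) + binary_cross_entropy(end_probs, ends)


if __name__ == "__main__":
    # One gold argument span covering tokens 1..2 in a 5-token sequence.
    gold = [(1, 2)]
    confident = boundary_loss([0.1, 0.9, 0.1, 0.1, 0.1], [0.1, 0.1, 0.9, 0.1, 0.1], gold)
    uncertain = boundary_loss([0.5] * 5, [0.5] * 5, gold)
    print(confident < uncertain)
```

In a sketch like this, the boundary term would be added to the main argument-classification loss with a weighting coefficient, so that the encoder is pushed to make span boundaries explicit in its token representations.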