Event argument extraction (EAE) has been well studied at the sentence level but remains under-explored at the document level. In this paper, we study how to capture event arguments that are spread across multiple sentences of a document. Prior work mainly assumes full access to rich document-level supervision, ignoring the fact that argument supervision in documents is limited. To fill this gap, we present FewDocAE, a Few-Shot Document-Level Event Argument Extraction benchmark built on DocEE, the largest document-level event extraction dataset. We first define the new problem and reconstruct the corpus with a novel N-Way-D-Doc sampling strategy in place of the traditional N-Way-K-Shot strategy. We then adapt advanced document-level neural models to the few-shot setting and report baseline results under in-domain and cross-domain settings. Because argument extraction depends on context from multiple sentences and learning is limited to very few examples, we find the task very challenging, with substantially low performance. Since FewDocAE is closely related to practical use in low-resource regimes, we hope this benchmark encourages more research in this direction. Our data and code will be made available online.
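The abstract names N-Way-D-Doc sampling but does not define it. The sketch below is a hypothetical reading, not the authors' procedure: it assumes each document carries a single event type and that an episode draws N event types, then D support documents plus a few query documents per type. The point it illustrates is that the "shot" unit becomes a whole document rather than K individual argument mentions, since one document can contain a variable number of arguments per role. The record layout and the names `sample_episode` and `n_query` are illustrative assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical document record: (doc_id, event_type). Each document is ASSUMED
# to carry exactly one event type; argument annotations would be stored elsewhere.
Doc = Tuple[str, str]


def sample_episode(
    docs: List[Doc],
    n_way: int,
    d_doc: int,
    n_query: int,
    rng: random.Random,
) -> Tuple[List[Doc], List[Doc]]:
    """Sample one episode under an ASSUMED N-Way-D-Doc reading:
    N event types, with D support documents and n_query query documents per type.
    Shots are counted in whole documents, not in argument mentions."""
    by_type: Dict[str, List[Doc]] = defaultdict(list)
    for doc in docs:
        by_type[doc[1]].append(doc)

    # Only event types with enough documents for both support and query are eligible.
    eligible = sorted(t for t, ds in by_type.items() if len(ds) >= d_doc + n_query)
    if len(eligible) < n_way:
        raise ValueError("not enough event types with sufficient documents")

    support: List[Doc] = []
    query: List[Doc] = []
    for event_type in rng.sample(eligible, n_way):
        # Draw disjoint support and query documents for this event type.
        picked = rng.sample(by_type[event_type], d_doc + n_query)
        support.extend(picked[:d_doc])
        query.extend(picked[d_doc:])
    return support, query
```

For example, with a toy corpus of three event types and four documents each, `sample_episode(corpus, n_way=2, d_doc=1, n_query=2, rng=random.Random(0))` yields two support documents (one per sampled type) and four disjoint query documents drawn from the same two types. Under a K-shot scheme, by contrast, one would have to count argument mentions, which a single document can over- or under-supply for a given role.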