Event argument extraction (EAE) aims to extract arguments with given roles from text, and has been widely studied in natural language processing. Most previous works achieve good performance on specific EAE datasets with dedicated neural architectures. However, these architectures are usually difficult to adapt to new datasets or scenarios with different annotation schemas or formats. Furthermore, they rely on large-scale labeled data for training, which is unavailable in most cases due to the high cost of labeling. In this paper, we propose a multi-format transfer learning model with variational information bottleneck, which exploits the information, especially the common knowledge, in existing datasets for EAE on new datasets. Specifically, we introduce a shared-specific prompt framework to learn both format-shared and format-specific knowledge from datasets with different formats. To further absorb the common knowledge for EAE and eliminate irrelevant noise, we integrate a variational information bottleneck into our architecture to refine the shared representation. We conduct extensive experiments on three benchmark datasets and achieve new state-of-the-art performance on EAE.
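The variational information bottleneck mentioned above is typically realized by modeling the shared representation as a diagonal Gaussian and penalizing its KL divergence from a standard normal prior. As a minimal sketch of that standard regularizer (the function names and the weighting scheme are illustrative assumptions, not details from this paper):

```python
import math

def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over dimensions.

    mu, logvar: lists of floats parameterizing the (hypothetical) shared
    representation's posterior. A perfectly prior-matched posterior gives 0.
    """
    return sum(-0.5 * (1.0 + lv - m * m - math.exp(lv))
               for m, lv in zip(mu, logvar))

# An IB-style training objective then trades the task loss against this
# compression term, e.g. loss = task_loss + beta * gaussian_kl(mu, logvar),
# where beta is a small hyperparameter (illustrative, not the paper's value).
```

Driving the KL term down discards format-specific noise from the shared representation, which is the intuition behind using the bottleneck to refine format-shared knowledge.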