Event extraction (EE) plays an important role in many industrial applications, and high-quality EE methods need large amounts of manually annotated data to train supervised models. Obtaining such annotations is expensive, especially for domain events, whose annotation requires experts from the corresponding domain. We therefore introduce active learning (AL) to reduce the cost of event annotation. Existing AL methods, however, have two main problems that make them poorly suited to event extraction. First, existing pool-based selection strategies are limited in terms of computational cost and sample validity. Second, existing evaluations of sample importance do not make use of local sample information. In this paper, we present a novel deep AL method for EE. We propose a batch-based selection strategy and a Memory-Based Loss Prediction (MBLP) model to select unlabeled samples efficiently. During selection, we use an internal-external sample loss ranking method that evaluates sample importance using local information. Finally, we propose a delayed training strategy to train the MBLP model. Extensive experiments on three domain datasets show that our method outperforms other state-of-the-art methods.
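To make the contrast between pool-based and batch-based selection concrete, the following is a minimal sketch and not the MBLP implementation, whose details the abstract does not give. It assumes a hypothetical `predict_loss` callable that stands in for a loss prediction model, and it assumes that "local information" can be approximated by ranking samples only against the others in the same batch. The names `select_from_batches`, `batch_size`, and `per_batch` are illustrative.

```python
def select_from_batches(unlabeled, predict_loss, batch_size, per_batch):
    """Pick samples to annotate by ranking within each batch.

    unlabeled    -- list of unlabeled samples
    predict_loss -- hypothetical callable returning a predicted loss per sample
    batch_size   -- number of samples scored together
    per_batch    -- number of samples kept from each batch
    """
    selected = []
    # Walk the data one batch at a time instead of scoring and
    # sorting the whole unlabeled pool in a single pass.
    for start in range(0, len(unlabeled), batch_size):
        batch = unlabeled[start:start + batch_size]
        # Rank only against the other samples in this batch (assumed
        # stand-in for "local" sample information).
        ranked = sorted(batch, key=predict_loss, reverse=True)
        selected.extend(ranked[:per_batch])
    return selected
```

Under these assumptions, each selection step sorts only `batch_size` samples, so samples can be chosen as batches arrive rather than after the entire pool has been scored.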