Event extraction (EE) plays an important role in many industrial applications, and high-quality EE methods require large amounts of manually annotated data to train supervised models. However, annotation is expensive, especially for domain-specific events, which require experts from the corresponding domain. We therefore apply active learning (AL) to reduce the cost of event annotation. Existing AL methods, however, have two main problems that limit their usefulness for event extraction. First, existing pool-based selection strategies are limited in both computational cost and the validity of the selected samples. Second, existing estimates of sample importance do not exploit local sample information. In this paper, we present a novel deep AL method for EE. We propose a batch-based selection strategy and a Memory-Based Loss Prediction model (MBLP) to select unlabeled samples efficiently. During selection, an internal-external sample loss ranking method uses local information to evaluate sample importance. Finally, we propose a delayed training strategy for training the MBLP model. Extensive experiments on datasets from three domains show that our method outperforms other state-of-the-art methods.
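To make the contrast with pool-based selection concrete, here is a minimal, generic sketch of batch-based selection by predicted loss. It assumes a hypothetical `predict_loss` scorer standing in for a trained loss-prediction model. It does not reproduce MBLP's memory mechanism, the internal-external ranking, or the delayed training strategy, and the names `select_from_batches` and `k_per_batch` are illustrative only.

```python
from typing import Callable, Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def select_from_batches(
    batches: Iterable[Sequence[T]],
    predict_loss: Callable[[T], float],
    k_per_batch: int,
) -> List[T]:
    """Pick the k samples with the highest predicted loss from each batch.

    Generic illustration only (not the MBLP method): each incoming batch is
    scored on its own, so the whole unlabeled pool never has to be scored
    at once.
    """
    selected: List[T] = []
    for batch in batches:
        # Score every sample in the current batch with the hypothetical loss predictor.
        scored = [(predict_loss(x), i) for i, x in enumerate(batch)]
        # Highest predicted loss first; ties broken by position in the batch.
        scored.sort(key=lambda p: (-p[0], p[1]))
        # Keep the top-k samples of this batch for annotation.
        selected.extend(batch[i] for _, i in scored[:k_per_batch])
    return selected


# Toy usage: each "sample" is a number that doubles as its own predicted loss.
batches = [[0.1, 0.9, 0.5], [0.3, 0.2, 0.8]]
print(select_from_batches(batches, lambda x: x, 1))  # [0.9, 0.8]
```

Scoring batch by batch keeps the per-step cost proportional to the batch size rather than to the full pool; how the selected samples are weighed against samples outside the current batch is exactly what this sketch leaves out.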