Event extraction, which aims to automatically extract structured information from documents, has attracted increasing attention in many fields. Most existing work treats the task as token-level multi-label classification, assigning tokens to different argument roles while ignoring the writing style of documents. The writing style is the particular way a document organizes its content, and it is relatively fixed across documents from a specific field (e.g., financial or medical documents). We argue that the writing style contains important clues for determining the roles of tokens, and that ignoring such patterns may degrade the performance of existing methods. To this end, we model the writing style of documents as a distribution of argument roles, i.e., the Role-Rank Distribution, and propose an event extraction model with a Role-Rank Distribution based Supervision Mechanism that captures this pattern through the supervised training of the event extraction task. We compare our model with state-of-the-art methods on several real-world datasets. The empirical results show that, by capturing these patterns, our approach outperforms the alternatives. This verifies that the writing style contains valuable information that can improve the performance of event extraction.
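The abstract does not define what a "rank" is or how the supervision term enters training. As a minimal sketch under that uncertainty, the snippet below assumes a rank is the 1-based order in which a role's argument first appears in a document, estimates P(role | rank) from annotated documents, and scores a model's predicted per-rank role distribution against that prior with KL divergence. The function names, the role labels, and the KL-based loss are hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter, defaultdict
import math


def role_rank_distribution(documents, max_rank):
    """Estimate P(role | rank) from annotated documents.

    Hypothetical interpretation: each document is a list of argument roles in
    the order their arguments first appear; 'rank' is that 1-based position.
    """
    counts = defaultdict(Counter)
    for roles in documents:
        for rank, role in enumerate(roles[:max_rank], start=1):
            counts[rank][role] += 1
    dist = {}
    for rank, counter in counts.items():
        total = sum(counter.values())
        dist[rank] = {role: c / total for role, c in counter.items()}
    return dist


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) over roles with nonzero mass in p; eps avoids log(0)."""
    return sum(
        pv * math.log((pv + eps) / (q.get(role, 0.0) + eps))
        for role, pv in p.items()
        if pv > 0
    )


def rank_supervision_loss(predicted, prior):
    """Assumed supervision term: mean KL(prior || predicted) over shared ranks."""
    shared = [rank for rank in prior if rank in predicted]
    if not shared:
        return 0.0
    return sum(kl_divergence(prior[r], predicted[r]) for r in shared) / len(shared)


if __name__ == "__main__":
    # Hypothetical financial-domain role sequences, one list per document.
    docs = [
        ["Pledger", "PledgedShares", "Pledgee"],
        ["Pledger", "Pledgee", "PledgedShares"],
        ["Pledger", "PledgedShares", "Pledgee"],
    ]
    prior = role_rank_distribution(docs, max_rank=3)
    print(prior[2])  # PledgedShares is the more frequent role at rank 2
```

Adding such a term to the token-level classification loss is one plausible way to let the model's per-rank role predictions be pulled toward the field's typical ordering; the actual mechanism in the paper may differ.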