Data is published on the web in great volumes, but the majority of it is unstructured, making it hard to understand and interpret. Information Extraction (IE) methods extract structured information from unstructured data. One of the challenging IE tasks is Event Extraction (EE), which seeks to derive information about specific incidents and their actors from text. EE is useful in many applications, such as knowledge base construction, information retrieval, summarization, and online monitoring systems. Over the past decades, event ontologies such as ACE, CAMEO, and ICEWS have been developed to define the event forms, actors, and dimensions of events observed in text. These ontologies still have shortcomings: they cover only a few topics, such as political events; they have an inflexible structure for defining argument roles; they lack analytical dimensions; and choosing among their event sub-types is complex. To address these concerns, we propose an event ontology, namely COfEE, that combines expert domain knowledge, previous ontologies, and a data-driven approach for identifying events from text. COfEE consists of two hierarchy levels (event types and event sub-types) that include new categories relating to environmental issues, cyberspace, criminal activity, and natural disasters, which need to be monitored instantly. In addition, dynamic roles are defined for each event sub-type to capture various dimensions of events. In a follow-up experiment, the proposed ontology is evaluated on Wikipedia events and is shown to be general and comprehensive. Moreover, to facilitate the preparation of gold-standard data for event extraction, a language-independent online tool based on COfEE is presented.
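The two-level structure described in the abstract (event types containing event sub-types, each sub-type carrying its own set of argument roles) can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration: the type names loosely echo the categories mentioned in the abstract, while the sub-type names, the role names, and the helpers `build_toy_ontology` and `unknown_roles` are hypothetical and are not taken from COfEE itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class EventSubType:
    """Second hierarchy level: a sub-type with its own (dynamic) roles."""
    name: str
    roles: Tuple[str, ...]


@dataclass
class EventType:
    """First hierarchy level: an event type grouping several sub-types."""
    name: str
    sub_types: Dict[str, EventSubType] = field(default_factory=dict)

    def add_sub_type(self, name: str, roles: List[str]) -> None:
        self.sub_types[name] = EventSubType(name, tuple(roles))


def build_toy_ontology() -> Dict[str, EventType]:
    """Build a tiny ontology; all names below are hypothetical examples."""
    disaster = EventType("Natural Disaster")
    disaster.add_sub_type("Earthquake", ["location", "time", "magnitude"])
    cyber = EventType("Cyberspace")
    cyber.add_sub_type("Cyberattack", ["attacker", "target", "time"])
    return {t.name: t for t in (disaster, cyber)}


def unknown_roles(ontology: Dict[str, EventType], event_type: str,
                  sub_type: str, arguments: Dict[str, str]) -> List[str]:
    """Return the annotated roles that the given sub-type does not define."""
    allowed = ontology[event_type].sub_types[sub_type].roles
    return [role for role in arguments if role not in allowed]


if __name__ == "__main__":
    ontology = build_toy_ontology()
    annotation = {"location": "Tehran", "attacker": "unknown"}
    print(unknown_roles(ontology, "Natural Disaster", "Earthquake", annotation))
```

Attaching roles to the sub-type rather than to the event type is what lets each sub-type expose a different set of dimensions. An annotation tool could use a check like `unknown_roles` to reject arguments that a sub-type does not define.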