Event extraction (EE) is crucial to downstream tasks such as news aggregation and event knowledge graph construction. Most existing EE datasets manually define fixed event types and design a specific schema for each of them, so they fail to cover the diverse events emerging from online text. Moreover, news titles, an important source of event mentions, have not received enough attention in current EE research. In this paper, we present Title2Event, a large-scale sentence-level dataset for benchmarking open event extraction without restricting event types. Title2Event contains more than 42,000 news titles in 34 topics collected from Chinese web pages. To the best of our knowledge, it is currently the largest manually annotated Chinese dataset for open event extraction. We further conduct experiments on Title2Event with different models and show that the characteristics of titles make event extraction challenging, underscoring the need for further study of this problem. The dataset and baseline code are available at https://open-event-hub.github.io/title2event.