Stories and narratives are composed of a variety of events. Understanding how these events are semantically related to each other is the essence of reading comprehension. Recent event-centric reading comprehension datasets focus on either event arguments or event temporal commonsense. Although these tasks evaluate a machine's ability to understand narratives, human-like reading comprehension requires the capability to process event-based semantics beyond arguments and temporal commonsense. For example, to understand causality between events, we need to infer motivations or purposes; to understand event hierarchy, we need to parse the composition of events. To facilitate these tasks, we introduce ESTER, a comprehensive machine reading comprehension (MRC) dataset for Event Semantic Relation Reasoning. We study the five most commonly used event semantic relations and formulate them as question answering tasks. Experimental results show that current state-of-the-art (SOTA) systems achieve 60.5%, 57.8%, and 76.3% on event-based F1, token-based F1, and HIT@1 scores, respectively, which are significantly below human performance.
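To make two of the reported metrics concrete, the following is a minimal sketch of a token-based F1 and a HIT@1 score for a single question. It assumes the SQuAD-style bag-of-tokens definition of F1 and an exact-match (case-insensitive) definition of HIT@1; the abstract does not spell out ESTER's exact scoring rules, so the function names `token_f1` and `hit_at_1` and the normalization choices here are illustrative assumptions. Event-based F1 is not sketched because it depends on event trigger annotations that the abstract does not describe.

```python
from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    """Bag-of-tokens F1 between a predicted and a gold answer string.

    Assumption: lowercased whitespace tokenization, as in SQuAD-style scoring.
    """
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def hit_at_1(ranked_predictions: list, gold_answers: list) -> float:
    """Return 1.0 if the top-ranked prediction matches any gold answer.

    Assumption: matching is exact after lowercasing and stripping whitespace.
    """
    if not ranked_predictions:
        return 0.0
    gold = {g.lower().strip() for g in gold_answers}
    return float(ranked_predictions[0].lower().strip() in gold)


if __name__ == "__main__":
    # One shared token out of two predicted and three gold tokens.
    print(token_f1("storm hit", "storm destroyed homes"))
    print(hit_at_1(["Flooded", "rain"], ["flooded"]))
```

Dataset-level scores would then be obtained by averaging these per-question values, again assuming a simple macro average.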