Understanding how events are semantically related to each other is the essence of reading comprehension. Recent event-centric reading comprehension datasets focus mostly on event arguments or temporal relations. While these tasks partially evaluate machines' narrative understanding, human-like reading comprehension requires the ability to process event-based information beyond arguments and temporal reasoning. For example, to understand causality between events, we need to infer motivation or purpose; to establish event hierarchy, we need to understand the composition of events. To facilitate these tasks, we introduce ESTER, a comprehensive machine reading comprehension (MRC) dataset for Event Semantic Relation Reasoning. The dataset uses natural language queries to reason about the five most common event semantic relations, provides more than 6K questions, and captures 10.1K event relation pairs. Experimental results show that current state-of-the-art (SOTA) systems achieve 22.1%, 63.3%, and 83.5% on token-based exact-match, F1, and event-based HIT@1 scores, respectively, all significantly below human performance (36.0%, 79.6%, and 100%, respectively), highlighting our dataset as a challenging benchmark.