Answering questions about why characters perform certain actions is central to understanding and reasoning about narratives. Despite recent progress in QA, it is not clear whether existing models have the ability to answer "why" questions that may require commonsense knowledge external to the input narrative. In this work, we introduce TellMeWhy, a new crowd-sourced dataset that consists of more than 30k questions and free-form answers concerning why characters in short narratives perform the actions described. For a third of this dataset, the answers are not present within the narrative. Given the limitations of automated evaluation for this task, we also present a systematized human evaluation interface for this dataset. Our evaluation of state-of-the-art models shows that they are far below human performance on answering such questions. They perform especially poorly on questions whose answers are external to the narrative, thus providing a challenge for future QA and narrative understanding research.