While numerous architectures for long-range language models (LRLMs) have recently been proposed, a meaningful evaluation of their discourse-level language understanding capabilities has not yet followed. To this end, we introduce ChapterBreak, a challenge dataset that provides an LRLM with a long segment from a narrative that ends at a chapter boundary and asks it to distinguish the beginning of the ground-truth next chapter from a set of negative segments sampled from the same narrative. A fine-grained human annotation reveals that our dataset contains many complex types of chapter transitions (e.g., parallel narratives, cliffhanger endings) that require processing global context to comprehend. Experiments on ChapterBreak show that existing LRLMs fail to effectively leverage long-range context, substantially underperforming a segment-level model trained directly for this task. We publicly release our ChapterBreak dataset to spur more principled future research into LRLMs.
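As a concrete illustration of the suffix-identification setup described above, the following is a minimal sketch of how a causal language model could score ChapterBreak candidates by conditional likelihood and pick the most probable next-chapter beginning. The model choice (`gpt2`), the helper names (`candidate_logprob`, `pick_next_chapter`), and the left-truncation policy are illustrative assumptions, not the paper's exact evaluation protocol; a true LRLM would use a much longer context window.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice; ChapterBreak targets long-range LMs, so in
# practice a model with a far longer context window would be used here.
MODEL_NAME = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def candidate_logprob(prefix: str, candidate: str) -> float:
    """Average log-probability of `candidate` tokens conditioned on `prefix`."""
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    cand_ids = tokenizer(candidate, return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, cand_ids], dim=1)
    # Truncate from the left so the candidate suffix always fits in the window
    # (assumption: the candidate itself is shorter than the window).
    max_len = model.config.n_positions
    input_ids = input_ids[:, -max_len:]
    with torch.no_grad():
        logits = model(input_ids).logits
    # Score only the candidate tokens, i.e. the last n positions; the logit
    # predicting token i lives at position i - 1.
    n = cand_ids.size(1)
    log_probs = torch.log_softmax(logits[0, -n - 1:-1], dim=-1)
    token_scores = log_probs.gather(1, cand_ids[0].unsqueeze(1)).squeeze(1)
    return token_scores.mean().item()

def pick_next_chapter(prefix: str, candidates: list[str]) -> int:
    """Return the index of the candidate the model deems most likely."""
    return max(range(len(candidates)),
               key=lambda i: candidate_logprob(prefix, candidates[i]))
```

Under this scoring scheme, a model succeeds on an example only when the ground-truth continuation receives a higher conditional likelihood than every negative segment drawn from the same narrative, which is what makes the task sensitive to how well the model actually uses its long-range context.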