Coherent discourse is distinguished from a mere collection of utterances by the satisfaction of a diverse set of constraints, for example, choice of expression, logical relations between denoted events, and implicit compatibility with world knowledge. Do neural language models encode such constraints? We design an extendable set of test suites addressing different aspects of discourse and dialogue coherence. Unlike most previous coherence evaluation studies, we address specific linguistic devices beyond sentence order perturbations, allowing for a more fine-grained analysis of what constitutes coherence and what neural models trained on a language modelling objective actually encode. Extending the targeted evaluation paradigm for neural language models (Marvin and Linzen, 2018) to phenomena beyond syntax, we show that this paradigm is equally suited to evaluate linguistic qualities that contribute to the notion of coherence.
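To make the targeted evaluation paradigm concrete: it compares the probability a pretrained language model assigns to minimal pairs of texts that differ only in the property under test (here, coherence). The sketch below is a minimal illustration of that idea, assuming a HuggingFace GPT-2 model and an invented example pair; the test suites, models, and scoring details used in the paper may differ.

```python
# Minimal-pair scoring sketch: does the LM prefer the coherent continuation?
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def total_logprob(text: str) -> float:
    """Total log-probability the LM assigns to `text`."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    # `out.loss` is the mean negative log-likelihood over the predicted tokens;
    # rescale it to a summed log-probability for the whole string.
    n_predicted = enc["input_ids"].size(1) - 1
    return -out.loss.item() * n_predicted

# Hypothetical minimal pair: same context, coherent vs. incoherent continuation.
context = "Anna dropped the glass on the tiled floor."
coherent = context + " It shattered into pieces."
incoherent = context + " It shattered, so she drank from it."

# The test item counts as "passed" if the model scores the coherent variant higher.
print(total_logprob(coherent) > total_logprob(incoherent))
```

In this framing, each test suite is a collection of such contrasts targeting one linguistic device, and a model's accuracy is the fraction of pairs on which it prefers the coherent member.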