The widely studied task of Natural Language Inference (NLI) requires a system to recognize whether one piece of text is textually entailed by another, i.e., whether the entirety of its meaning can be inferred from the other. In current NLI datasets and models, textual entailment relations are typically defined at the sentence or paragraph level. However, even a simple sentence often contains multiple propositions, i.e., distinct units of meaning conveyed by the sentence. Because these propositions can carry different truth values in the context of a given premise, we argue that the textual entailment relation of each proposition in a sentence should be recognized individually. We propose PropSegmEnt, a corpus of over 35K propositions annotated by expert human raters. The structure of our dataset corresponds to two tasks: (1) segmenting the sentences within a document into a set of propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically aligned document, i.e., a document describing the same event or entity. We establish strong baselines for the segmentation and entailment tasks. Through case studies on summary hallucination detection and document-level NLI, we demonstrate that our conceptual framework is potentially useful for understanding and explaining the compositionality of NLI labels.
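As a rough illustration of the proposition-level view, the sketch below represents a sentence as a list of labeled propositions and derives a sentence-level label from them. Everything in it is an assumption made for illustration: the three-way label set, the `Proposition` and `sentence_label` names, the aggregation rule, and the example sentence are hypothetical and are not taken from the PropSegmEnt data format or annotation scheme. The only part grounded in the text is that a sentence counts as entailed when the entirety of its meaning, and hence every proposition, can be inferred from the premise.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Label(Enum):
    # Assumed three-way NLI label set; not specified in the abstract.
    ENTAILMENT = "entailment"
    NEUTRAL = "neutral"
    CONTRADICTION = "contradiction"


@dataclass(frozen=True)
class Proposition:
    # One unit of meaning from a sentence, with its label against a premise.
    text: str
    label: Label


def sentence_label(props: Sequence[Proposition]) -> Label:
    """Compose a sentence-level label from proposition-level labels.

    Hypothetical rule: any contradicted proposition makes the sentence
    contradicted; the sentence is entailed only if every proposition is
    entailed; otherwise it is neutral.
    """
    if not props:
        raise ValueError("a sentence needs at least one proposition")
    labels = {p.label for p in props}
    if Label.CONTRADICTION in labels:
        return Label.CONTRADICTION
    if labels == {Label.ENTAILMENT}:
        return Label.ENTAILMENT
    return Label.NEUTRAL


# Invented example: one sentence split into two propositions whose
# labels differ with respect to some premise document.
example = [
    Proposition("The fire started on Monday.", Label.ENTAILMENT),
    Proposition("The fire destroyed three homes.", Label.NEUTRAL),
]
print(sentence_label(example).value)  # prints "neutral"
```

A single sentence-level label would report this example only as neutral, while the proposition-level labels show which part of the sentence is supported by the premise and which part is not.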