We propose the task of narrative incoherence detection as a new arena for inter-sentential semantic understanding: given a multi-sentence narrative, decide whether any semantic discrepancies exist in the narrative flow. Specifically, we focus on missing sentence detection and discordant sentence detection. Despite its simple setup, this task is challenging, as the model must understand and analyze a multi-sentence narrative and predict incoherence at the sentence level. As an initial step towards this task, we implement several baselines that either directly analyze the raw text (\textit{token-level}) or analyze learned sentence representations (\textit{sentence-level}). We observe that while token-level modeling performs better when the input contains fewer sentences, sentence-level modeling performs better on longer narratives and offers advantages in efficiency and flexibility. Pre-training on large-scale data and an auxiliary sentence prediction training objective further boost the detection performance of the sentence-level model.