Tangled multi-party dialogue contexts pose challenges for dialogue reading comprehension: multiple dialogue threads flow simultaneously within a single dialogue record, making the dialogue history harder to understand for both humans and machines. Previous studies mainly focus on utterance encoding methods with carefully designed features but pay inadequate attention to the structural characteristics of dialogues. We explicitly take structural factors into account and design a novel model for dialogue disentanglement. Because dialogues are built on successive participation and interaction between speakers, we model the structural information of dialogues in two aspects: 1) speaker property, which indicates whom a message is from, and 2) reference dependency, which shows whom a message may refer to. The proposed method achieves a new state of the art on the Ubuntu IRC benchmark dataset and contributes to dialogue-related comprehension.
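The two structural signals named in the abstract can be illustrated with a small sketch. This is not the proposed model: it is a hand-written heuristic, assuming IRC-style messages in which a speaker addresses another by name. The names `Utterance`, `same_speaker_matrix`, and `reference_targets` are hypothetical and chosen only for this illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Utterance:
    speaker: str  # who sent the message (speaker property)
    text: str     # raw message content


def same_speaker_matrix(utterances: List[Utterance]) -> List[List[bool]]:
    """Speaker property: entry [i][j] is True when utterances i and j share a speaker."""
    return [[a.speaker == b.speaker for b in utterances] for a in utterances]


def reference_targets(utterances: List[Utterance]) -> List[Optional[int]]:
    """Reference dependency (heuristic assumption): for each utterance, return the
    index of the most recent earlier utterance whose speaker is mentioned by name
    in the text, or None when no earlier speaker is mentioned."""
    targets: List[Optional[int]] = []
    for i, utt in enumerate(utterances):
        # Lowercased word-like tokens of the current message.
        tokens = set(re.findall(r"[\w\-]+", utt.text.lower()))
        target = None
        # Scan backwards so the closest matching utterance wins.
        for j in range(i - 1, -1, -1):
            other = utterances[j].speaker
            if other != utt.speaker and other.lower() in tokens:
                target = j
                break
        targets.append(target)
    return targets


log = [
    Utterance("alice", "how do I mount a usb drive?"),
    Utterance("bob", "anyone know why apt is locked?"),
    Utterance("carol", "alice: try sudo mount /dev/sdb1 /mnt"),
    Utterance("dave", "bob: another apt process is probably running"),
]
print(reference_targets(log))  # [None, None, 0, 1]
```

In this toy log, two threads interleave: the explicit mentions link the third message to the first and the fourth to the second, which is the kind of structural cue a learned model could exploit alongside the utterance text.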