Retrieval-based conversational systems learn to rank response candidates for a given dialogue context by computing the similarity between their vector representations. However, training on a single textual form of the multi-turn context limits a model's ability to learn representations that generalize to the natural perturbations seen during inference. In this paper, we propose a framework that incorporates augmented versions of a dialogue context into the learning objective. We use contrastive learning as an auxiliary objective to learn robust dialogue context representations that are invariant to the perturbations injected by the augmentation method. We experiment on four benchmark dialogue datasets and demonstrate that our framework combines well with existing augmentation methods and can significantly improve over baseline BERT-based ranking architectures. Furthermore, we propose a novel data augmentation method, ConMix, that adds token-level perturbations through stochastic mixing of tokens from other contexts in the batch. We show that our proposed augmentation method outperforms previous data augmentation approaches and yields dialogue representations that are more robust to common perturbations seen during inference.
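To make the batch-level mixing concrete, the following is a minimal PyTorch sketch of the kind of token mixing the abstract attributes to ConMix. It is an illustrative assumption, not the paper's exact procedure: the function name `conmix`, the `mix_rate` parameter, and the per-example choice of a partner context are all hypothetical.

```python
import torch

def conmix(token_ids: torch.Tensor, mix_rate: float = 0.3) -> torch.Tensor:
    """Hypothetical ConMix-style augmentation: for each context in the batch,
    replace a random subset of token positions with the tokens at the same
    positions in another context from the same batch.

    token_ids: (batch_size, seq_len) tensor of token ids.
    mix_rate:  assumed probability that any given position is swapped.
    """
    batch_size, seq_len = token_ids.shape
    # For each example, pick a partner context from the batch to mix with.
    partner = torch.randperm(batch_size)
    # Bernoulli mask selects which token positions get swapped.
    mask = torch.rand(batch_size, seq_len) < mix_rate
    mixed = token_ids.clone()
    mixed[mask] = token_ids[partner][mask]
    return mixed
```

Under this reading, the augmented context shares most tokens with the original, so it can serve as a positive view of the same context in the contrastive auxiliary objective, while the mixed-in tokens supply the perturbation the representation must learn to ignore.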