Context modeling plays a critical role in building multi-turn dialogue systems. Conversational Query Rewriting (CQR) aims to reduce multi-turn dialogue modeling to a single-turn problem by explicitly rewriting the conversational query into a self-contained utterance. However, existing approaches rely on massive amounts of supervised training data, which are labor-intensive to annotate. Moreover, their detection of important information omitted from the context leaves room for improvement, and they ignore the intent consistency constraint between the contextual query and the rewritten query. To tackle these issues, we first propose to construct a large-scale CQR dataset automatically via self-supervised learning, without any human annotation. We then introduce Teresa, a novel Transformer-based CQR model enhanced with self-attentive keyword detection and an intent consistency constraint. Finally, we conduct extensive experiments on two public datasets. The experimental results demonstrate that our proposed model significantly outperforms existing CQR baselines and confirm the effectiveness of self-supervised learning in improving CQR performance.