In this paper, we study the task of selecting the optimal response given a user and system utterance history in retrieval-based multi-turn dialog systems. Recently, pre-trained language models (e.g., BERT, RoBERTa, and ELECTRA) showed significant improvements in various natural language processing tasks. This and similar response selection tasks can also be solved using such language models by formulating the tasks as dialog--response binary classification tasks. Although existing works using this approach successfully obtained state-of-the-art results, we observe that language models trained in this manner tend to make predictions based on the relatedness of history and candidates, ignoring the sequential nature of multi-turn dialog systems. This suggests that the response selection task alone is insufficient for learning temporal dependencies between utterances. To this end, we propose utterance manipulation strategies (UMS) to address this problem. Specifically, UMS consist of several strategies (i.e., insertion, deletion, and search), which aid the response selection model towards maintaining dialog coherence. Further, UMS are self-supervised methods that do not require additional annotation and thus can be easily incorporated into existing approaches. Extensive evaluation across multiple languages and models shows that UMS are highly effective in teaching dialog consistency, which leads to models pushing the state-of-the-art with significant margins on multiple public benchmark datasets.
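To make the three strategies named in the abstract concrete, below is a minimal sketch of how UMS-style self-supervised training examples could be generated from raw dialogs, assuming each dialog is a list of utterance strings. The function names, label formats, and exact task formulations here are illustrative assumptions, not the paper's specification (the paper defines each strategy as a token-level objective on top of the language model); the sketch only shows that all three supervision signals fall out of the dialog order itself, with no extra annotation.

```python
import random

def insertion_example(dialog):
    """Insertion (sketch): remove one utterance; the model must
    recover its original position in the remaining context."""
    pos = random.randrange(len(dialog))
    target = dialog[pos]
    context = dialog[:pos] + dialog[pos + 1:]
    return {"context": context, "utterance": target, "label": pos}

def deletion_example(dialog, corpus):
    """Deletion (sketch): inject a random utterance taken from another
    dialog; the model must identify the intruding utterance."""
    intruder = random.choice(random.choice(corpus))
    pos = random.randrange(len(dialog) + 1)
    corrupted = dialog[:pos] + [intruder] + dialog[pos:]
    return {"context": corrupted, "label": pos}

def search_example(dialog):
    """Search (sketch): shuffle the context preceding a chosen utterance;
    the model must find the utterance that directly precedes it."""
    idx = random.randrange(1, len(dialog))
    shuffled = dialog[:idx]
    random.shuffle(shuffled)
    label = shuffled.index(dialog[idx - 1])
    return {"context": shuffled, "utterance": dialog[idx], "label": label}

if __name__ == "__main__":
    dialog = [
        "Hi, is the bike still for sale?",
        "Yes, it is.",
        "Great, can I see it tomorrow?",
        "Sure, come by at noon.",
    ]
    print(insertion_example(dialog))
    print(search_example(dialog))
```

Each generator corrupts or partially hides the temporal structure of a dialog and asks the model to restore it, which is why such objectives can teach the sequential dependencies that plain dialog-response classification misses.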