Despite recent improvements in open-domain dialogue models, state-of-the-art models are trained and evaluated on short conversations with little context. In contrast, the long-term conversation setting has hardly been studied. In this work we collect and release a human-human dataset consisting of multiple chat sessions whereby the speaking partners learn about each other's interests and discuss the things they have learnt from past sessions. We show that models trained on existing datasets perform poorly in this long-term conversation setting in both automatic and human evaluations, and we study long-context models that can perform much better. In particular, we find that retrieval-augmented methods and methods with the ability to summarize and recall previous conversations outperform the standard encoder-decoder architectures currently considered state of the art.
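To make the summarize-and-recall idea concrete, the following is a minimal, hypothetical sketch, not the paper's actual implementation: all class and function names, and the crude bag-of-words retriever, are illustrative assumptions. The sketch stores one summary per past chat session and, before generating a reply, retrieves the summaries most relevant to the current turn so they can be prepended to the generator's context.

```python
# Hypothetical sketch of summarize-and-recall memory for long-term dialogue.
# Assumes session summaries already exist (e.g. produced by a summarizer);
# the bag-of-words retriever below is a stand-in for a learned retriever.
from collections import Counter
import math


def bag_of_words(text: str) -> Counter:
    """Very crude tokenizer: lowercased word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class SessionMemory:
    """Stores one summary per past session; recalls the top-k relevant ones."""

    def __init__(self) -> None:
        self.summaries: list[str] = []

    def add_session_summary(self, summary: str) -> None:
        self.summaries.append(summary)

    def recall(self, query: str, k: int = 2) -> list[str]:
        q = bag_of_words(query)
        ranked = sorted(self.summaries,
                        key=lambda s: cosine(q, bag_of_words(s)),
                        reverse=True)
        return ranked[:k]


# Usage: retrieved summaries are prepended to the dialogue context that is
# fed to the generator (encoder-decoder or otherwise).
memory = SessionMemory()
memory.add_session_summary("Partner likes hiking and owns a golden retriever.")
memory.add_session_summary("Partner is studying for a nursing degree.")

user_turn = "How is your dog doing after that long hike?"
context = " ".join(memory.recall(user_turn, k=1)) + " " + user_turn
print(context)
```

The design point this illustrates is the one the abstract draws: because the generator only ever sees a short context window, long-term information must survive outside it, here as compact per-session summaries that are recalled on demand rather than replayed in full.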