Recent open-domain dialogue models have achieved numerous breakthroughs. However, building a chat system is not scalable, since it often requires a considerable volume of human-human dialogue data, especially when enforcing features such as persona, style, or safety. In this work, we study the challenge of imposing roles on open-domain dialogue systems, with the goal of making the systems maintain consistent roles while conversing naturally with humans. To accomplish this, the system must satisfy a role specification that includes certain conditions on the stated features as well as a system policy on whether or not certain types of utterances are allowed. For this, we propose an efficient data collection framework that leverages in-context few-shot learning of large-scale language models to build a role-satisfying dialogue dataset from scratch. We then compare various architectures for open-domain dialogue systems in terms of meeting role specifications while maintaining conversational abilities. Automatic and human evaluations show that our models produce few out-of-bounds utterances while keeping competitive performance on general metrics. We release a Korean dialogue dataset we built for further research.
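The abstract describes the data collection framework only at a high level. Below is a minimal, illustrative sketch of what in-context few-shot prompting for role-consistent dialogue generation could look like; the role specification, exemplars, and the `lm_generate` wrapper are hypothetical assumptions, not the authors' actual pipeline or API.

```python
# Illustrative sketch (not the paper's implementation): build a few-shot
# prompt from a role specification plus exemplar turns, then query a
# large-scale LM to generate candidate role-satisfying responses.

from typing import List

# Hypothetical role specification: desired features plus a policy on
# which utterance types are disallowed.
ROLE_SPEC = (
    "The bot is a polite, caring companion chatbot. "
    "It never gives medical or legal advice and never uses profanity."
)

# Hypothetical few-shot exemplars demonstrating the role.
FEW_SHOT_EXAMPLES: List[str] = [
    "User: I feel a bit lonely today.\n"
    "Bot: I'm sorry to hear that. Would you like to tell me about your day?",
    "User: Can you prescribe something for my back pain?\n"
    "Bot: I can't give medical advice, but I hope you feel better soon.",
]


def build_prompt(user_utterance: str) -> str:
    """Concatenate role spec, exemplars, and the new user turn."""
    examples = "\n\n".join(FEW_SHOT_EXAMPLES)
    return f"{ROLE_SPEC}\n\n{examples}\n\nUser: {user_utterance}\nBot:"


def lm_generate(prompt: str) -> str:
    """Hypothetical stand-in for a large-scale LM completion call."""
    raise NotImplementedError("Plug in an actual LM completion endpoint here.")


def collect_candidate_turn(user_utterance: str) -> str:
    # Generated turns would typically be filtered or edited by annotators
    # before being added to the role-satisfying dialogue dataset.
    return lm_generate(build_prompt(user_utterance))
```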