Despite recent advancements in language models (LMs), their application to dialogue management (DM) problems and their ability to carry on rich conversations remain a challenge. We use reinforcement learning (RL) to develop a dialogue agent that avoids being short-sighted (outputting generic utterances) and maximizes overall user satisfaction. Most existing RL approaches to DM train the agent at the word level, and thus have to deal with a combinatorially complex action space even for a medium-size vocabulary. As a result, they struggle to produce a successful and engaging dialogue even if they are warm-started with a pre-trained LM. To address this issue, we develop an RL-based DM using a novel mixture-of-experts language model (MoE-LM) that consists of (i) an LM capable of learning diverse semantics for conversation histories, (ii) a number of {\em specialized} LMs (or experts) capable of generating utterances corresponding to a particular attribute or personality, and (iii) an RL-based DM that performs dialogue planning with the utterances generated by the experts. Our MoE approach provides greater flexibility to generate sensible utterances with different intents and allows RL to focus on conversation-level DM. We compare it with SOTA baselines on open-domain dialogues and demonstrate its effectiveness both in terms of the diversity and sensibility of the generated utterances and the overall DM performance.
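To make the planning step concrete, the following is a minimal sketch, not the paper's implementation, of how an RL-based DM might select among expert-generated utterances. The \texttt{Expert} interface, the \texttt{MoEDialogueManager} class, and the \texttt{q\_value} function are illustrative assumptions; in the actual system the experts are specialized LMs and the value estimate is learned, whereas here both are stand-ins.

\begin{verbatim}
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: an expert maps a conversation history to one
# candidate utterance reflecting its specialized intent or personality.
Expert = Callable[[List[str]], str]

@dataclass
class MoEDialogueManager:
    experts: List[Expert]
    # Stand-in for a learned action-value estimate Q(history, utterance).
    q_value: Callable[[List[str], str], float]

    def respond(self, history: List[str]) -> str:
        # Each expert proposes one utterance, so the RL-based DM plans over
        # a small, discrete action set instead of the full vocabulary.
        candidates = [expert(history) for expert in self.experts]
        return max(candidates, key=lambda u: self.q_value(history, u))

# Toy usage with hand-written "experts" and a dummy value function.
empathetic = lambda h: "That sounds hard. Tell me more."
curious = lambda h: "What happened next?"
dm = MoEDialogueManager(
    experts=[empathetic, curious],
    q_value=lambda h, u: float(len(u)),  # placeholder for a learned Q-network
)
print(dm.respond(["I had a rough day."]))
\end{verbatim}

The key design point this sketch illustrates is the reduction of the action space: rather than selecting among all possible word sequences, the DM chooses among a handful of expert proposals per turn.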