Reinforcement learning (RL) has shown great promise for developing dialogue management (DM) agents that are non-myopic, conduct rich conversations, and maximize overall user satisfaction. Despite recent developments in RL and language models (LMs), using RL to power conversational chatbots remains challenging, in part because RL requires online exploration to learn effectively, whereas collecting novel human-bot interactions can be expensive and unsafe. This issue is exacerbated by the combinatorial action spaces facing these algorithms, as most LM agents generate responses at the word level. We develop a variety of RL algorithms, specialized to dialogue planning, that leverage recent Mixture-of-Expert Language Models (MoE-LMs) -- models that capture diverse semantics, generate utterances reflecting different intents, and are amenable to multi-turn DM. By exploiting MoE-LM structure, our methods significantly reduce the size of the action space and improve the efficacy of RL-based DM. We evaluate our methods in open-domain dialogue to demonstrate their effectiveness w.r.t.\ the diversity of intent in generated utterances and overall DM performance.