A proactive dialogue system can lead the conversation toward a goal topic and has great potential in bargaining, persuasion, and negotiation. However, the current corpus-based learning paradigm limits its practical application in real-world scenarios. To this end, we advance the study of proactive dialogue policy to a more natural and challenging setting, i.e., one in which the agent interacts dynamically with users. Further, we call attention to non-cooperative user behavior: the user talks about off-path topics when he/she is not satisfied with the previous topics introduced by the agent. We argue that the two targets of reaching the goal topic quickly and maintaining high user satisfaction do not always converge, because the topics close to the goal and the topics the user prefers may not be the same. To address this issue, we propose a new solution named I-Pro that learns a Proactive policy in the Interactive setting. Specifically, we learn the trade-off via a learned goal weight, which consists of four factors (dialogue turn, goal completion difficulty, user satisfaction estimation, and cooperative degree). The experimental results demonstrate that I-Pro significantly outperforms baselines in terms of effectiveness and interpretability.
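The abstract does not say how the four factors are combined into the goal weight, so the following is only a minimal sketch under stated assumptions, not I-Pro's actual formulation. It assumes a logistic gate over the four factors (each normalized to [0, 1]), with hand-set coefficients standing in for learned ones, and uses the resulting weight to blend two hypothetical per-topic scores: closeness to the goal topic and estimated user preference. All names (`TurnState`, `goal_weight`, `topic_score`, `choose_topic`) and the coefficient values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TurnState:
    # Hypothetical per-turn features, each assumed normalized to [0, 1].
    turn: float                # normalized dialogue turn
    goal_difficulty: float     # estimated difficulty of completing the goal
    user_satisfaction: float   # estimated user satisfaction
    cooperative_degree: float  # how cooperative the user has been so far


def goal_weight(state, coefs, bias):
    """Map the four factors to a weight in (0, 1) with a logistic function.

    Assumption: a linear-logistic gate; coefs/bias would be learned in
    practice but are passed in by hand here for illustration.
    """
    features = (state.turn, state.goal_difficulty,
                state.user_satisfaction, state.cooperative_degree)
    z = bias + sum(c * f for c, f in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-z))


def topic_score(goal_closeness, user_preference, w):
    """Blend the two targets: w favors the goal, (1 - w) favors the user."""
    return w * goal_closeness + (1.0 - w) * user_preference


def choose_topic(candidates, state, coefs, bias):
    """Pick the candidate (name, goal_closeness, user_preference) with the best blended score."""
    w = goal_weight(state, coefs, bias)
    return max(candidates, key=lambda c: topic_score(c[1], c[2], w))


# Illustrative, hand-set coefficients (not learned values from the paper).
COEFS = (1.0, -1.0, 2.0, 2.0)
BIAS = -2.0
CANDIDATES = [("goal_topic", 0.9, 0.2), ("user_topic", 0.3, 0.9)]

unhappy = TurnState(turn=0.5, goal_difficulty=0.5,
                    user_satisfaction=0.1, cooperative_degree=0.1)
happy = TurnState(turn=1.0, goal_difficulty=0.0,
                  user_satisfaction=0.9, cooperative_degree=0.9)

print(choose_topic(CANDIDATES, unhappy, COEFS, BIAS)[0])  # user_topic
print(choose_topic(CANDIDATES, happy, COEFS, BIAS)[0])    # goal_topic
```

With these made-up coefficients, a dissatisfied, uncooperative user yields a small goal weight (about 0.17), so the agent follows the user's preferred topic; a satisfied, cooperative user yields a large weight (about 0.93), so the agent steers toward the goal. The logistic gate is just one simple way to keep the weight in (0, 1); the paper's learned weight may be parameterized differently.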