Many studies have proposed methods for optimizing the dialogue performance of an entire pipeline task-oriented dialogue system by jointly training modules in the system using reinforcement learning. However, these methods are limited in that they can only be applied to modules implemented using trainable neural-based methods. To solve this problem, we propose a method for optimizing the dialogue performance of a pipeline system composed of modules implemented with arbitrary methods. With our method, neural-based components called post-processing networks (PPNs) are installed inside such a system to post-process the output of each module. All PPNs are updated to improve the overall dialogue performance of the system by using reinforcement learning, without requiring each module to be differentiable. Through dialogue simulation and human evaluation on the MultiWOZ dataset, we show that our method can improve the dialogue performance of pipeline systems consisting of various modules.
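To make the idea concrete, below is a minimal sketch of the PPN concept, assuming a PPN is a small MLP that post-processes a binary encoding of one module's output and is updated with a REINFORCE-style policy gradient on a dialogue-level reward. The class and function names (`PostProcessingNetwork`, `run_dialogue`), the binary encoding, the stand-in reward, and the specific RL algorithm are illustrative assumptions, not the exact implementation described in the paper.

```python
# Sketch: a post-processing network (PPN) that edits a module's output,
# trained with a REINFORCE-style policy gradient. The upstream module is a
# black box, so it does not need to be differentiable.
import torch
import torch.nn as nn


class PostProcessingNetwork(nn.Module):
    """Post-processes a module's output, encoded here as a binary vector."""

    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim)
        )

    def forward(self, x: torch.Tensor):
        # Each output unit is the probability of keeping/adding that item.
        probs = torch.sigmoid(self.net(x))
        dist = torch.distributions.Bernoulli(probs)
        action = dist.sample()          # post-processed binary output
        log_prob = dist.log_prob(action).sum(-1)
        return action, log_prob


def run_dialogue(ppn, module_output):
    """Placeholder for one simulated dialogue: only the encoded module output
    passes through the PPN; the reward here is a stand-in for task success."""
    action, log_prob = ppn(module_output)
    reward = torch.rand(())             # hypothetical dialogue-level reward
    return reward, log_prob


if __name__ == "__main__":
    dim = 32
    ppn = PostProcessingNetwork(dim)
    opt = torch.optim.Adam(ppn.parameters(), lr=1e-3)
    for _ in range(10):                 # one simulated "dialogue" per step
        module_output = (torch.rand(dim) > 0.5).float()
        reward, log_prob = run_dialogue(ppn, module_output)
        loss = -reward * log_prob       # policy-gradient update on the PPN only
        opt.zero_grad()
        loss.backward()
        opt.step()
```

In a full pipeline system, one such PPN would be attached to each module (e.g., NLU, dialogue state tracking, policy), and all PPNs would be updated jointly from the same dialogue-level reward.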