Recent work (Takanobu et al., 2020) proposed system-wise evaluation of dialog systems and found that improvements to individual components (e.g., NLU, policy) reported in prior work do not necessarily benefit pipeline systems under system-wise evaluation. To improve system-wise performance, we propose new joint system-wise optimization techniques for the pipeline dialog system. First, we propose a new data augmentation approach that automates the labeling process for NLU training. Second, we propose a novel stochastic policy parameterization with a Poisson distribution that enables better exploration and offers a principled way to compute the policy gradient. Third, we propose a reward bonus to help the policy explore successful dialogs. Our approaches outperform the competitive pipeline systems from Takanobu et al. (2020) by large margins, 12% in success rate in automatic system-wise evaluation and 16% in success rate in human evaluation, on the standard multi-domain benchmark dataset MultiWOZ 2.1, and also outperform the recent state-of-the-art end-to-end trained model from DSTC9.
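To make the idea of a Poisson-parameterized stochastic policy more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say what quantity the Poisson distribution models, so the sketch assumes a single scalar count action k drawn from Poisson(exp(log_rate)), with one hypothetical learnable parameter `log_rate` and a hypothetical `reward_fn`. Under that assumption the log-probability has a closed-form derivative, d/d(log_rate) log p(k) = k - exp(log_rate), which is what makes a score-function (REINFORCE) gradient estimate straightforward to compute.

```python
import math
import random


def poisson_log_prob(k, log_rate):
    """log p(k) for k ~ Poisson(rate), with rate = exp(log_rate)."""
    rate = math.exp(log_rate)
    return k * log_rate - rate - math.lgamma(k + 1)


def grad_log_prob(k, log_rate):
    """Closed-form derivative of poisson_log_prob with respect to log_rate."""
    return k - math.exp(log_rate)


def sample_poisson(rate, rng):
    """Draw one Poisson sample with Knuth's method (fine for small rates)."""
    threshold = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def reinforce_gradient(log_rate, reward_fn, n_samples, rng):
    """Score-function estimate of d/d(log_rate) E[reward_fn(k)].

    reward_fn is a hypothetical stand-in for a dialog-level reward.
    """
    rate = math.exp(log_rate)
    total = 0.0
    for _ in range(n_samples):
        k = sample_poisson(rate, rng)
        total += reward_fn(k) * grad_log_prob(k, log_rate)
    return total / n_samples
```

As a usage illustration under the same assumptions, `reinforce_gradient(math.log(2.0), lambda k: float(k), 20000, random.Random(0))` estimates the gradient of the expected count with respect to `log_rate`; analytically this equals the rate itself, so the estimate should land near 2.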