The impact of user satisfaction in policy learning task-oriented dialogue systems has long been a subject of research interest. Most current models for estimating the user satisfaction either (i) treat out-of-context short-texts, such as product reviews, or (ii) rely on turn features instead of on distributed semantic representations. In this work we adopt deep neural networks that use distributed semantic representation learning for estimating the user satisfaction in conversations. We evaluate the impact of modelling context length in these networks. Moreover, we show that the proposed hierarchical network outperforms state-of-the-art quality estimators. Furthermore, we show that applying these networks to infer the reward function in a Partial Observable Markov Decision Process (POMDP) yields to a great improvement in the task success rate.
翻译:长期以来,用户对政策学习、任务导向对话系统满意度的影响一直是引起研究兴趣的一个主题,目前用于估计用户满意度的大多数模型要么是(一) 处理产品审查等文本外的短文本,要么是(二) 依赖转基因特征,而不是分布式语义表示;在这项工作中,我们采用了利用分布式语义表示法学习来估计用户对对话满意度的深层神经网络;我们评估了这些网络建模背景长度的影响;此外,我们表明,拟议的等级网络优于最先进的质量估测器。此外,我们表明,在部分可观测的马尔科夫决定程序中,运用这些网络来推断奖励功能可以大大提高任务成功率。