Self-Supervised Contrastive Learning for Efficient User Satisfaction Prediction in Conversational Agents

Turn-level user satisfaction is one of the most important performance metrics for conversational agents. It can be used to monitor the agent's performance and provide insights about defective user experiences. Moreover, a powerful satisfaction model can serve as an objective function that a conversational agent continuously optimizes. While end-to-end deep learning has shown promising results, obtaining the large number of reliably annotated samples these methods require remains challenging. In a large-scale conversational system, the number of newly developed skills keeps growing, which makes the traditional data collection, annotation, and modeling process impractical because of annotation costs and turnaround times. In this paper, we propose a self-supervised contrastive learning approach that leverages the pool of unlabeled data to learn user-agent interactions. We show that models pre-trained with the self-supervised objective transfer to user satisfaction prediction. In addition, we propose a novel few-shot transfer learning approach that ensures better transferability for very small sample sizes. The proposed few-shot method does not require any inner-loop optimization and scales to very large datasets and complex models. In our experiments using real-world data from a large-scale commercial system, the proposed approach significantly reduces the required number of annotations while improving generalization to unseen out-of-domain skills.
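
The abstract does not state which contrastive objective is used, so the following is only a minimal sketch of one common choice, an InfoNCE-style loss, to illustrate what a self-supervised contrastive objective over unlabeled pairs can look like. The function name `info_nce_loss`, the `temperature` value, and the idea of pairing two embedded views of the same interaction are assumptions made for illustration, not details taken from the paper.

```python
import math


def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE-style loss over a batch of embedding pairs.

    anchors[i] and positives[i] form a positive pair; every other row of
    `positives` acts as a negative for anchors[i]. This is an assumed,
    illustrative objective, not the paper's exact formulation.
    """
    def normalize(v):
        # L2-normalize so dot products become cosine similarities.
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]

    total = 0.0
    for i, ai in enumerate(a):
        # Temperature-scaled similarity of anchor i to every candidate.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the true partner.
        total += log_denominator - logits[i]
    return total / len(a)


if __name__ == "__main__":
    # Toy 2-d embeddings (hypothetical values, for illustration only).
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[0.9, 0.1], [0.1, 0.9]]
    print(info_nce_loss(anchors, matched))
```

Under this kind of objective, an encoder is rewarded for placing the two members of a pair close together and other samples in the batch farther apart, which requires no satisfaction labels. The pre-trained encoder could then be fine-tuned on a small annotated set for the downstream prediction task.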
