Automatic dialog systems have become a mainstream part of online customer service. Many such systems are built, maintained, and improved by customer service specialists rather than by dialog systems engineers and computer programmers. As conversations between people and machines become commonplace, it is critical to understand what is working, what is not, and what actions can be taken to reduce the frequency of inappropriate system responses. These analyses and recommendations need to be presented in terms that directly reflect the user experience rather than the internal dialog processing. This paper introduces and explains the use of Actionable Conversational Quality Indicators (ACQIs), which are used both to recognize parts of dialogs that can be improved and to recommend how to improve them. This approach combines the benefits of previous approaches, some of which have focused on producing dialog quality scores while others have sought to categorize the types of errors the dialog system is making. We demonstrate the effectiveness of using ACQIs on LivePerson internal dialog systems used in commercial customer service applications, and on the publicly available CMU LEGOv2 conversational dataset (Raux et al. 2005). We report on the annotation and analysis of conversational datasets, showing which ACQIs are important to fix in various situations. The annotated datasets are then used to build a predictive model that uses a turn-based vector embedding of the message texts and achieves a 79% weighted-average F1-measure at the task of finding the correct ACQI for a given conversation. We predict that if such a model worked perfectly, the range of potential improvement actions a bot-builder must consider at each turn could be reduced by an average of 81%.