Evaluating intelligent assistants at large scale and in online settings remains an open challenge. Online evaluation metrics based on user behavior have proven highly effective for monitoring large-scale web search and recommender systems. We therefore consider predicting user engagement status to be the first and critical step toward online evaluation of intelligent assistants. In this work, we first propose a novel framework that classifies user engagement status into four categories: fulfillment, continuation, reformulation, and abandonment. We then demonstrate how to design simple but indicative metrics based on this framework to quantify user engagement levels. We further aim to automate user engagement prediction with machine learning methods, comparing various models and features on four real-world datasets. Finally, we conduct detailed analyses of features and failure cases to discuss the performance of current models and the remaining challenges.
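To make the four-category framework concrete, the sketch below shows one minimal way to represent the engagement statuses and compute a simple session-level engagement metric (the share of sessions in each category). The names and the rate computation are illustrative assumptions, not the paper's actual implementation.

```python
from enum import Enum
from collections import Counter

class EngagementStatus(Enum):
    # The four engagement categories proposed in the framework.
    FULFILLMENT = "fulfillment"
    CONTINUATION = "continuation"
    REFORMULATION = "reformulation"
    ABANDONMENT = "abandonment"

def engagement_rates(session_labels):
    """Compute the fraction of sessions falling into each engagement
    status -- a simple, indicative session-level metric (hypothetical)."""
    counts = Counter(session_labels)
    total = len(session_labels)
    return {status: counts.get(status, 0) / total
            for status in EngagementStatus}

# Example: summarize four labeled sessions.
labels = [
    EngagementStatus.FULFILLMENT,
    EngagementStatus.FULFILLMENT,
    EngagementStatus.REFORMULATION,
    EngagementStatus.ABANDONMENT,
]
rates = engagement_rates(labels)
```

A supervised classifier predicting `EngagementStatus` from session features would then let such rates be monitored automatically at scale.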