User ratings play a significant role in spoken dialogue systems. Typically, such ratings are averaged across all users and then used as feedback to improve the system or personalize its behavior. While this method can help identify broad, general issues with the system and its behavior, it does not take into account differences between users that affect their ratings. In this work, we conduct a study to better understand how people rate their interactions with conversational agents. One macro-level characteristic that has been shown to correlate with how people perceive their interpersonal communication is personality. We focus specifically on agreeableness and extraversion as variables that may explain variation in ratings and therefore provide a more meaningful signal for training or personalization. To elicit these personality traits during an interaction with a conversational agent, we designed and validated a fictional story grounded in prior work in psychology. We then integrated the story into an experimental conversational agent that allowed users to opt in to hearing it. Our results suggest that, in human-conversational agent interactions, extraversion may play a role in user ratings, but more data is needed to determine whether the relationship is significant. Agreeableness, on the other hand, plays a statistically significant role in conversation ratings: users who are more agreeable are more likely to give their interaction a higher rating. In addition, we found that users who opted to hear the story were, in general, more likely to rate their conversational experience higher than those who did not.