Task-oriented dialogue systems aim to answer questions from users and provide immediate help, so how humans perceive their helpfulness is important. However, neither the human-perceived helpfulness of task-oriented dialogue systems nor its fairness implications have been studied. In this paper, we define a dialogue response as helpful if it is relevant and coherent, useful, and informative with respect to a query, and we study computational measurements of helpfulness. We then propose using the helpfulness level for different groups to gauge the fairness of a dialogue system. To study this, we collect human annotations of the helpfulness of dialogue responses and build a classifier that automatically determines the helpfulness of a response. We design experiments under three information-seeking scenarios and collect instances for each from Wikipedia. With the collected instances, we use carefully constructed questions to query state-of-the-art dialogue systems. Through analysis, we find that dialogue systems tend to be more helpful for highly developed countries than for less-developed countries, uncovering a fairness issue underlying these dialogue systems.
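As a minimal illustration of gauging fairness through group-level helpfulness, the sketch below computes the fraction of responses labeled helpful for each group and the gap between the best- and worst-served groups. It is not the measurement used in the paper: the function names, the group labels, the binary helpful/unhelpful labels, and the max-minus-min gap are all assumptions made for illustration, and the toy records are invented.

```python
from collections import defaultdict


def helpfulness_by_group(records):
    """Return {group: fraction of responses labeled helpful}.

    `records` is an iterable of (group, is_helpful) pairs, where
    `is_helpful` is a boolean label (e.g. from annotators or a classifier).
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [helpful, total]
    for group, is_helpful in records:
        counts[group][0] += int(is_helpful)
        counts[group][1] += 1
    return {group: helpful / total for group, (helpful, total) in counts.items()}


def helpfulness_gap(rates):
    """Hypothetical disparity score: highest group rate minus lowest."""
    return max(rates.values()) - min(rates.values())


# Toy labels, invented purely for illustration (not data from the paper).
records = [
    ("highly_developed", True),
    ("highly_developed", True),
    ("highly_developed", False),
    ("highly_developed", True),
    ("less_developed", True),
    ("less_developed", False),
    ("less_developed", False),
    ("less_developed", False),
]

rates = helpfulness_by_group(records)
gap = helpfulness_gap(rates)
print(rates, gap)
```

A gap near zero would suggest similar helpfulness across groups under this simplified score; a larger gap would flag a disparity worth investigating.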