Goal-oriented dialogue systems aim to help users achieve certain goals, so how humans perceive their helpfulness is important. However, neither the human-perceived helpfulness of goal-oriented dialogue systems nor its fairness implications have been well studied. In this paper, we study computational measurements of helpfulness. We first formally define a dialogue response as helpful if it is relevant and coherent, useful, and informative with respect to a query. We then collect human annotations of the helpfulness of dialogue responses based on this definition and build a classifier to automatically determine the helpfulness of a response. We further propose to measure the fairness of a dialogue system by comparing its helpfulness level across different user queries. Experiments with state-of-the-art dialogue systems under three information-seeking scenarios reveal that existing systems tend to be more helpful for questions about concepts from highly developed countries than for questions about concepts from less-developed countries, uncovering potential fairness concerns in current goal-oriented dialogue systems.
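The fairness measurement described above, comparing helpfulness levels across groups of user queries, can be illustrated with a minimal sketch. The abstract does not specify the exact statistic, so this sketch assumes the helpfulness level of a group is the fraction of its responses labeled helpful, and that the disparity is the difference between the highest and lowest group rates. The function names, group names, and labels are hypothetical toy values for illustration only, not results from the paper.

```python
from collections import defaultdict


def helpfulness_by_group(records):
    """Return the fraction of helpful responses per query group.

    `records` is an iterable of (group, is_helpful) pairs, where
    `is_helpful` would come from human annotation or a classifier.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [helpful, total]
    for group, is_helpful in records:
        counts[group][0] += int(bool(is_helpful))
        counts[group][1] += 1
    return {g: helpful / total for g, (helpful, total) in counts.items()}


def helpfulness_gap(rates):
    """Assumed disparity measure: highest minus lowest group rate."""
    return max(rates.values()) - min(rates.values())


# Toy labels (made up): one helpfulness judgment per system response.
records = [
    ("highly_developed", True), ("highly_developed", True),
    ("highly_developed", True), ("highly_developed", False),
    ("less_developed", True), ("less_developed", False),
    ("less_developed", False), ("less_developed", False),
]

rates = helpfulness_by_group(records)
gap = helpfulness_gap(rates)
print(rates, gap)
```

A gap near zero would suggest the system is similarly helpful for both groups under this assumed measure; a large gap would flag a potential fairness concern of the kind the abstract describes.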