Open-domain chatbots are supposed to converse freely with humans without being restricted to a topic, task, or domain. However, the boundaries and/or contents of open-domain conversations are not clear. To clarify the boundaries of "openness", we conduct two studies. First, we classify the types of "speech events" encountered in a chatbot evaluation data set (i.e., Meena by Google) and find that these conversations mainly cover the "small talk" category and exclude the other speech event categories encountered in real-life human-human communication. Second, we conduct a small-scale pilot study in which we generate online conversations covering a wider range of speech event categories, either between two humans or between a human and a state-of-the-art chatbot (i.e., Blender by Facebook). A human evaluation of these generated conversations indicates a preference for human-human conversations, since the human-chatbot conversations lack coherence in most speech event categories. Based on these results, we suggest (a) using the term "small talk" instead of "open-domain" for current chatbots, which are not yet that "open" in terms of conversational abilities, and (b) revising the evaluation methods to test chatbot conversations against other speech events.