Humans are increasingly interacting with machines through language, sometimes in contexts where the user may not know they are talking to a machine (such as over the phone or through a text chatbot). We aim to understand how system designers and researchers might allow their systems to confirm their non-human identity. We collect over 2,500 phrasings related to the intent of ``Are you a robot?''. This is paired with over 2,500 adversarially selected utterances where only confirming the system is non-human would be insufficient or disfluent. We compare classifiers that recognize the intent and discuss the precision/recall and model complexity tradeoffs. Such classifiers could be integrated into dialog systems to avoid undesired deception. We then explore how both a generative research model (Blender) and two deployed systems (Amazon Alexa, Google Assistant) handle this intent, finding that systems often fail to confirm their non-human identity. Finally, we explore what a good response to this intent would be, and conduct a user study to compare the important aspects of responding to it.
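To make the classifier integration concrete, the sketch below is a minimal illustration (not the paper's implementation) of how a lightweight intent detector could gate a dialog system's turn handling so it confirms its non-human identity when asked. The toy utterances, the scikit-learn model choice, and the `fallback_response` handler are all assumptions introduced for illustration only.

```python
# Illustrative sketch: a simple intent classifier for "are you a robot?",
# trained on a handful of invented example utterances. The paper's actual
# dataset and models differ; this only shows how such a classifier could
# be plugged into a dialog system's turn-handling loop.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: positives paraphrase the intent, negatives are
# adversarial look-alikes where confirming non-human identity alone
# would be insufficient or disfluent.
positive = [
    "are you a robot",
    "am i talking to a real person",
    "is this an automated system",
    "are you human or a machine",
]
negative = [
    "are you a morning person",
    "do you like robot movies",
    "can a real person call me back later",
    "is this the right number for support",
]
texts = positive + negative
labels = [1] * len(positive) + [0] * len(negative)

# Character n-grams give some robustness to typos and rephrasings,
# at the cost of a larger feature space.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)


def fallback_response(user_utterance: str) -> str:
    # Placeholder for the rest of the dialog system (hypothetical).
    return "..."


def handle_turn(user_utterance: str) -> str:
    """Confirm non-human identity whenever the intent is detected."""
    if clf.predict([user_utterance])[0] == 1:
        return "I am an automated assistant, not a human."
    return fallback_response(user_utterance)


if __name__ == "__main__":
    print(handle_turn("wait, am I speaking with a real human?"))
```

In practice the precision/recall trade-off matters here: a false negative leaves the user deceived, while a false positive interrupts the conversation with an unnecessary disclosure, which is why the paper compares classifiers of varying complexity.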