Dialogue safety problems severely limit the real-world deployment of neural conversational models and have attracted great research interest recently. We propose a taxonomy for dialogue safety specifically designed to capture unsafe behaviors that are unique to the human-bot dialogue setting, with a focus on context-sensitive unsafety, which is under-explored in prior work. To spur research in this direction, we compile DiaSafety, a dataset of six unsafe categories with rich context-sensitive unsafe examples. Experiments show that existing utterance-level safety guarding tools fail catastrophically on our dataset. As a remedy, we train a context-level dialogue safety classifier to provide a strong baseline for context-sensitive dialogue unsafety detection. With our classifier, we perform safety evaluations on popular conversational models and show that existing dialogue systems remain plagued by context-sensitive safety problems.
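To make the notion of a context-level classifier concrete, below is a minimal sketch of one plausible realization: fine-tuning a pretrained transformer encoder on (context, response) pairs so that safety is judged jointly rather than on the response alone. The abstract does not specify the architecture, so the choice of `roberta-base`, the sentence-pair encoding scheme, and the label convention (1 = unsafe, 0 = safe) are all illustrative assumptions, not the authors' actual implementation.

```python
# Sketch of a context-level dialogue safety classifier (assumed design,
# not the paper's exact method): encode context and response together
# so the model can detect unsafety that only emerges in context.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "roberta-base"  # assumption; any pretrained encoder would do

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def classify(context: str, response: str) -> int:
    # Feed (context, response) as a sentence pair so self-attention
    # spans both turns; label convention assumed: 1 = unsafe, 0 = safe.
    inputs = tokenizer(context, response, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1))

# Before fine-tuning on a dataset such as DiaSafety, the head is
# randomly initialized, so this prediction is not yet meaningful.
print(classify("I failed my exam today.", "You deserve it, loser."))
```

Pairing the two turns in a single input is what distinguishes this from the utterance-level tools the abstract reports failing: an utterance like "You deserve it" is innocuous in isolation and only becomes unsafe given the context.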