Current open-domain conversational models can easily be made to talk in inadequate ways. Online learning from conversational feedback given by the conversation partner is a promising avenue for a model to improve and adapt, so as to generate fewer of these safety failures. However, current state-of-the-art models tend to react to feedback with defensive or oblivious responses. This makes for an unpleasant experience and may discourage conversation partners from giving feedback in the future. This work proposes SaFeRDialogues, a task and dataset of graceful responses to conversational feedback about safety failures. We collect a dataset of 10k dialogues demonstrating safety failures, feedback signaling them, and a response acknowledging the feedback. We show how fine-tuning on this dataset results in conversations that human raters deem considerably more likely to lead to a civil conversation, without sacrificing engagingness or general conversational ability.
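For concreteness, a minimal sketch (in Python) of how one such example might be represented: a dialogue context ending in a safety failure, the partner's feedback message signaling it, and the graceful recovery response used as the fine-tuning target. The field names and the example text here are illustrative assumptions, not the released dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List

# Illustrative sketch only: field names are assumptions,
# not the dataset's actual schema.
@dataclass
class SaFeRDialoguesExample:
    context: List[str]   # dialogue turns leading up to the failure
    failure_turn: str    # utterance exhibiting a safety failure
    feedback: str        # partner's message signaling the failure
    recovery: str        # graceful response acknowledging the feedback

example = SaFeRDialoguesExample(
    context=["Hey, how was your weekend?"],
    failure_turn="People who like that band are all idiots.",
    feedback="That's a really hurtful generalization.",
    recovery="You're right, that was unfair of me. I'm sorry.",
)

# A fine-tuning pair: the full history (context + failure + feedback)
# as input, the recovery response as the target.
source = " ".join(example.context + [example.failure_turn, example.feedback])
target = example.recovery
```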