Existing approaches build separate classifiers to detect nonsense in dialogues. In this paper, we show that without external classifiers, dialogue models can detect errors in their own messages introspectively, by calculating the likelihood of replies that are indicative of poor messages. For example, if an agent believes its partner is likely to respond "I don't understand" to a candidate message, that message may not make sense, so an alternative message should be chosen. We evaluate our approach on a dataset from the game Diplomacy, which contains long dialogues richly grounded in the game state, on which existing models make many errors. We first show that hand-crafted replies can be effective for the task of detecting nonsense in applications as complex as Diplomacy. We then design AutoReply, an algorithm to search for such discriminative replies automatically, given a small number of annotated dialogue examples. We find that AutoReply-generated replies outperform hand-crafted replies and perform on par with carefully fine-tuned large supervised models. Results also show that even a single reply, with little computational overhead, can detect dialogue nonsense reasonably well.
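The core idea can be sketched as follows: score each candidate message by the likelihood a dialogue model assigns to a discriminative reply such as "I don't understand", and prefer candidates for which that likelihood is low. This is a minimal illustration of the interface, not the paper's implementation; `reply_logprob` is a hypothetical stand-in for a real dialogue model's conditional log-likelihood, here faked with a toy heuristic so the example runs self-contained.

```python
import math

# Hypothetical stand-in for a dialogue model's log P(reply | message).
# A real system would query a fine-tuned dialogue model; this toy
# heuristic only treats messages with made-up marker tokens as nonsense.
def reply_logprob(message: str, reply: str) -> float:
    nonsense_markers = {"blorp", "xyzzy"}  # toy nonsense vocabulary
    confused = any(tok in nonsense_markers for tok in message.lower().split())
    if reply == "I don't understand.":
        return math.log(0.6) if confused else math.log(0.05)
    return math.log(0.4) if confused else math.log(0.95)

# A hand-crafted discriminative reply set (AutoReply would search for these).
DISCRIMINATIVE_REPLIES = ["I don't understand."]

def nonsense_score(message: str) -> float:
    """Higher score means the message is more likely nonsense."""
    return max(reply_logprob(message, r) for r in DISCRIMINATIVE_REPLIES)

def pick_message(candidates, threshold=math.log(0.3)):
    """Prefer candidates the partner is unlikely to be confused by."""
    ok = [m for m in candidates if nonsense_score(m) < threshold]
    return ok[0] if ok else min(candidates, key=nonsense_score)

good = "I will support your move to Munich."
bad = "I will blorp the fleet into Bohemia."
chosen = pick_message([bad, good])
print(chosen)  # the coherent candidate is selected
```

The threshold here is an illustrative assumption; in practice it would be tuned on the small set of annotated dialogue examples mentioned above.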