Direct decoding for task-oriented dialogue is known to suffer from the explaining-away effect, manifested in models that prefer short and generic responses. Here we argue for the use of Bayes' theorem to factorize the dialogue task into two models: the distribution of the context given the response, and the prior over the response itself. This approach, an instantiation of the noisy channel model, both mitigates the explaining-away effect and allows the principled incorporation of large pretrained models for the response prior. We present extensive experiments showing that a noisy channel model decodes better responses than direct decoding, and that a two-stage pretraining strategy, employing both open-domain and task-oriented dialogue data, improves over randomly initialized models.
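The factorization above can be illustrated with a minimal reranking sketch. Instead of scoring a candidate response directly with p(response | context), a noisy channel decoder scores it as log p(context | response) + log p(response), per Bayes' theorem. The scoring functions below are toy stand-ins (word overlap and a length-based prior), not the paper's neural models; they exist only to make the decision rule concrete.

```python
import math

# Hypothetical stand-ins for learned models: a real system would use
# neural sequence models for both the channel model and the prior.
def log_prior(response):
    # Toy response prior log p(response): mildly penalizes length,
    # so short generic replies score well under the prior alone.
    return -0.1 * len(response.split())

def log_channel(context, response):
    # Toy channel model log p(context | response): rewards word overlap,
    # so a response must "explain" the context to score well. This is
    # what counteracts the explaining-away preference for generic replies.
    ctx, resp = set(context.lower().split()), set(response.lower().split())
    overlap = len(ctx & resp)
    return math.log(1 + overlap) - 0.5 * len(ctx - resp)

def noisy_channel_score(context, response):
    # Bayes factorization: log p(context | response) + log p(response)
    return log_channel(context, response) + log_prior(response)

context = "When does the train to Cambridge leave?"
candidates = ["I don't know.", "The train to Cambridge leaves at 9:15."]
best = max(candidates, key=lambda r: noisy_channel_score(context, r))
```

Under this toy scoring, the generic reply is penalized because it fails to account for the context, while the informative reply is rewarded, which is the qualitative behavior the noisy channel factorization is meant to produce.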