Response generation for task-oriented dialogues involves two basic components: dialogue planning and surface realization. These components, however, pursue different objectives, namely task completion and language quality. To handle this discrepancy, conditioned response generation has been introduced, in which the generation process is factorized into action decision and language generation via explicit action representations. To obtain action representations, recent studies learn latent actions in an unsupervised manner based on the lexical similarity of utterances. Such an approach is sensitive to the diversity of surface forms, which may impair both task completion and language quality. To address this issue, we propose multi-stage adaptive latent action learning (MALA), which learns semantic latent actions by distinguishing the effects of utterances on dialogue progress. We model the effect of an utterance as the transition of dialogue states that it causes, and develop a semantic similarity measurement that estimates whether utterances have similar effects. To learn semantic actions in domains without dialogue states, MALA extends the semantic similarity measurement across domains progressively, i.e., from aligning shared actions to learning domain-specific actions. Experiments on the multi-domain datasets SMD and MultiWOZ show that our proposed model achieves consistent improvements over the baseline models in terms of both task completion and language quality.
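The idea of comparing utterances by their effect on the dialogue state, rather than by their wording, can be illustrated with a small sketch. It is not the MALA model, which learns the similarity measurement. Here a simple Jaccard overlap of the slots that change stands in for it, and the names `changed_slots` and `effect_similarity`, as well as the slot names and example utterances, are hypothetical.

```python
from typing import Dict, Set

State = Dict[str, str]  # slot name -> value (assumed flat state format)


def changed_slots(before: State, after: State) -> Set[str]:
    """Slots whose value differs between the states before and after an utterance."""
    return {
        slot
        for slot in before.keys() | after.keys()
        if before.get(slot) != after.get(slot)
    }


def effect_similarity(t1: Set[str], t2: Set[str]) -> float:
    """Jaccard overlap of two transitions (a stand-in, not the learned measure)."""
    if not t1 and not t2:
        return 1.0  # neither utterance changed the state
    return len(t1 & t2) / len(t1 | t2)


# "Which area would you like?" and "What part of town?" differ lexically,
# but both lead to the `area` slot being filled.
t_a = changed_slots({"food": "thai"}, {"food": "thai", "area": "north"})
t_b = changed_slots({"food": "italian"}, {"food": "italian", "area": "south"})
# "What price range?" leads to a different slot being filled.
t_c = changed_slots({"food": "thai"}, {"food": "thai", "price": "cheap"})

same_effect = effect_similarity(t_a, t_b)
different_effect = effect_similarity(t_a, t_c)
```

In this toy setting the two lexically different area questions receive the maximum similarity, while the price question does not overlap with them. Grouping by lexical similarity alone would not make that distinction reliably.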