Text response generation for multimodal task-oriented dialog systems, which aims to generate the proper text response given the multimodal context, is an essential yet challenging task. Although existing efforts have achieved compelling success, they still suffer from two pivotal limitations: 1) they overlook the benefit of generative pre-training, and 2) they ignore the knowledge related to the textual context. To address these limitations, we propose a novel dual knowledge-enhanced generative pretrained language model for multimodal task-oriented dialog systems (DKMD), consisting of three key components: dual knowledge selection, dual knowledge-enhanced context learning, and knowledge-enhanced response generation. To be specific, the dual knowledge selection component selects the related knowledge according to both the textual and visual modalities of the given context. Thereafter, the dual knowledge-enhanced context learning component seamlessly integrates the selected knowledge into the multimodal context learning from both global and local perspectives, where the cross-modal semantic relation is also explored. Moreover, the knowledge-enhanced response generation component comprises a revised BART decoder, where an additional dot-product knowledge-decoder attention sub-layer is introduced to explicitly utilize the knowledge and advance text response generation, as sketched below. Extensive experiments on a public dataset verify the superiority of the proposed DKMD over state-of-the-art competitors.
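To make the revised decoder concrete, the following is a minimal sketch, not the authors' implementation, of a BART-style decoder layer augmented with the extra dot-product knowledge-decoder attention sub-layer the abstract describes. All class names, dimensions, and the placement of the knowledge attention between cross-attention and the feed-forward network are illustrative assumptions.

```python
# Hypothetical sketch of a knowledge-enhanced BART decoder layer (PyTorch).
# The knowledge_attn sub-layer is the abstract's addition: it attends over the
# encoded representations of the selected knowledge.
import torch
import torch.nn as nn

class KnowledgeEnhancedDecoderLayer(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 12, d_ff: int = 3072):
        super().__init__()
        # Standard BART decoder sub-layers.
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Additional dot-product knowledge-decoder attention sub-layer (assumed placement).
        self.knowledge_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(4)])

    def forward(self, x, context_enc, knowledge_enc, tgt_mask=None):
        # 1) Masked self-attention over the partially generated response.
        h, _ = self.self_attn(x, x, x, attn_mask=tgt_mask)
        x = self.norms[0](x + h)
        # 2) Cross-attention over the (dual knowledge-enhanced) multimodal
        #    context encoding.
        h, _ = self.cross_attn(x, context_enc, context_enc)
        x = self.norms[1](x + h)
        # 3) Knowledge-decoder attention over the selected knowledge encoding,
        #    explicitly injecting knowledge into generation.
        h, _ = self.knowledge_attn(x, knowledge_enc, knowledge_enc)
        x = self.norms[2](x + h)
        # 4) Position-wise feed-forward network.
        return self.norms[3](x + self.ffn(x))

# Shape check with dummy tensors (batch=2, hidden size 768 assumed).
layer = KnowledgeEnhancedDecoderLayer()
dec = torch.randn(2, 10, 768)   # decoder states
ctx = torch.randn(2, 32, 768)   # multimodal context encoding
kn = torch.randn(2, 8, 768)     # selected knowledge encoding
out = layer(dec, ctx, kn)       # -> (2, 10, 768)
```

The residual-plus-LayerNorm ordering above follows BART's standard post-norm convention; whether the knowledge attention shares parameters with the context cross-attention or sits elsewhere in the layer is not specified by the abstract.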