Variational Latent-State GPT for Semi-Supervised Task-Oriented Dialog Systems

Recently, two approaches, fine-tuning large pre-trained language models and variational training, have separately attracted significant interest for semi-supervised end-to-end task-oriented dialog (TOD) systems. In this paper, we propose the Variational Latent-State GPT model (VLS-GPT), which is the first to combine the strengths of the two approaches. Among the many possible model choices, we propose a generative model and an inference model for variational learning of the end-to-end TOD system, both built as auto-regressive language models based on GPT-2, which can be further trained on a mix of labeled and unlabeled dialog data in a semi-supervised manner. Variational training of VLS-GPT is both statistically and computationally more challenging than in previous variational learning work on sequential latent variable models, which uses turn-level first-order Markovian models. The inference model in VLS-GPT is non-Markovian because it uses the Transformer architecture. In this work, we establish the Recursive Monte Carlo Approximation (RMCA) to the variational objective with a non-Markovian inference model and prove its unbiasedness. Further, we develop the computational strategy of sampling-then-forward-computation to realize RMCA, which overcomes the memory explosion issue of using GPT in variational learning and speeds up training. Semi-supervised TOD experiments are conducted on two benchmark multi-domain datasets in different languages, MultiWOZ2.1 and CrossWOZ. VLS-GPT is shown to significantly outperform both supervised-only and semi-supervised self-training baselines.
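To make the idea of recursively sampling from a non-Markovian inference model more concrete, the following is a minimal toy sketch, not the RMCA estimator from the paper. It assumes binary latent states and a hypothetical step distribution `q_step` that depends on the whole history, and it checks that ancestral (step-by-step) sampling gives a Monte Carlo estimate close to the exact expectation. The names `q_step`, `sample_chain`, `f`, `exact_expectation`, and `mc_expectation` are illustrative assumptions and do not come from the paper.

```python
import itertools
import math
import random


def q_step(history):
    """Hypothetical inference model: P(h_t = 1 | all previous states).

    It depends on the entire history (sum and length), so the chain is
    non-Markovian. This is an assumption for illustration only.
    """
    score = 0.5 * sum(history) - 0.3 * len(history)
    return 1.0 / (1.0 + math.exp(-score))


def sample_chain(T, rng):
    """Ancestral sampling: draw h_1, then h_2 given h_1, and so on."""
    history = []
    for _ in range(T):
        p1 = q_step(history)
        history.append(1 if rng.random() < p1 else 0)
    return history


def f(history):
    """Toy stand-in for the quantity whose expectation is needed."""
    return sum((t + 1) * h for t, h in enumerate(history))


def exact_expectation(T):
    """Exact E_q[f] by enumerating all 2^T latent sequences."""
    total = 0.0
    for hs in itertools.product([0, 1], repeat=T):
        prob = 1.0
        for t, h in enumerate(hs):
            p1 = q_step(list(hs[:t]))
            prob *= p1 if h == 1 else 1.0 - p1
        total += prob * f(hs)
    return total


def mc_expectation(T, n, seed=0):
    """Monte Carlo estimate of E_q[f] from n recursively sampled chains."""
    rng = random.Random(seed)
    return sum(f(sample_chain(T, rng)) for _ in range(n)) / n
```

Calling `mc_expectation(4, 20000)` and `exact_expectation(4)` should give values that agree up to sampling noise. In a GPT-based setting one would presumably draw the samples first without tracking gradients and then run a separate forward pass over the sampled sequences, in the spirit of the sampling-then-forward-computation strategy named above, but that step is not shown in this sketch.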
