Fuse It More Deeply! A Variational Transformer with Layer-Wise Latent Variable Inference for Text Generation - Zhuanzhi Papers

Tags: latent variable · latent · inference · transformation · reducible

October 21, 2022

Fuse It More Deeply! A Variational Transformer with Layer-Wise Latent Variable Inference for Text Generation

Jinyi Hu, Xiaoyuan Yi, Wenhao Li, Maosong Sun, Xing Xie

Source: arXiv, NAACL 2022

The past several years have witnessed Variational Auto-Encoder's superiority in various text generation tasks. However, due to the sequential nature of the text, auto-regressive decoders tend to ignore latent variables and then reduce to simple language models, known as the KL vanishing problem, which would further deteriorate when VAE is combined with Transformer-based structures. To ameliorate this problem, we propose DELLA, a novel variational Transformer framework. DELLA learns a series of layer-wise latent variables with each inferred from those of lower layers and tightly coupled with the hidden states by low-rank tensor product. In this way, DELLA forces these posterior latent variables to be fused deeply with the whole computation path and hence incorporate more information. We theoretically demonstrate that our method can be regarded as entangling latent variables to avoid posterior information decrease through layers, enabling DELLA to get higher non-zero KL values even without any annealing or thresholding tricks. Experiments on four unconditional and three conditional generation tasks show that DELLA could better alleviate KL vanishing and improve both quality and diversity compared to several strong baselines.
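
To make the KL vanishing problem concrete, here is a minimal sketch in plain Python. It is not taken from the paper: it assumes diagonal Gaussian posteriors and priors at every layer and sums the closed-form KL term over layers. The function names `gaussian_kl` and `layerwise_kl`, the two-layer setup, and all numeric values are hypothetical and chosen only for illustration. When each posterior equals its prior, the total KL is zero, which is the collapsed case in which the decoder can ignore the latent variables.

```python
import math

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """Closed-form KL(q || p) for two diagonal Gaussians (lists of means and log-variances)."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total

def layerwise_kl(posteriors, priors):
    """Sum per-layer KL terms; each entry is a (mu, logvar) pair for one layer."""
    return sum(
        gaussian_kl(mq, lq, mp, lp)
        for (mq, lq), (mp, lp) in zip(posteriors, priors)
    )

# Hypothetical two-layer model with 2-dimensional latents and standard-normal priors.
priors = [([0.0, 0.0], [0.0, 0.0]), ([0.0, 0.0], [0.0, 0.0])]

# Collapsed case: every posterior equals its prior, so the latents carry no information.
collapsed = [([0.0, 0.0], [0.0, 0.0]), ([0.0, 0.0], [0.0, 0.0])]

# Informative case: posterior means move away from the prior means.
informative = [([1.0, -1.0], [0.0, 0.0]), ([0.5, 0.5], [0.0, 0.0])]

kl_collapsed = layerwise_kl(collapsed, priors)
kl_informative = layerwise_kl(informative, priors)
print(kl_collapsed, kl_informative)
```

A total KL that stays near zero during training signals that the posteriors have collapsed onto the priors; the abstract's claim is that DELLA keeps this quantity above zero without annealing or thresholding.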
