Deep learning models generalize well to in-distribution data but struggle to generalize compositionally, i.e., to combine a set of learned primitives to solve more complex tasks. In sequence-to-sequence (seq2seq) learning, transformers are often unable to predict correct outputs for examples longer than those seen during training. This paper introduces iterative decoding, an alternative to seq2seq that (i) improves transformer compositional generalization on the PCFG and Cartesian product datasets and (ii) provides evidence that, on these datasets, seq2seq transformers do not learn iterations that are not unrolled. In iterative decoding, training examples are broken down into a sequence of intermediate steps that the transformer learns iteratively. At inference time, the intermediate outputs are fed back to the transformer as intermediate inputs until an end-of-iteration token is predicted. We conclude by illustrating some limitations of iterative decoding on the CFQ dataset.
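To make the inference procedure concrete, here is a minimal sketch of the feedback loop described above, assuming a hypothetical `model` callable, an `END_OF_ITERATION` marker token, and an iteration cap; the paper's actual tokenization and stopping criterion may differ.

```python
# Minimal sketch of iterative decoding at inference time (names are illustrative).

END_OF_ITERATION = "<eoi>"   # assumed end-of-iteration marker token
MAX_ITERATIONS = 20          # safety cap so a malformed prediction cannot loop forever


def iterative_decode(model, input_tokens):
    """Repeatedly feed the model's intermediate output back in as the next input
    until the end-of-iteration token is predicted (or the cap is reached)."""
    current = list(input_tokens)
    for _ in range(MAX_ITERATIONS):
        output = model(current)                 # one ordinary seq2seq pass
        if output and output[-1] == END_OF_ITERATION:
            return output[:-1]                  # final answer, marker stripped
        current = output                        # intermediate output -> next input
    return current                              # give up after the cap


# Toy stand-in "model": performs one bubble-sort swap per call, then signals done.
def toy_model(tokens):
    for i in range(len(tokens) - 1):
        if tokens[i] > tokens[i + 1]:
            return tokens[:i] + [tokens[i + 1], tokens[i]] + tokens[i + 2:]
    return tokens + [END_OF_ITERATION]


if __name__ == "__main__":
    print(iterative_decode(toy_model, ["c", "b", "a"]))  # -> ['a', 'b', 'c']
```

The toy model stands in for a trained transformer: each call resolves one intermediate step, mirroring how training examples are decomposed into a sequence of steps that the decoder iterates over.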