FIT: Far-reaching Interleaved Transformers - 专知论文

会员服务 ·

0

层 · GROUP · Performer · 情景 · 变换 ·

2023 年 5 月 22 日

FIT: Far-reaching Interleaved Transformers

翻译：暂无翻译

Ting Chen,Lala Li

from arxiv, preliminary work (code at https://github.com/google-research/pix2seq)

We present FIT: a transformer-based architecture with efficient self-attention and adaptive computation. Unlike original transformers, which operate on a single sequence of data tokens, we divide the data tokens into groups, with each group being a shorter sequence of tokens. We employ two types of transformer layers: local layers operate on data tokens within each group, while global layers operate on a smaller set of introduced latent tokens. These layers, comprising the same set of self-attention and feed-forward layers as standard transformers, are interleaved, and cross-attention is used to facilitate information exchange between data and latent tokens within the same group. The attention complexity is $O(n^2)$ locally within each group of size $n$, but can reach $O(L^{{4}/{3}})$ globally for sequence length of $L$. The efficiency can be further enhanced by relying more on global layers that perform adaptive computation using a smaller set of latent tokens. FIT is a versatile architecture and can function as an encoder, diffusion decoder, or autoregressive decoder. We provide initial evidence demonstrating its effectiveness in high-resolution image understanding and generation tasks. Notably, FIT exhibits potential in performing end-to-end training on gigabit-scale data, such as 6400$\times$6400 images, even without specific optimizations or model parallelism.

翻译：暂无翻译

0

相关内容

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

胶质瘤侵袭过程中DNMT1沉默miR-134与ERK信号通路自激活的表观新机制

国家自然科学基金

0+阅读 · 2015年12月31日

多层积雪相干散射模型与SAR参数反演

国家自然科学基金

0+阅读 · 2014年12月31日

Helmholtz散射问题的谱元法

国家自然科学基金

0+阅读 · 2011年12月31日

原发性肝细胞癌微灌注及弹性模量状态与复发机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于小波有限元的探地雷达正演模拟及偏移处理

国家自然科学基金

0+阅读 · 2008年12月31日

Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions

Arxiv

0+阅读 · 2023年7月10日

Quantization-Aware and Tensor-Compressed Training of Transformers for Natural Language Understanding

Arxiv

0+阅读 · 2023年7月8日

Teaching Arithmetic to Small Transformers

Arxiv

0+阅读 · 2023年7月7日

Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding

Arxiv

12+阅读 · 2021年12月30日

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Arxiv

21+阅读 · 2020年12月17日

VIP会员

文章信息

相关主题

相关VIP内容

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

大模型推理时代的知识编辑

《利用人工智能对军事行动进行建模》

【MIT博士论文】加速科学发现的因果建模实践算法

机器人、无人机与实时影像：应对城市爆炸威胁的三大技术方案

相关资讯

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

相关论文

Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions

Arxiv

0+阅读 · 2023年7月10日

Quantization-Aware and Tensor-Compressed Training of Transformers for Natural Language Understanding

Arxiv

0+阅读 · 2023年7月8日

Teaching Arithmetic to Small Transformers

Arxiv

0+阅读 · 2023年7月7日

Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding

Arxiv

12+阅读 · 2021年12月30日

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Arxiv

21+阅读 · 2020年12月17日

相关基金

胶质瘤侵袭过程中DNMT1沉默miR-134与ERK信号通路自激活的表观新机制

国家自然科学基金

0+阅读 · 2015年12月31日

多层积雪相干散射模型与SAR参数反演

国家自然科学基金

0+阅读 · 2014年12月31日

Helmholtz散射问题的谱元法

国家自然科学基金

0+阅读 · 2011年12月31日

原发性肝细胞癌微灌注及弹性模量状态与复发机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于小波有限元的探地雷达正演模拟及偏移处理

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员