Transformer-based models typically have a predefined bound on their input length, because of their need to potentially attend to every token in the input. In this work, we propose Unlimiformer: a general approach that can wrap any existing pretrained encoder-decoder transformer and offload the attention computation across all layers to a single $k$-nearest-neighbor index; this index can be kept in either GPU or CPU memory and queried in sub-linear time. This way, we can index extremely long input sequences, while every attention head in every decoder layer retrieves its top-$k$ keys instead of attending to every key. We demonstrate Unlimiformer's efficacy on several long-document and multi-document summarization benchmarks, showing that it can summarize even 350k-token-long inputs from the BookSum dataset, without any input truncation at test time. Unlimiformer improves pretrained models such as BART and Longformer by extending them to unlimited inputs without additional learned weights and without modifying their code. We make our code and models publicly available at https://github.com/abertsch72/unlimiformer.
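To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of retrieval-based cross-attention: each decoder query attends only to its top-$k$ retrieved encoder keys rather than to all of them. Here `torch.topk` over exact dot products stands in for the $k$-nearest-neighbor index described in the abstract, and the function and variable names (`knn_cross_attention`, `enc_keys`, etc.) are hypothetical.

```python
# Sketch: cross-attention restricted to the top-k keys per query.
# torch.topk emulates the k-NN index lookup; the actual system maintains a
# single index shared across all decoder layers and attention heads.
import torch
import torch.nn.functional as F

def knn_cross_attention(queries, keys, values, k=16):
    """
    queries: (num_queries, d)  decoder attention queries (one head)
    keys:    (num_tokens, d)   encoder keys for the full, un-truncated input
    values:  (num_tokens, d)   encoder values aligned with `keys`
    """
    d = queries.size(-1)
    scores = queries @ keys.T / d ** 0.5            # (num_queries, num_tokens)
    top_scores, top_idx = scores.topk(k, dim=-1)    # keep only the k best keys per query
    probs = F.softmax(top_scores, dim=-1)           # softmax over the retrieved subset
    top_values = values[top_idx]                    # (num_queries, k, d)
    return torch.einsum("qk,qkd->qd", probs, top_values)

# Toy usage: a 100k-token "input" far beyond a typical attention window.
torch.manual_seed(0)
enc_keys = torch.randn(100_000, 64)
enc_vals = torch.randn(100_000, 64)
dec_queries = torch.randn(8, 64)
out = knn_cross_attention(dec_queries, enc_keys, enc_vals, k=16)
print(out.shape)  # torch.Size([8, 64])
```

In the sketch the score matrix is still computed exactly, so it only illustrates the attention-over-retrieved-keys step; the sub-linear query time claimed in the abstract comes from replacing that exact search with an approximate nearest-neighbor index.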