Transformers, while powerful, suffer from quadratic computational complexity and the ever-growing Key-Value (KV) cache of the attention mechanism. This paper introduces Trellis, a novel Transformer architecture with bounded memory that learns how to compress its key-value memory dynamically at test time. Trellis replaces the standard KV cache with a fixed-size memory and trains a two-pass recurrent compression mechanism to store new keys and values into memory. To achieve this, it leverages an online gradient descent procedure with a forget gate, enabling the compressed memory to be updated recursively while learning to retain important contextual information from incoming tokens at test time. Extensive experiments on language modeling, common-sense reasoning, recall-intensive tasks, and time series show that the proposed architecture outperforms strong baselines. Notably, its performance gains increase as the sequence length grows, highlighting its potential for long-context applications.
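To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of a forget-gated online gradient-descent update for a bounded memory. It assumes, for illustration only, that the compressed memory is a single matrix `M` of shape `(d_k, d_v)`, that writing a token minimizes an L2 reconstruction loss `||M^T k - v||^2`, and that `alpha` (forget gate) and `eta` (step size) are scalars; the paper's actual two-pass mechanism and gating are richer than this.

```python
# Hypothetical sketch of a forget-gated online gradient-descent memory update.
# M, alpha, eta, and the L2 write loss are illustrative assumptions, not the
# paper's exact formulation.
import torch

def memory_update(M, k, v, alpha, eta):
    """One recurrent update of a fixed-size memory M (d_k x d_v).

    M     : current compressed memory, shape (d_k, d_v)
    k, v  : incoming key / value vectors, shapes (d_k,) and (d_v,)
    alpha : forget gate in [0, 1] (1 = keep all past memory)
    eta   : learning rate of the online gradient step
    """
    pred = M.t() @ k                      # current read-out for key k, shape (d_v,)
    grad = torch.outer(k, pred - v)       # gradient of 0.5*||M^T k - v||^2 w.r.t. M
    # Forget-gated online gradient descent: decay old memory, write the new token.
    return alpha * M - eta * grad

# Toy usage: stream tokens into a bounded memory instead of a growing KV cache.
d_k, d_v = 16, 16
M = torch.zeros(d_k, d_v)
for _ in range(100):
    k, v = torch.randn(d_k), torch.randn(d_v)
    M = memory_update(M, k, v, alpha=0.95, eta=0.1)

# Reading replaces attention over the full KV cache with a query against M.
q = torch.randn(d_k)
out = M.t() @ q
```

The point of the sketch is that memory cost stays constant in sequence length: each token updates `M` in place rather than appending to a KV cache.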