Confident Adaptive Language Modeling - Zhuanzhi Papers



July 14, 2022

Confident Adaptive Language Modeling


Tal Schuster, Adam Fisch, Jai Gupta, Mostafa Dehghani, Dara Bahri, Vinh Q. Tran, Yi Tay, Donald Metzler

Recent advances in Transformer-based large language models (LLMs) have led to significant performance improvements across many tasks. These gains come with a drastic increase in the models' size, potentially leading to slow and costly use at inference time. In practice, however, the series of generations made by LLMs is composed of varying levels of difficulty. While certain predictions truly benefit from the models' full capacity, other continuations are more trivial and can be solved with reduced compute. In this work, we introduce Confident Adaptive Language Modeling (CALM), a framework for dynamically allocating different amounts of compute per input and generation timestep. Early exit decoding involves several challenges that we address here, such as: (1) what confidence measure to use; (2) connecting sequence-level constraints to local per-token exit decisions; and (3) attending back to missing hidden representations due to early exits in previous tokens. Through theoretical analysis and empirical experiments on three diverse text generation tasks, we demonstrate the efficacy of our framework in reducing compute -- potential speedup of up to $\times 3$ -- while provably maintaining high performance.
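
To make the idea of per-token early exit concrete, here is a minimal toy sketch in Python with NumPy. It is not the authors' implementation. Several pieces are assumptions made only for illustration: the confidence measure (the gap between the two largest softmax probabilities), the hand-picked `threshold` (the abstract says exit decisions are tied to sequence-level constraints, which this sketch does not attempt), the choice to fill skipped layers by copying the exit-layer state upward, and every name in the code (`early_exit_step`, `top2_gap`, `lm_head`).

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def top2_gap(probs):
    """Assumed confidence measure: gap between the two largest probabilities."""
    top2 = np.sort(probs)[-2:]
    return float(top2[1] - top2[0])


def early_exit_step(hidden, layers, lm_head, threshold):
    """Decode one token, stopping at the first sufficiently confident layer.

    Assumes `layers` is a non-empty list of callables and `lm_head` is a
    (vocab, dim) matrix shared by all layers. Returns
    (token_id, exit_layer, states), where `states` holds one hidden state
    per layer; layers that were skipped receive a copy of the exit state
    so that later tokens still have something to attend to.
    """
    states = []
    for i, layer in enumerate(layers):
        hidden = layer(hidden)
        states.append(hidden)
        probs = softmax(lm_head @ hidden)
        # Exit as soon as the prediction is confident enough.
        if top2_gap(probs) >= threshold:
            break
    exit_layer = i + 1
    # Assumed handling of missing states: copy the exit state upward.
    states.extend([hidden] * (len(layers) - exit_layer))
    return int(np.argmax(probs)), exit_layer, states


# Toy model: four identical "layers" that double the hidden state, so the
# softmax distribution sharpens with depth.
layers = [lambda h: 2.0 * h] * 4
lm_head = np.eye(3)
start = np.array([1.0, 0.0, 0.0])

token, exit_layer, states = early_exit_step(start, layers, lm_head, threshold=0.9)
print(token, exit_layer, len(states))
```

In this toy setup a lower threshold exits earlier and an unreachable threshold falls back to running every layer, which is the compute/quality trade-off the abstract describes.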

