调适性语言建模 (Confident Adaptive Language Modeling) - 专知论文

会员服务 ·

0

语言模型化 · 置信度 · 可约的 · MoDELS · Continuity ·

2022 年 10 月 25 日

Confident Adaptive Language Modeling

翻译：调适性语言建模

Tal Schuster,Adam Fisch,Jai Gupta,Mostafa Dehghani,Dara Bahri,Vinh Q. Tran,Yi Tay,Donald Metzler

from arxiv, NeurIPS 2022 (selected as Oral)

Recent advances in Transformer-based large language models (LLMs) have led to significant performance improvements across many tasks. These gains come with a drastic increase in the models' size, potentially leading to slow and costly use at inference time. In practice, however, the series of generations made by LLMs is composed of varying levels of difficulty. While certain predictions truly benefit from the models' full capacity, other continuations are more trivial and can be solved with reduced compute. In this work, we introduce Confident Adaptive Language Modeling (CALM), a framework for dynamically allocating different amounts of compute per input and generation timestep. Early exit decoding involves several challenges that we address here, such as: (1) what confidence measure to use; (2) connecting sequence-level constraints to local per-token exit decisions; and (3) attending back to missing hidden representations due to early exits in previous tokens. Through theoretical analysis and empirical experiments on three diverse text generation tasks, we demonstrate the efficacy of our framework in reducing compute -- potential speedup of up to $\times 3$ -- while provably maintaining high performance.

翻译：以变异器为基础的大型语言模型(LLMS)最近的进展使许多任务取得了显著的绩效改进。这些成就随着模型规模的大幅扩大而带来显著的绩效改进,可能导致在推论时间使用缓慢和成本高昂。然而,实际上,LLMS的一代人由不同的困难程度组成。虽然某些预测确实受益于模型的全部能力,但其他的延续则更为微不足道,并且可以通过降低计算速度来解决。在这项工作中,我们引入了 " 自信适应语言模型(CALM) " (CALM),这是一个动态分配不同数量计算投入和生成时间间隔的框架。早期退出编码涉及我们在这里讨论的几项挑战,例如:(1) 使用何种信任措施;(2) 将序列级限制与本地的一对一退出决定联系起来;(3) 处理由于先前的早期退出而缺失的隐蔽表述。通过对三种不同文本生成任务的理论分析和实验,我们展示了我们框架在降低计算效率方面的效率 -- 可能加快到3美元的时间 -- 同时保持高性。

0

相关内容

语言模型化

语言模型化

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

开放知识图谱

1+阅读 · 2022年4月4日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

Copine VII在阿尔茨海默病中的作用机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

新疆酿酒葡萄紫檀茋防御酸腐病的机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

移动Ad Hoc网络动态信任管理机制的研究

国家自然科学基金

0+阅读 · 2013年12月31日

Partial Spread Bent函数与Bent-Negabent函数的构造及密码学性质研究

国家自然科学基金

0+阅读 · 2013年12月31日

结合多模态影像技术观察VDAC1介导氢气对CA-ROSC兔脑能量代谢的作用及机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

NG2细胞在未成熟脑惊厥性脑损伤神经环路形成中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

ING3：原发性肝癌的诊断与治疗新靶点

国家自然科学基金

0+阅读 · 2012年12月31日

miR-1/miR-143调控长链基因间非编码RNA TUG1抑制膀胱癌进展的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

水稻ERF转录因子家族IX亚组成员在抗病性中的功能及其作用机制

国家自然科学基金

1+阅读 · 2012年12月31日

PTBP1介导的survivinΔEx3过表达调控胶质母细胞瘤微血管增生的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Adaptive analysis-aware defeaturing

Arxiv

0+阅读 · 2022年12月10日

Self-training via Metric Learning for Source-Free Domain Adaptation of Semantic Segmentation

Arxiv

0+阅读 · 2022年12月8日

Active Learning for Domain Adaptation: An Energy-based Approach

Arxiv

13+阅读 · 2021年12月2日

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering

Arxiv

15+阅读 · 2021年9月22日

Cross-Domain Adaptive Clustering for Semi-Supervised Domain Adaptation

Cross-Domain Adaptive Clustering for Semi-Supervised Domain Adaptation

Arxiv

19+阅读 · 2021年4月19日

Graph Contrastive Learning with Adaptive Augmentation

Arxiv

10+阅读 · 2021年2月26日

Adaptive Universal Generalized PageRank Graph Neural Network

Arxiv

10+阅读 · 2021年1月22日

Adaptive Attentional Network for Few-Shot Knowledge Graph Completion

Arxiv

17+阅读 · 2020年10月19日

AdarGCN: Adaptive Aggregation GCN for Few-Shot Learning

AdarGCN: Adaptive Aggregation GCN for Few-Shot Learning

Arxiv

64+阅读 · 2020年2月28日

K-BERT: Enabling Language Representation with Knowledge Graph

K-BERT: Enabling Language Representation with Knowledge Graph

Arxiv

19+阅读 · 2019年9月17日

VIP会员

文章信息

相关主题

语言模型化

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【CMU博士论文】基础模型训练中网络规模数据的负责任与高效使用

《俄乌战争背景下俄罗斯的战略性海军分析（2022-2025年）》最新100页报告

人工智能时代背景下的未来海战

相关资讯

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

开放知识图谱

1+阅读 · 2022年4月4日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

相关论文

Adaptive analysis-aware defeaturing

Arxiv

0+阅读 · 2022年12月10日

Self-training via Metric Learning for Source-Free Domain Adaptation of Semantic Segmentation

Arxiv

0+阅读 · 2022年12月8日

Active Learning for Domain Adaptation: An Energy-based Approach

Arxiv

13+阅读 · 2021年12月2日

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering

Arxiv

15+阅读 · 2021年9月22日

Cross-Domain Adaptive Clustering for Semi-Supervised Domain Adaptation

Cross-Domain Adaptive Clustering for Semi-Supervised Domain Adaptation

Arxiv

19+阅读 · 2021年4月19日

Graph Contrastive Learning with Adaptive Augmentation

Arxiv

10+阅读 · 2021年2月26日

Adaptive Universal Generalized PageRank Graph Neural Network

Arxiv

10+阅读 · 2021年1月22日

Adaptive Attentional Network for Few-Shot Knowledge Graph Completion

Arxiv

17+阅读 · 2020年10月19日

AdarGCN: Adaptive Aggregation GCN for Few-Shot Learning

AdarGCN: Adaptive Aggregation GCN for Few-Shot Learning

Arxiv

64+阅读 · 2020年2月28日

K-BERT: Enabling Language Representation with Knowledge Graph

K-BERT: Enabling Language Representation with Knowledge Graph

Arxiv

19+阅读 · 2019年9月17日

相关基金

Copine VII在阿尔茨海默病中的作用机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

新疆酿酒葡萄紫檀茋防御酸腐病的机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

移动Ad Hoc网络动态信任管理机制的研究

国家自然科学基金

0+阅读 · 2013年12月31日

Partial Spread Bent函数与Bent-Negabent函数的构造及密码学性质研究

国家自然科学基金

0+阅读 · 2013年12月31日

结合多模态影像技术观察VDAC1介导氢气对CA-ROSC兔脑能量代谢的作用及机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

NG2细胞在未成熟脑惊厥性脑损伤神经环路形成中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

ING3：原发性肝癌的诊断与治疗新靶点

国家自然科学基金

0+阅读 · 2012年12月31日

miR-1/miR-143调控长链基因间非编码RNA TUG1抑制膀胱癌进展的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

水稻ERF转录因子家族IX亚组成员在抗病性中的功能及其作用机制

国家自然科学基金

1+阅读 · 2012年12月31日

PTBP1介导的survivinΔEx3过表达调控胶质母细胞瘤微血管增生的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

微信扫码咨询专知VIP会员