Although scaling language models improves performance on a range of tasks, there are apparently some scenarios where scaling hurts performance. For instance, the Inverse Scaling Prize Round 1 identified four ''inverse scaling'' tasks, for which performance gets worse as models get larger. These tasks were evaluated on models of up to 280B parameters, trained with up to 500 zettaFLOPs of compute. This paper takes a closer look at these four tasks. We evaluate models of up to 540B parameters, trained on five times more compute than the models evaluated in the Inverse Scaling Prize. With this increased range of model sizes and training compute, three of the four tasks exhibit what we call ''U-shaped scaling'': performance decreases up to a certain model size, and then increases again up to the largest model evaluated. One hypothesis is that U-shaped scaling occurs when a task comprises a ''true task'' and a ''distractor task''. Medium-size models can do the distractor task, which hurts performance, while only large-enough models can ignore the distractor task and do the true task. The existence of U-shaped scaling implies that inverse scaling may not hold for larger models. Second, we evaluate the inverse scaling tasks using chain-of-thought (CoT) prompting, in addition to basic prompting without CoT. With CoT prompting, all four tasks show either U-shaped scaling or positive scaling, with perfect solve rates on two tasks and several sub-tasks. This suggests that the term ''inverse scaling task'' is under-specified: a given task may be inverse scaling for one prompt but positive or U-shaped scaling for a different prompt.