We present a smoothly broken power law functional form that accurately models and extrapolates the scaling behaviors of deep neural networks (i.e., how the evaluation metric of interest varies with the amount of compute used for training, the number of model parameters, the training dataset size, or upstream performance) for each task within a large and diverse set of upstream and downstream tasks, in zero-shot, prompted, and fine-tuned settings. This set includes large-scale vision and unsupervised language tasks, diffusion generative modeling of images, arithmetic, and reinforcement learning. Compared to other functional forms for neural scaling behavior, this functional form yields considerably more accurate extrapolations of scaling behavior on this set. Moreover, this functional form accurately models and extrapolates scaling behavior that other functional forms are incapable of expressing, such as the non-monotonic transitions present in the scaling behavior of phenomena like double descent and the delayed, sharp inflection points present in the scaling behavior of tasks like arithmetic. Lastly, we use this functional form to glean insights about the limit of the predictability of scaling behavior. Code is available at https://github.com/ethancaballero/broken_neural_scaling_laws
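To make the idea of a "smoothly broken power law" concrete, below is a minimal sketch of fitting a one-break variant to synthetic scaling data. The six-parameter form used here (a limiting value `a`, an initial power-law segment set by `b` and `c0`, a break location `d1`, a break sharpness `f1`, and a post-break slope shift `c1`), the function name, and the synthetic data are illustrative assumptions, not the paper's exact parameterization; see the repository linked above for the authors' definition.

```python
import numpy as np
from scipy.optimize import curve_fit

def smoothly_broken_power_law(x, a, b, c0, c1, d1, f1):
    """Illustrative one-break smoothly broken power law (an assumed form):
    y = a + b * x**(-c0) * (1 + (x / d1)**(1/f1))**(-c1 * f1)
    Far below the break d1 the curve behaves like a power law with slope -c0;
    far above it, the effective slope shifts by -c1; f1 controls how smoothly
    the transition between the two regimes occurs."""
    return a + b * x ** (-c0) * (1.0 + (x / d1) ** (1.0 / f1)) ** (-c1 * f1)

# Synthetic example: evaluation loss vs. training dataset size.
x = np.logspace(3, 9, 40)
y_true = smoothly_broken_power_law(x, a=0.1, b=50.0, c0=0.3, c1=0.2, d1=1e6, f1=0.5)
y_obs = y_true * (1 + 0.01 * np.random.default_rng(0).standard_normal(x.size))

# Fit the functional form to the observed points; p0 is a rough initial guess.
popt, _ = curve_fit(smoothly_broken_power_law, x, y_obs,
                    p0=[0.1, 10.0, 0.5, 0.1, 1e6, 1.0], maxfev=20000)
print(popt)  # fitted (a, b, c0, c1, d1, f1)
```

Extrapolation then amounts to fitting the parameters on the small-scale portion of the curve and evaluating the fitted function at larger scales; in practice such fits are often performed on log-transformed data for numerical stability.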