We present unit scaling, a paradigm for designing deep learning models that simplifies the use of low-precision number formats. Training in FP16 or the recently proposed FP8 formats offers substantial efficiency gains, but can lack sufficient range for out-of-the-box training. Unit scaling addresses this by introducing a principled approach to model numerics: seeking unit variance of all weights, activations and gradients at initialisation. Unlike alternative methods, this approach neither requires multiple training runs to find a suitable scale nor has significant computational overhead. We demonstrate the efficacy of unit scaling across a range of models and optimisers. We further show that existing models can be adapted to be unit-scaled, training BERT-Large in FP16 and then FP8 with no degradation in accuracy.
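To make the core idea concrete, below is a minimal sketch (in PyTorch; an illustration of the principle, not the paper's reference implementation). The assumption is a plain matrix multiply given a fixed 1/sqrt(fan_in) scale, so that unit-variance inputs and unit-variance weights produce unit-variance outputs at initialisation, keeping values within the representable range of FP16/FP8.

```python
import torch

# Hypothetical illustration of the unit-variance principle at initialisation
# (not the paper's implementation). Weights are drawn with unit variance and
# the op itself carries a 1/sqrt(fan_in) scale, so outputs also have ~unit
# variance and stay well within low-precision range.

def unit_scaled_linear(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    fan_in = w.shape[1]
    return (x @ w.t()) / fan_in ** 0.5

batch, fan_in, fan_out = 256, 1024, 1024
x = torch.randn(batch, fan_in)    # unit-variance activations
w = torch.randn(fan_out, fan_in)  # unit-variance weights (no scale baked into init)

y = unit_scaled_linear(x, w)
print(y.std())  # ~1.0 at initialisation
```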