This paper introduces Block Data Representations (BDR), a framework for exploring and evaluating a wide spectrum of narrow-precision formats for deep learning. It enables comparison of popular quantization standards, and through BDR, new formats based on shared microexponents (MX) are identified, which outperform other state-of-the-art quantization approaches, including narrow-precision floating-point and block floating-point. MX utilizes multiple levels of quantization scaling with ultra-fine scaling factors based on shared microexponents in the hardware. The effectiveness of MX is demonstrated on real-world models including large-scale generative pretraining and inferencing, and production-scale recommendation systems.
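To make the multi-level scaling idea concrete, the following is a minimal NumPy sketch of a block quantizer that combines a shared power-of-two exponent per block with a small shared microexponent shift per sub-block. The function name and the parameters (block=16, subblock=2, mantissa_bits=4, micro_bits=1) are illustrative assumptions for exposition, not the paper's exact MX format definitions.

```python
import numpy as np

def quantize_two_level(x, block=16, subblock=2, mantissa_bits=4, micro_bits=1):
    """Illustrative two-level block quantization: a shared power-of-two
    exponent per block plus a small shared 'microexponent' shift per
    sub-block. Parameters are hypothetical, chosen for readability."""
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    qmax = 2 ** (mantissa_bits - 1) - 1  # symmetric integer range for mantissas
    for b0 in range(0, len(x), block):
        blk = x[b0:b0 + block]
        # Level 1: coarse shared exponent for the whole block.
        block_exp = int(np.floor(np.log2(np.max(np.abs(blk)) + 1e-30)))
        for s0 in range(0, len(blk), subblock):
            sub = blk[s0:s0 + subblock]
            # Level 2: ultra-fine shared microexponent (a small extra shift).
            sub_exp = int(np.floor(np.log2(np.max(np.abs(sub)) + 1e-30)))
            micro = int(np.clip(block_exp - sub_exp, 0, 2 ** micro_bits - 1))
            scale = 2.0 ** (block_exp - micro)
            # Quantize mantissas against the combined scale and reconstruct.
            q = np.clip(np.round(sub / scale * qmax), -qmax, qmax)
            out[b0 + s0:b0 + s0 + len(sub)] = q / qmax * scale
    return out

x = np.random.randn(64)
xq = quantize_two_level(x)
print("max abs error:", np.max(np.abs(x - xq)))
```

The sketch illustrates why the fine scaling factor is cheap in hardware: it is only a narrow shift relative to the block's shared exponent, shared across a small sub-block rather than stored per element.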