Recent works have explored the use of weight sparsity to improve the training efficiency (test accuracy w.r.t. training FLOPs) of deep neural networks (DNNs). These works aim to reduce training FLOPs, but training with sparse weights often sacrifices accuracy or requires longer training schedules, making the resulting gains in training efficiency unclear. In contrast, we focus on using sparsity to increase accuracy while using the same FLOPs as the dense model, and we demonstrate training-efficiency gains through this higher accuracy. In this work, we introduce SIFT, a family of Sparse Iso-FLOP Transformations that serve as drop-in replacements for dense layers and improve their representational capacity and FLOP efficiency. Each transformation is parameterized by a single hyperparameter, the sparsity level, and provides a larger search space for finding optimal sparse masks. Without changing any training hyperparameters, replacing dense layers with SIFT leads to significant improvements across computer vision (CV) and natural language processing (NLP) tasks, including ResNet-18 on ImageNet (+3.5%) and GPT-3 Small on WikiText-103 (-0.4 PPL), both matching larger dense model variants with 2x or more FLOPs. To the best of our knowledge, this is the first work to demonstrate the use of sparsity for improving the accuracy of dense models via a simple-to-use set of sparse transformations. Code is available at: https://github.com/CerebrasResearch/SIFT.
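To make the iso-FLOP idea concrete, the following is a minimal, hypothetical sketch (not the authors' exact construction; see the repository for the actual implementation). It assumes one simple way to keep FLOPs constant: at sparsity s, a single sparse matmul costs roughly (1 - s) of the dense FLOPs, so summing k = round(1 / (1 - s)) sparse branches of the same shape matches the dense layer's FLOPs while enlarging the mask search space. The class name `SparseIsoFLOPLinear` and the use of static random masks are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseIsoFLOPLinear(nn.Module):
    """Illustrative drop-in replacement for nn.Linear (hypothetical sketch).

    With sparsity s, each masked branch costs ~(1 - s) of the dense FLOPs,
    so k = round(1 / (1 - s)) branches keep total FLOPs roughly equal to
    the dense layer. Masks here are static and random; in practice they
    would be optimized (e.g., with dynamic sparse training), and realizing
    the FLOP savings requires sparse compute support.
    """

    def __init__(self, in_features, out_features, sparsity=0.75, bias=True):
        super().__init__()
        self.k = max(1, round(1.0 / (1.0 - sparsity)))
        self.weights = nn.ParameterList(
            [nn.Parameter(torch.empty(out_features, in_features)) for _ in range(self.k)]
        )
        for i, w in enumerate(self.weights):
            nn.init.kaiming_uniform_(w, a=5 ** 0.5)
            # Random binary mask keeping a (1 - sparsity) fraction of weights.
            self.register_buffer(f"mask_{i}", (torch.rand_like(w) > sparsity).float())
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None

    def forward(self, x):
        # Sum the k masked (sparse) branches; iso-FLOP with the dense layer.
        out = 0
        for i, w in enumerate(self.weights):
            out = out + F.linear(x, w * getattr(self, f"mask_{i}"))
        if self.bias is not None:
            out = out + self.bias
        return out
```

For example, `SparseIsoFLOPLinear(768, 768, sparsity=0.75)` would stand in for `nn.Linear(768, 768)` with k = 4 sparse branches, so the input and output dimensions of the surrounding network are unchanged.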