Mixture-of-Experts (MoE) with sparse conditional computation has proven to be an effective architecture for scaling attention-based models to more parameters at a comparable computational cost. In this paper, we propose Sparse-MLP, which scales the recent MLP-Mixer model with sparse MoE layers to achieve a more computation-efficient architecture. We replace a subset of the dense MLP blocks in MLP-Mixer with Sparse blocks. In each Sparse block, we apply two stages of MoE layers: one with MLP experts that mix information within channels along the image-patch dimension, and one with MLP experts that mix information within patches along the channel dimension. In addition, to reduce the computational cost of routing and to improve expert capacity, we design Re-represent layers in each Sparse block. These layers re-scale image representations with two simple but effective linear transformations. When pre-trained on ImageNet-1k with the MoCo v3 algorithm, our models outperform dense MLP models by 2.5\% in ImageNet Top-1 accuracy with fewer parameters and lower computational cost. On small-scale downstream image classification tasks, i.e., CIFAR-10 and CIFAR-100, our Sparse-MLP still achieves better performance than the baselines.
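To make the Sparse block concrete, the following is a minimal PyTorch-style sketch of the two-stage MoE structure and the Re-represent layers described above. All module and argument names (MLPExpert, MoE, SparseBlock, num_experts, s_patches, s_channels) are illustrative assumptions, not the authors' released implementation; routing uses a simple top-1 gate and omits capacity and load-balancing details.

# Sketch only: names and hyper-parameters are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MLPExpert(nn.Module):
    """Two-layer MLP expert applied to the last dimension."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, dim)

    def forward(self, x):
        return self.fc2(F.gelu(self.fc1(x)))


class MoE(nn.Module):
    """Top-1 routed mixture of MLP experts (capacity handling omitted)."""
    def __init__(self, dim, hidden_dim, num_experts=4):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            MLPExpert(dim, hidden_dim) for _ in range(num_experts))

    def forward(self, x):                      # x: (tokens, dim)
        scores = F.softmax(self.gate(x), dim=-1)
        top1 = scores.argmax(dim=-1)           # chosen expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top1 == i
            if mask.any():
                # weight each expert output by its gate probability
                out[mask] = expert(x[mask]) * scores[mask, i:i + 1]
        return out


class SparseBlock(nn.Module):
    """Two MoE stages plus Re-represent (linear re-scaling) layers."""
    def __init__(self, patches, channels, s_patches, s_channels):
        super().__init__()
        # Re-represent layers: simple linear maps that shrink/expand the
        # channel dimension around patch-wise routing to cut routing cost.
        self.shrink = nn.Linear(channels, s_channels)
        self.expand = nn.Linear(s_channels, channels)
        # Stage 1: experts mix information along the patch dimension.
        self.moe_patches = MoE(patches, s_patches)
        # Stage 2: experts mix information along the channel dimension.
        self.moe_channels = MoE(channels, s_channels * 4)
        self.norm1 = nn.LayerNorm(channels)
        self.norm2 = nn.LayerNorm(channels)

    def forward(self, x):                      # x: (batch, patches, channels)
        b, p, c = x.shape
        y = self.shrink(self.norm1(x))         # (b, p, s_channels)
        y = y.transpose(1, 2).reshape(-1, p)   # route tokens of length p
        y = self.moe_patches(y)
        y = y.reshape(b, -1, p).transpose(1, 2)
        x = x + self.expand(y)                 # back to (b, p, c)
        y = self.norm2(x).reshape(-1, c)       # route tokens of length c
        x = x + self.moe_channels(y).reshape(b, p, c)
        return x


# Example usage with assumed ViT-style dimensions:
block = SparseBlock(patches=196, channels=768, s_patches=96, s_channels=384)
out = block(torch.randn(2, 196, 768))          # -> (2, 196, 768)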