Sparse MoEs meet Efficient Ensembles - Zhuanzhi Papers

Sparsity · MoDELS · Extensibility · Ensembles · Mixture of Experts

October 7, 2021

Sparse MoEs meet Efficient Ensembles

James Urquhart Allingham, Florian Wenzel, Zelda E Mariet, Basil Mustafa, Joan Puigcerver, Neil Houlsby, Ghassen Jerfel, Vincent Fortuin, Balaji Lakshminarayanan, Jasper Snoek, Dustin Tran, Carlos Riquelme Ruiz, Rodolphe Jenatton

from arxiv, 44 pages, 19 figures, 24 tables

Machine learning models based on the aggregated outputs of submodels, either at the activation or prediction levels, lead to strong performance. We study the interplay of two popular classes of such models: ensembles of neural networks and sparse mixture of experts (sparse MoEs). First, we show that these two approaches have complementary features whose combination is beneficial. Then, we present partitioned batch ensembles, an efficient ensemble of sparse MoEs that takes the best of both classes of models. Extensive experiments on fine-tuned vision transformers demonstrate the accuracy, log-likelihood, few-shot learning, robustness, and uncertainty calibration improvements of our approach over several challenging baselines. Partitioned batch ensembles not only scale to models with up to 2.7B parameters, but also provide larger performance gains for larger models.
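
The two aggregation levels named in the abstract can be illustrated with a small NumPy sketch. This is not the paper's partitioned batch ensembles method; it is a generic toy example that assumes linear submodels, softmax routing, and hypothetical function names (`ensemble_predict`, `sparse_moe_layer`) chosen only for illustration. An ensemble averages the predicted class probabilities of its members (prediction level), while a sparse MoE layer combines the outputs of only the top-k experts selected per input (activation level).

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ensemble_predict(x, members):
    """Prediction-level aggregation: average the class probabilities of all members."""
    probs = np.stack([softmax(x @ W) for W in members])  # (members, batch, classes)
    return probs.mean(axis=0)

def sparse_moe_layer(x, router, experts, k=1):
    """Activation-level aggregation: each input is processed only by its top-k experts."""
    gates = softmax(x @ router)                # (batch, num_experts)
    top = np.argsort(-gates, axis=-1)[:, :k]   # indices of the k largest gates per input
    out = np.zeros((x.shape[0], experts[0].shape[1]))
    for i in range(x.shape[0]):
        for e in top[i]:
            # Gate-weighted sum of the selected experts' outputs (assumed: no renormalization).
            out[i] += gates[i, e] * (x[i] @ experts[e])
    return out

# Toy data: all shapes and weights below are illustrative assumptions.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                            # batch of 4 inputs, 8 features
members = [rng.normal(size=(8, 3)) for _ in range(3)]  # 3 ensemble members, 3 classes
router = rng.normal(size=(8, 4))                       # router over 4 experts
experts = [rng.normal(size=(8, 8)) for _ in range(4)]  # 4 linear experts

p = ensemble_predict(x, members)                # shape (4, 3), rows sum to 1
h = sparse_moe_layer(x, router, experts, k=1)   # shape (4, 8)
```

With `k` equal to the number of experts the layer reduces to a dense mixture; smaller `k` is what makes it sparse, since each input pays the cost of only `k` experts.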
