Sparsely-gated Mixture-of-Expert Layers for CNN Interpretability - 专知论文

会员服务 ·

0

层 · SOFT · MoDELS · 约束 · CNN ·

2023 年 4 月 27 日

Sparsely-gated Mixture-of-Expert Layers for CNN Interpretability

翻译：暂无翻译

Svetlana Pavlitska,Christian Hubschneider,Lukas Struppek,J. Marius Zöllner

from arxiv, Accepted for publication at IJCNN 2023

Sparsely-gated Mixture of Expert (MoE) layers have been recently successfully applied for scaling large transformers, especially for language modeling tasks. An intriguing side effect of sparse MoE layers is that they convey inherent interpretability to a model via natural expert specialization. In this work, we apply sparse MoE layers to CNNs for computer vision tasks and analyze the resulting effect on model interpretability. To stabilize MoE training, we present both soft and hard constraint-based approaches. With hard constraints, the weights of certain experts are allowed to become zero, while soft constraints balance the contribution of experts with an additional auxiliary loss. As a result, soft constraints handle expert utilization better and support the expert specialization process, while hard constraints maintain more generalized experts and increase overall model performance. Our findings demonstrate that experts can implicitly focus on individual sub-domains of the input space. For example, experts trained for CIFAR-100 image classification specialize in recognizing different domains such as flowers or animals without previous data clustering. Experiments with RetinaNet and the COCO dataset further indicate that object detection experts can also specialize in detecting objects of distinct sizes.

翻译：暂无翻译

0

相关内容

不可错过！杜克大学《因果推断》课程，全面讲述因果推理

不可错过！杜克大学《因果推断》课程，全面讲述因果推理

专知会员服务

52+阅读 · 2022年10月22日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

认知车联网双向通信中继策略及资源优化

国家自然科学基金

0+阅读 · 2013年12月31日

PPIB蛋白影响结直肠癌细胞凋亡的作用及其分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

腺苷受体A2b对少突胶质前体细胞发育和存活的调控及机制

国家自然科学基金

0+阅读 · 2012年12月31日

基于移动网格的局部间断Galerkin有限元方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

艾滋病TH17/Treg失衡与STAT/SOCS调控及补肾解毒法的干预作用

国家自然科学基金

0+阅读 · 2011年12月31日

Adiponectin在肝脏缺血再灌注损伤中的抗肝细胞凋亡机制

国家自然科学基金

0+阅读 · 2009年12月31日

磁性纳米铁基复合氧化物催化臭氧化水中难降解有机物的研究

国家自然科学基金

0+阅读 · 2009年12月31日

多核异步数据触发微处理器设计关键技术研究

国家自然科学基金

0+阅读 · 2008年12月31日

等离子体改性活性炭纤维脱硫脱氮的研究

国家自然科学基金

0+阅读 · 2008年12月31日

SE-MoE: A Scalable and Efficient Mixture-of-Experts Distributed Training and Inference System

Arxiv

0+阅读 · 2023年6月12日

CI-GNN: A Granger Causality-Inspired Graph Neural Network for Interpretable Brain Network-Based Psychiatric Diagnosis

Arxiv

0+阅读 · 2023年6月9日

Distilling BlackBox to Interpretable models for Efficient Transfer Learning

Arxiv

0+阅读 · 2023年6月8日

On the Effectiveness of Out-of-Distribution Data in Self-Supervised Long-Tail Learning

Arxiv

0+阅读 · 2023年6月8日

Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts

Arxiv

0+阅读 · 2023年6月8日

Pre-Trained Models: Past, Present and Future

Arxiv

19+阅读 · 2021年6月15日

Memory-Gated Recurrent Networks

Memory-Gated Recurrent Networks

Arxiv

12+阅读 · 2020年12月24日

Interpretable CNNs for Object Classification

Interpretable CNNs for Object Classification

Arxiv

20+阅读 · 2020年3月12日

Learning with Interpretable Structure from RNN

Arxiv

19+阅读 · 2018年10月25日

Interpretable Convolutional Neural Networks

Arxiv

22+阅读 · 2018年2月14日

VIP会员

文章信息

相关主题

相关VIP内容

不可错过！杜克大学《因果推断》课程，全面讲述因果推理

不可错过！杜克大学《因果推断》课程，全面讲述因果推理

专知会员服务

52+阅读 · 2022年10月22日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

机器人领域中最佳的三维场景表示是什么？——从几何表示到基础模型

《多域作战兵棋推演：运用形态学分析与人工智能加强国防人员训练》

【博士论文】快速高效的归一化流及其在图像生成模型中的应用

仿生机器人技术的军事应用

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

相关论文

SE-MoE: A Scalable and Efficient Mixture-of-Experts Distributed Training and Inference System

Arxiv

0+阅读 · 2023年6月12日

CI-GNN: A Granger Causality-Inspired Graph Neural Network for Interpretable Brain Network-Based Psychiatric Diagnosis

Arxiv

0+阅读 · 2023年6月9日

Distilling BlackBox to Interpretable models for Efficient Transfer Learning

Arxiv

0+阅读 · 2023年6月8日

On the Effectiveness of Out-of-Distribution Data in Self-Supervised Long-Tail Learning

Arxiv

0+阅读 · 2023年6月8日

Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts

Arxiv

0+阅读 · 2023年6月8日

Pre-Trained Models: Past, Present and Future

Arxiv

19+阅读 · 2021年6月15日

Memory-Gated Recurrent Networks

Memory-Gated Recurrent Networks

Arxiv

12+阅读 · 2020年12月24日

Interpretable CNNs for Object Classification

Interpretable CNNs for Object Classification

Arxiv

20+阅读 · 2020年3月12日

Learning with Interpretable Structure from RNN

Arxiv

19+阅读 · 2018年10月25日

Interpretable Convolutional Neural Networks

Arxiv

22+阅读 · 2018年2月14日

相关基金

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

认知车联网双向通信中继策略及资源优化

国家自然科学基金

0+阅读 · 2013年12月31日

PPIB蛋白影响结直肠癌细胞凋亡的作用及其分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

腺苷受体A2b对少突胶质前体细胞发育和存活的调控及机制

国家自然科学基金

0+阅读 · 2012年12月31日

基于移动网格的局部间断Galerkin有限元方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

艾滋病TH17/Treg失衡与STAT/SOCS调控及补肾解毒法的干预作用

国家自然科学基金

0+阅读 · 2011年12月31日

Adiponectin在肝脏缺血再灌注损伤中的抗肝细胞凋亡机制

国家自然科学基金

0+阅读 · 2009年12月31日

磁性纳米铁基复合氧化物催化臭氧化水中难降解有机物的研究

国家自然科学基金

0+阅读 · 2009年12月31日

多核异步数据触发微处理器设计关键技术研究

国家自然科学基金

0+阅读 · 2008年12月31日

等离子体改性活性炭纤维脱硫脱氮的研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员