June 5, 2022

Functional Ensemble Distillation

Coby Penso, Idan Achituve, Ethan Fetaya

Bayesian models have many desirable properties, most notably their ability to generalize from limited data and to properly estimate the uncertainty in their predictions. However, these benefits come at a steep computational cost, as Bayesian inference is, in most cases, computationally intractable. One popular approach to alleviate this problem is Monte-Carlo estimation with an ensemble of models sampled from the posterior. However, this approach still comes at a significant computational cost, as one needs to store and run multiple models at test time. In this work, we investigate how to best distill an ensemble's predictions using an efficient model. First, we argue that current approaches that simply return a distribution over predictions cannot compute important properties, such as the covariance between predictions, which can be valuable for further processing. Second, in many limited-data settings, all ensemble members achieve nearly zero training loss; that is, they produce near-identical predictions on the training set, which results in sub-optimal distilled models. To address both problems, we propose a novel and general distillation approach, named Functional Ensemble Distillation (FED), and we investigate how to best distill an ensemble in this setting. We find that learning the distilled model via a simple augmentation scheme in the form of mixup augmentation significantly boosts performance. We evaluated our method on several tasks and showed that it achieves superior results in both accuracy and uncertainty estimation compared to current approaches.
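The quantities the abstract refers to can be illustrated with a small NumPy sketch. It is not the authors' implementation of FED. It only shows, under assumed array shapes, a Monte-Carlo predictive mean over ensemble members, the covariance between predictions at two inputs (which a single averaged distribution cannot recover), and a generic mixup step on inputs. The function names, the `(M, N, C)` layout, and the `alpha` parameter are hypothetical choices made for illustration.

```python
import numpy as np


def predictive_mean(member_probs):
    """Monte-Carlo estimate of the predictive distribution.

    member_probs: assumed shape (M, N, C) for M ensemble members,
    N inputs, and C classes. Returns an (N, C) array.
    """
    return member_probs.mean(axis=0)


def prediction_covariance(member_probs, i, j, cls):
    """Covariance, across ensemble members, between the probability
    assigned to class `cls` at input i and at input j.

    This needs the per-member predictions; it cannot be computed
    from the averaged distribution alone.
    """
    a = member_probs[:, i, cls]
    b = member_probs[:, j, cls]
    return np.mean((a - a.mean()) * (b - b.mean()))


def mixup_inputs(x, rng, alpha=1.0):
    """Generic mixup on inputs: convex combinations of random pairs.

    Returns the mixed inputs, the mixing weight, and the permutation
    used to pair the inputs. `alpha` is an assumed Beta parameter.
    """
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    return lam * x + (1.0 - lam) * x[perm], lam, perm


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for ensemble outputs: 5 members, 4 inputs, 3 classes.
    probs = rng.dirichlet(np.ones(3), size=(5, 4))
    print(predictive_mean(probs).shape)
    print(prediction_covariance(probs, 0, 1, 2))
    mixed, lam, _ = mixup_inputs(rng.normal(size=(4, 2)), rng)
    print(mixed.shape, lam)
```

In a distillation setting, mixed inputs like these would be passed through the ensemble members to obtain training targets away from the original training points, where the members are more likely to disagree. How the paper builds its distilled model and its targets is not stated in the abstract, so that part is left out here.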
