The Cross-Entropy Method (CEM) is commonly used for planning in model-based reinforcement learning (MBRL), where a centralized approach is typically adopted: the sampling distribution is updated based only on the results of a top-$k$ operation over the samples. In this paper, we show that such a centralized approach makes CEM vulnerable to local optima, thus impairing its sample efficiency. To tackle this issue, we propose Decentralized CEM (DecentCEM), a simple but effective improvement over classical CEM that uses an ensemble of CEM instances running independently of one another, each performing a local improvement of its own sampling distribution. We provide both theoretical and empirical analyses to demonstrate the effectiveness of this simple decentralized approach. We show empirically that, compared to the classical centralized approach using either a single Gaussian or even a mixture of Gaussians, DecentCEM finds the global optimum far more consistently and thus improves sample efficiency. Furthermore, we plug DecentCEM into the planning component of MBRL and evaluate our approach in several continuous control environments, comparing against state-of-the-art CEM-based MBRL approaches (PETS and POPLIN). Results show that simply replacing the classical CEM module with our DecentCEM module improves sample efficiency while sacrificing only a reasonable amount of computational cost. Lastly, we conduct ablation studies for a more in-depth analysis. Code is available at https://github.com/vincentzhang/decentCEM
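To make the contrast concrete, below is a minimal Python sketch of centralized CEM versus the decentralized ensemble idea described in the abstract. It is illustrative only, not the authors' implementation (see the linked repository for that); the function names, hyperparameters, and the final best-across-instances selection rule are assumptions for the sketch.

```python
import numpy as np

def cem(objective, mu, sigma, n_samples=100, k=10, iters=20):
    """Classical (centralized) CEM: sample from a single Gaussian,
    keep the top-k elites, and refit the distribution to them."""
    for _ in range(iters):
        samples = np.random.normal(mu, sigma, size=(n_samples, mu.shape[0]))
        scores = np.array([objective(s) for s in samples])
        elites = samples[np.argsort(scores)[-k:]]  # top-k by score
        mu = elites.mean(axis=0)
        sigma = elites.std(axis=0) + 1e-6          # avoid premature collapse
    return mu

def decent_cem(objective, mus, sigmas, **cem_kwargs):
    """DecentCEM-style sketch: an ensemble of CEM instances, each
    independently improving its own sampling distribution; the best
    solution across instances is returned (selection rule assumed)."""
    solutions = [cem(objective, mu, sigma, **cem_kwargs)
                 for mu, sigma in zip(mus, sigmas)]
    return max(solutions, key=objective)

# Usage sketch: three instances initialized in different regions are
# less likely to all converge to the same local optimum.
if __name__ == "__main__":
    f = lambda x: -np.sum((x - 2.0) ** 2) + np.sum(np.cos(5.0 * x))
    mus = [np.full(3, c) for c in (-2.0, 0.0, 2.0)]
    sigmas = [np.ones(3)] * 3
    print(decent_cem(f, mus, sigmas))
```

The key design point the sketch highlights is that each instance refits only its own Gaussian from its own elites, rather than pooling all samples into one centralized top-$k$ update.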