Many potential applications of reinforcement learning (RL) require guarantees that the agent will perform well in the face of disturbances to the dynamics or the reward function. In this paper, we prove theoretically that maximum entropy (MaxEnt) RL maximizes a lower bound on a robust RL objective, and thus can be used to learn policies that are robust to some disturbances in the dynamics and the reward function. While this capability of MaxEnt RL has been observed empirically in prior work, to the best of our knowledge, our work provides the first rigorous proof and theoretical characterization of the MaxEnt RL robust set. A number of prior robust RL algorithms have been designed to handle similar disturbances to the reward function or dynamics, but these methods typically require additional moving parts and hyperparameters on top of a base RL algorithm. In contrast, our results suggest that MaxEnt RL by itself is robust to certain disturbances, without requiring any additional modifications. This does not imply that MaxEnt RL is the best available robust RL method; rather, it is a simple method with appealing formal guarantees.
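To make the claim concrete, the following sketch states the standard MaxEnt RL objective and a generic robust RL objective; the notation here ($\alpha$, $J_{\text{MaxEnt}}$, $J_{\text{robust}}$, and the disturbance sets $\tilde{\mathcal{R}}$, $\tilde{\mathcal{P}}$) is illustrative and is not the paper's exact statement of its robust set. The MaxEnt RL objective augments the expected discounted return with a policy-entropy bonus, weighted by a temperature $\alpha$:
\[
J_{\text{MaxEnt}}(\pi) \;=\; \mathbb{E}_{\pi, p}\!\left[\sum_{t} \gamma^{t}\Big( r(s_t, a_t) \;+\; \alpha\,\mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big)\right].
\]
A robust RL objective, in its generic form, evaluates the policy under the worst-case disturbance drawn from admissible sets of rewards and dynamics:
\[
J_{\text{robust}}(\pi) \;=\; \min_{\tilde r \in \tilde{\mathcal{R}},\; \tilde p \in \tilde{\mathcal{P}}}\; \mathbb{E}_{\pi, \tilde p}\!\left[\sum_{t} \gamma^{t}\, \tilde r(s_t, a_t)\right].
\]
In this notation, the claim above is that, for a particular choice of robust set (characterized in the body of the paper), $J_{\text{MaxEnt}}(\pi)$ lower-bounds an objective of the form $J_{\text{robust}}(\pi)$, so maximizing the MaxEnt objective also maximizes a lower bound on the robust objective.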