Sample Complexity of Robust Reinforcement Learning with a Generative Model - 专知论文


December 3, 2021

Sample Complexity of Robust Reinforcement Learning with a Generative Model


Kishan Panaganti, Dileep Kalathil

From arXiv, 22 pages, 8 figures, under review

The Robust Markov Decision Process (RMDP) framework focuses on designing control policies that are robust against the parameter uncertainties due to the mismatches between the simulator model and real-world settings. An RMDP problem is typically formulated as a max-min problem, where the objective is to find the policy that maximizes the value function for the worst possible model that lies in an uncertainty set around a nominal model. The standard robust dynamic programming approach requires the knowledge of the nominal model for computing the optimal robust policy. In this work, we propose a model-based reinforcement learning (RL) algorithm for learning an $\epsilon$-optimal robust policy when the nominal model is unknown. We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence. For each of these uncertainty sets, we give a precise characterization of the sample complexity of our proposed algorithm. In addition to the sample complexity results, we also present a formal analytical argument on the benefit of using robust policies. Finally, we demonstrate the performance of our algorithm on two benchmark problems.
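To make the max-min formulation concrete, the following is a minimal sketch of robust value iteration on a small tabular problem, restricted to the total variation uncertainty set. It is an illustration under stated assumptions, not the authors' algorithm: the function names, the `sample_next_state` interface standing in for the generative model, and the toy simulator are all hypothetical. The nominal model is first estimated from samples, and each Bellman backup then takes the worst-case expectation over all distributions within a total variation radius of that estimate.

```python
import numpy as np


def worst_case_expectation_tv(p, v, radius):
    """Minimize q @ v over distributions q with 0.5 * ||q - p||_1 <= radius."""
    q = np.asarray(p, dtype=float).copy()
    lowest = int(np.argmin(v))
    # Mass the adversary may move onto the lowest-value state.
    budget = min(radius, 1.0 - q[lowest])
    q[lowest] += budget
    # Take the same amount of mass away from the highest-value states first.
    for s in np.argsort(v)[::-1]:
        if budget <= 0:
            break
        if s == lowest:
            continue
        moved = min(q[s], budget)
        q[s] -= moved
        budget -= moved
    return float(q @ v)


def estimate_nominal_model(sample_next_state, n_states, n_actions, n_samples, rng):
    """Empirical transition estimate from a (hypothetical) generative model."""
    counts = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                counts[s, a, sample_next_state(s, a, rng)] += 1
    return counts / n_samples


def robust_value_iteration(P_hat, R, gamma, radius, n_iters=200):
    """Iterate the robust Bellman backup; returns values and a greedy policy."""
    n_states, n_actions, _ = P_hat.shape
    v = np.zeros(n_states)
    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        for s in range(n_states):
            for a in range(n_actions):
                worst = worst_case_expectation_tv(P_hat[s, a], v, radius)
                q[s, a] = R[s, a] + gamma * worst
        v = q.max(axis=1)
    return v, q.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_P = rng.dirichlet(np.ones(3), size=(3, 2))  # toy simulator, assumed
    R = rng.uniform(size=(3, 2))

    def sample_next_state(s, a, rng):
        return rng.choice(3, p=true_P[s, a])

    P_hat = estimate_nominal_model(sample_next_state, 3, 2, 500, rng)
    values, policy = robust_value_iteration(P_hat, R, gamma=0.9, radius=0.1)
    print(values, policy)
```

Setting `radius=0.0` recovers ordinary value iteration on the estimated model, which gives a simple baseline for comparing against the robust values. The chi-square and KL uncertainty sets named in the abstract would need a different inner minimization and are not covered here.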

