使用强力 MDP 优化百分比标准 (Optimizing Percentile Criterion Using Robust MDPs) - 专知论文

会员服务 ·

0

优化器 · 稳健性 · Weight · 频率主义学派 · 准则 ·

2021 年 2 月 25 日

Optimizing Percentile Criterion Using Robust MDPs

翻译：使用强力 MDP 优化百分比标准

Bahram Behzadian,Reazul Hasan Russel,Marek Petrik,Chin Pang Ho

We address the problem of computing reliable policies in reinforcement learning problems with limited data. In particular, we compute policies that achieve good returns with high confidence when deployed. This objective, known as the \emph{percentile criterion}, can be optimized using Robust MDPs~(RMDPs). RMDPs generalize MDPs to allow for uncertain transition probabilities chosen adversarially from given ambiguity sets. We show that the RMDP solution's sub-optimality depends on the spans of the ambiguity sets along the value function. We then propose new algorithms that minimize the span of ambiguity sets defined by weighted $L_1$ and $L_\infty$ norms. Our primary focus is on Bayesian guarantees, but we also describe how our methods apply to frequentist guarantees and derive new concentration inequalities for weighted $L_1$ and $L_\infty$ norms. Experimental results indicate that our optimized ambiguity sets improve significantly on prior construction methods.

翻译：我们用有限的数据来解决计算可靠的政策,用有限的数据来强化学习问题。特别是, 我们计算出在部署时以高度自信实现良好回报的政策。这个称为 emph{ 百分度标准的目标, 可以用强力 MDPs~( RMDPs) 优化。 RMDPs 将 MDPs 普遍化, 以便从给定的模棱两可的模数组中选择不确定的过渡概率。我们显示 RMDP 解决方案的亚优度取决于值函数的模糊度。然后我们提出新的算法, 最大限度地减少由加权的 $_1 和 $ $ 和 $ infty 标准定义的模糊度。我们的主要重点是巴伊斯保证, 但我们也描述了我们的方法如何适用于常态的保证, 并产生加权的 $1 和 $ L ⁇ inty 规范的新的浓度不平等。我们的实验结果表明, 我们的优化模糊性将大大改进以前的建筑方法。

0

相关内容

优化器

【经典书】线性代数，436页pdf

专知会员服务

78+阅读 · 2021年3月16日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

专知会员服务

122+阅读 · 2020年5月30日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

250+阅读 · 2020年4月19日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

On the implied weights of linear regression for causal inference

Arxiv

0+阅读 · 2021年4月19日

Multigraded Sylvester forms, Duality and Elimination Matrices

Arxiv

0+阅读 · 2021年4月18日

Soft and subspace robust multivariate rank tests based on entropy regularized optimal transport

Arxiv

0+阅读 · 2021年4月17日

Solving Bang-Bang Problems Using The Immersed Interface Method and Integer Programming

Arxiv

0+阅读 · 2021年4月17日

Distributed Value Function Approximation for Collaborative Multi-Agent Reinforcement Learning

Arxiv

0+阅读 · 2021年4月17日

Adaptive Robust Model Predictive Control with Matched and Unmatched Uncertainty

Arxiv

0+阅读 · 2021年4月16日

Safe Exploration in Model-based Reinforcement Learning using Control Barrier Functions

Safe Exploration in Model-based Reinforcement Learning using Control Barrier Functions

Arxiv

0+阅读 · 2021年4月16日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

Accelerated Randomized Coordinate Descent Algorithms for Stochastic Optimization and Online Learning

Arxiv

9+阅读 · 2018年7月16日

Parameter Space Noise for Exploration

Arxiv

3+阅读 · 2018年1月31日

VIP会员

文章信息

相关主题

频率主义学派

相关VIP内容

【经典书】线性代数，436页pdf

专知会员服务

78+阅读 · 2021年3月16日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

专知会员服务

122+阅读 · 2020年5月30日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

250+阅读 · 2020年4月19日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《多智能体不确定环境追逃博弈研究》216页

美智库最新发布《解放军"人机编组协同作战"发展路径：理论与实践》53页

现代战争"杀伤区"理论：空间尺度与结构特征、控制手段与毁伤机制、生存策略与战线转移

《俄军无人机创新技术或已在乌克兰达成"战场空中封锁"作战效果》最新18页报告

相关资讯

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

On the implied weights of linear regression for causal inference

Arxiv

0+阅读 · 2021年4月19日

Multigraded Sylvester forms, Duality and Elimination Matrices

Arxiv

0+阅读 · 2021年4月18日

Soft and subspace robust multivariate rank tests based on entropy regularized optimal transport

Arxiv

0+阅读 · 2021年4月17日

Solving Bang-Bang Problems Using The Immersed Interface Method and Integer Programming

Arxiv

0+阅读 · 2021年4月17日

Distributed Value Function Approximation for Collaborative Multi-Agent Reinforcement Learning

Arxiv

0+阅读 · 2021年4月17日

Adaptive Robust Model Predictive Control with Matched and Unmatched Uncertainty

Arxiv

0+阅读 · 2021年4月16日

Safe Exploration in Model-based Reinforcement Learning using Control Barrier Functions

Safe Exploration in Model-based Reinforcement Learning using Control Barrier Functions

Arxiv

0+阅读 · 2021年4月16日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

Accelerated Randomized Coordinate Descent Algorithms for Stochastic Optimization and Online Learning

Arxiv

9+阅读 · 2018年7月16日

Parameter Space Noise for Exploration

Arxiv

3+阅读 · 2018年1月31日

微信扫码咨询专知VIP会员