Twice Regularized Markov Decision Processes: The Equivalence between Robustness and Regularization - Zhuanzhi (专知) Papers


Regularization term · Robustness · Markov · Regularization · Learning

March 12, 2023

Twice Regularized Markov Decision Processes: The Equivalence between Robustness and Regularization


Esther Derman, Yevgeniy Men, Matthieu Geist, Shie Mannor

From arXiv. Extended version of the NeurIPS paper: arXiv:2110.06267

Robust Markov decision processes (MDPs) aim to handle changing or partially known system dynamics. To solve them, one typically resorts to robust optimization methods. However, this significantly increases computational complexity and limits scalability in both learning and planning. On the other hand, regularized MDPs show more stability in policy learning without impairing time complexity. Yet, they generally do not encompass uncertainty in the model dynamics. In this work, we aim to learn robust MDPs using regularization. We first show that regularized MDPs are a particular instance of robust MDPs with uncertain reward. We thus establish that policy iteration on reward-robust MDPs can have the same time complexity as on regularized MDPs. We further extend this relationship to MDPs with uncertain transitions: this leads to a regularization term with an additional dependence on the value function. We then generalize regularized MDPs to twice regularized MDPs ($\text{R}^2$ MDPs), i.e., MDPs with $\textit{both}$ value and policy regularization. The corresponding Bellman operators enable us to derive planning and learning schemes with convergence and generalization guarantees, thus reducing robustness to regularization. We numerically show this two-fold advantage on tabular and physical domains, highlighting the fact that $\text{R}^2$ preserves its efficacy in continuous environments.
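The claim that regularized MDPs are a particular instance of reward-robust MDPs can be illustrated with a minimal sketch. It makes assumptions that are not in the abstract: the reward uncertainty set at each state is taken to be an L2 ball of radius `alpha` around a nominal reward `r0`, the transitions are known, and all function names are hypothetical. Under these assumptions, Cauchy-Schwarz gives $\min_{\|\delta\|_2 \le \alpha} \langle \pi_s, r_0(s,\cdot) + \delta \rangle = \langle \pi_s, r_0(s,\cdot) \rangle - \alpha \|\pi_s\|_2$, so worst-case policy evaluation reduces to ordinary policy evaluation with a norm penalty on the policy.

```python
import numpy as np


def reward_robust_policy_evaluation(P, r0, pi, alpha, gamma=0.9,
                                    tol=1e-10, max_iter=10_000):
    """Evaluate pi against worst-case rewards in an L2 ball around r0[s].

    Assumed shapes: P is (S, A, S) nominal transitions, r0 is (S, A)
    nominal rewards, pi is (S, A) with each row a distribution over actions.
    """
    # Nominal expected reward minus the penalty alpha * ||pi_s||_2.
    r_reg = np.sum(pi * r0, axis=1) - alpha * np.linalg.norm(pi, axis=1)
    # State-to-state transition matrix induced by the policy.
    P_pi = np.einsum("sa,sat->st", pi, P)
    v = np.zeros(P.shape[0])
    for _ in range(max_iter):
        v_new = r_reg + gamma * P_pi @ v  # regularized Bellman backup
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v


def worst_case_reward(r0, pi, alpha):
    """Closed-form minimiser of <pi_s, r_s> over the L2 ball (Cauchy-Schwarz)."""
    return r0 - alpha * pi / np.linalg.norm(pi, axis=1, keepdims=True)


# Small random MDP used only for illustration.
rng = np.random.default_rng(0)
S, A = 4, 3
P = rng.dirichlet(np.ones(S), size=(S, A))   # shape (S, A, S)
r0 = rng.uniform(size=(S, A))
pi = rng.dirichlet(np.ones(A), size=S)       # shape (S, A)
alpha, gamma = 0.1, 0.9

v_reg = reward_robust_policy_evaluation(P, r0, pi, alpha, gamma)

# Cross-check: exact policy evaluation on the explicit worst-case reward.
P_pi = np.einsum("sa,sat->st", pi, P)
r_worst = worst_case_reward(r0, pi, alpha)
v_worst = np.linalg.solve(np.eye(S) - gamma * P_pi,
                          np.sum(pi * r_worst, axis=1))
# Nominal (non-robust) value, for comparison.
v_nominal = np.linalg.solve(np.eye(S) - gamma * P_pi,
                            np.sum(pi * r0, axis=1))
```

In this sketch the regularized iteration and the explicit worst-case evaluation should agree up to the stopping tolerance, and both sit below the nominal value. The iteration never optimizes over the uncertainty set, so its per-step cost is the same as for standard policy evaluation. Uncertain transitions are not covered here; per the abstract, they add a regularization term that also depends on the value function.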

