高效解决矩形马可夫决定程序 (An Efficient Solution to s-Rectangular Robust Markov Decision Processes) - 专知论文

会员服务 ·

0

稳健性 · Markov · Processing（编程语言） · 优化器 · 值迭代 ·

2023 年 1 月 31 日

An Efficient Solution to s-Rectangular Robust Markov Decision Processes

翻译：高效解决矩形马可夫决定程序

Navdeep Kumar,Kfir Levy,Kaixin Wang,Shie Mannor

from arxiv, arXiv admin note: substantial text overlap with arXiv:2205.14327

We present an efficient robust value iteration for \texttt{s}-rectangular robust Markov Decision Processes (MDPs) with a time complexity comparable to standard (non-robust) MDPs which is significantly faster than any existing method. We do so by deriving the optimal robust Bellman operator in concrete forms using our $L_p$ water filling lemma. We unveil the exact form of the optimal policies, which turn out to be novel threshold policies with the probability of playing an action proportional to its advantage.

翻译：我们为\ texttt{s}-rectagon 稳健的Markov 决策程序(MDPs)展示了高效的稳健值迭代,其时间复杂性可与标准(非紫外线) MDPs相比,大大快于任何现有方法。我们这样做的方式是,利用我们的$L_p$p$的填水薄膜,以具体的形式从最佳的Bellman操作员中推导出最佳的Bellman操作员。我们展示了最佳政策的确切形式,这最终成为新的门槛政策,有可能采取与其优势相称的行动。

0

相关内容

稳健性

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

75+阅读 · 2022年6月28日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【脑机接口教程】Machine Learning for BCI，NeurotechEDU

【脑机接口教程】Machine Learning for BCI，NeurotechEDU

专知会员服务

35+阅读 · 2022年2月14日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

随机动态系统的风险分析及其最优控制问题

国家自然科学基金

1+阅读 · 2014年12月31日

强流质子束与固体靶相互作用的数值模拟研究

国家自然科学基金

0+阅读 · 2013年12月31日

Massive MIMO系统关键技术的研究

国家自然科学基金

0+阅读 · 2012年12月31日

函数域中的Vinogradov中值定理

国家自然科学基金

0+阅读 · 2012年12月31日

早期复极人群心源性猝死的机制与危险分层

国家自然科学基金

0+阅读 · 2012年12月31日

轴对称的Navier-Stokes方程

国家自然科学基金

1+阅读 · 2011年12月31日

基于Decorin基因甲基化调控的非小细胞肺癌转移的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

环境诱导家蚕滞育的CREB调控机制

国家自然科学基金

0+阅读 · 2011年12月31日

超音速非平衡等离子体消除挥发性有机物的基础研究

国家自然科学基金

0+阅读 · 2009年12月31日

超高速碰撞产生等离子体的电磁特性及物理机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

Strategy Synthesis in Markov Decision Processes Under Limited Sampling Access

Arxiv

0+阅读 · 2023年3月22日

Inexact iterative numerical linear algebra for neural network-based spectral estimation and rare-event prediction

Arxiv

0+阅读 · 2023年3月22日

Robust, strong form mechanics on an adaptive structured grid: efficiently solving variable-geometry near-singular problems with diffuse interfaces

Arxiv

0+阅读 · 2023年3月22日

Policy Mirror Descent Inherently Explores Action Space

Arxiv

0+阅读 · 2023年3月21日

Posterior Representations for Bayesian Context Trees: Sampling, Estimation and Convergence

Arxiv

0+阅读 · 2023年3月20日

High-Dimensional Dynamic Pricing under Non-Stationarity: Learning and Earning with Change-Point Detection

Arxiv

0+阅读 · 2023年3月20日

Dictionary-based model reduction for state estimation

Arxiv

0+阅读 · 2023年3月19日

An Efficient Randomized Fixed-Precision Algorithm for Tensor Singular Value Decomposition

Arxiv

0+阅读 · 2023年3月18日

Residual Physics Learning and System Identification for Sim-to-real Transfer of Policies on Buoyancy Assisted Legged Robots

Arxiv

0+阅读 · 2023年3月16日

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Arxiv

17+阅读 · 2019年9月9日

VIP会员

文章信息

相关主题

Processing（编程语言）

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

75+阅读 · 2022年6月28日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【脑机接口教程】Machine Learning for BCI，NeurotechEDU

【脑机接口教程】Machine Learning for BCI，NeurotechEDU

专知会员服务

35+阅读 · 2022年2月14日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《小型无人机系统侦测追踪技术：声学、计算机视觉与深度学习融合方案》最新98页

《"牧羊人网格"拦截策略：实现无人机集群可靠拦截的新范式》

光纤无人机：反无人机系统的重大挑战

《作战建模与仿真实证研究》

相关资讯

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

相关论文

Strategy Synthesis in Markov Decision Processes Under Limited Sampling Access

Arxiv

0+阅读 · 2023年3月22日

Inexact iterative numerical linear algebra for neural network-based spectral estimation and rare-event prediction

Arxiv

0+阅读 · 2023年3月22日

Robust, strong form mechanics on an adaptive structured grid: efficiently solving variable-geometry near-singular problems with diffuse interfaces

Arxiv

0+阅读 · 2023年3月22日

Policy Mirror Descent Inherently Explores Action Space

Arxiv

0+阅读 · 2023年3月21日

Posterior Representations for Bayesian Context Trees: Sampling, Estimation and Convergence

Arxiv

0+阅读 · 2023年3月20日

High-Dimensional Dynamic Pricing under Non-Stationarity: Learning and Earning with Change-Point Detection

Arxiv

0+阅读 · 2023年3月20日

Dictionary-based model reduction for state estimation

Arxiv

0+阅读 · 2023年3月19日

An Efficient Randomized Fixed-Precision Algorithm for Tensor Singular Value Decomposition

Arxiv

0+阅读 · 2023年3月18日

Residual Physics Learning and System Identification for Sim-to-real Transfer of Policies on Buoyancy Assisted Legged Robots

Arxiv

0+阅读 · 2023年3月16日

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification

Arxiv

17+阅读 · 2019年9月9日

相关基金

随机动态系统的风险分析及其最优控制问题

国家自然科学基金

1+阅读 · 2014年12月31日

强流质子束与固体靶相互作用的数值模拟研究

国家自然科学基金

0+阅读 · 2013年12月31日

Massive MIMO系统关键技术的研究

国家自然科学基金

0+阅读 · 2012年12月31日

函数域中的Vinogradov中值定理

国家自然科学基金

0+阅读 · 2012年12月31日

早期复极人群心源性猝死的机制与危险分层

国家自然科学基金

0+阅读 · 2012年12月31日

轴对称的Navier-Stokes方程

国家自然科学基金

1+阅读 · 2011年12月31日

基于Decorin基因甲基化调控的非小细胞肺癌转移的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

环境诱导家蚕滞育的CREB调控机制

国家自然科学基金

0+阅读 · 2011年12月31日

超音速非平衡等离子体消除挥发性有机物的基础研究

国家自然科学基金

0+阅读 · 2009年12月31日

超高速碰撞产生等离子体的电磁特性及物理机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员