A Fully Polynomial Time Approximation Scheme for Constrained MDPs and Stochastic Shortest Path under Local Transitions

Constraints · Stochastic Shortest Path · Polynomial Time · Stochastic Environments · Deterministic Policies

April 18, 2023


The fixed-horizon constrained Markov Decision Process (C-MDP) is a well-known model for planning in stochastic environments under operating constraints. The Chance-Constrained MDP (CC-MDP) is a variant that allows bounding the probability of constraint violation, which is desirable in many safety-critical applications. CC-MDP can also model a class of MDPs, called Stochastic Shortest Path (SSP) under dead-ends, in which there is a trade-off between the probability-to-goal and the cost-to-goal. This work studies the structure of (C)C-MDP, particularly an important variant that involves local transitions. In this variant, state reachability exhibits a certain degree of locality and independence from the remaining states. More precisely, the number of states, at a given time, that share some reachable future states is always constant. (C)C-MDP under local transitions is NP-hard even for a planning horizon of two. In this work, we propose a fully polynomial-time approximation scheme for (C)C-MDP that computes (near-)optimal deterministic policies. Such an algorithm is among the best approximation algorithms attainable in theory and gives insights into the approximability of constrained MDPs and their variants.
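To make the problem statement concrete, the following is a minimal sketch of a fixed-horizon C-MDP solved by exhaustive search over deterministic policies. It is not the approximation scheme proposed in the paper; it is a brute-force baseline under assumptions of our own: the constraint is an expected-resource budget (the C-MDP form, not the chance-constrained form), the objective is minimum expected cost, and all names (`evaluate`, `brute_force_cmdp`, `P`, `cost`, `usage`, `budget`) and the two-state instance are hypothetical.

```python
from itertools import product


def evaluate(policy, P, cost, usage, s0, horizon):
    """Return (expected total cost, expected total resource usage).

    policy[t][s] is the action taken in state s at step t.
    P[s][a] maps each next state to its transition probability.
    """
    dist = {s0: 1.0}  # state distribution at the current step
    total_cost = 0.0
    total_usage = 0.0
    for t in range(horizon):
        nxt = {}
        for s, p in dist.items():
            a = policy[t][s]
            total_cost += p * cost[s][a]
            total_usage += p * usage[s][a]
            for s2, q in P[s][a].items():
                nxt[s2] = nxt.get(s2, 0.0) + p * q
        dist = nxt
    return total_cost, total_usage


def brute_force_cmdp(P, cost, usage, s0, horizon, budget):
    """Enumerate every deterministic policy; keep the cheapest feasible one."""
    states = sorted(P)
    actions = sorted(P[states[0]])  # assumes the same actions in every state
    n = len(states)
    best = None
    # One action per (step, state) pair: |A| ** (horizon * |S|) candidates.
    for choice in product(actions, repeat=horizon * n):
        policy = [dict(zip(states, choice[t * n:(t + 1) * n]))
                  for t in range(horizon)]
        c, u = evaluate(policy, P, cost, usage, s0, horizon)
        if u <= budget + 1e-12 and (best is None or c < best[0]):
            best = (c, u, policy)
    return best


# Hypothetical instance: "fast" is cheaper but consumes one unit of resource.
P = {
    0: {"safe": {0: 1.0}, "fast": {0: 0.5, 1: 0.5}},
    1: {"safe": {1: 1.0}, "fast": {1: 1.0}},
}
cost = {s: {"safe": 2.0, "fast": 1.0} for s in P}
usage = {s: {"safe": 0.0, "fast": 1.0} for s in P}

best = brute_force_cmdp(P, cost, usage, s0=0, horizon=2, budget=1.0)
print(best)
```

In this instance the unconstrained optimum plays "fast" at both steps (expected cost 2, expected usage 2), while the budget of 1 forces an expected cost of 3. The search visits 16 policies here, and the count grows exponentially with the horizon and the number of states, which is the cost an approximation scheme is meant to avoid.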


