找到马尔科夫决策程序的安全政策区 (Finding Safe Zones of policies Markov Decision Processes) - 专知论文

会员服务 ·

0

近似 · Processing（编程语言） · 分解的 · 优化器 · 样本 ·

2022 年 2 月 23 日

Finding Safe Zones of policies Markov Decision Processes

翻译：找到马尔科夫决策程序的安全政策区

Lee Cohen,Yishay Mansour,Michal Moshkovitz

Given a policy, we define a SafeZone as a subset of states, such that most of the policy's trajectories are confined to this subset. The quality of the SafeZone is parameterized by the number of states and the escape probability, i.e., the probability that a random trajectory will leave the subset. SafeZones are especially interesting when they have a small number of states and low escape probability. We study the complexity of finding optimal SafeZones, and show that in general the problem is computationally hard. For this reason we concentrate on computing approximate SafeZones. Our main result is a bi-criteria approximation algorithm which gives a factor of almost $2$ approximation for both the escape probability and SafeZone size, using a polynomial size sample complexity. We conclude the paper with an empirical evaluation of our algorithm.

翻译：根据一项政策,我们把安全区定义为一个州子集,因此该政策的大部分轨道都局限于这一子集。安全区的质量以州数和逃逸概率为参数,即随机轨迹离开该子集的概率为参数。当安全区拥有少数州和低逃逸概率时,安全区特别有趣。我们研究了找到最佳安全区的复杂性,并表明问题一般是计算困难的。为此,我们集中计算接近安全区的近似值。我们的主要结果是一种双标准近似算法,它使用多元体积的样本复杂度,使逃逸概率和安全区大小的近似值几乎达到2美元。我们用对算法的经验评估来完成这份文件。

0

相关内容

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

54+阅读 · 2020年9月7日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

凸可分半定规划的数值算法

国家自然科学基金

0+阅读 · 2013年12月31日

基于描述逻辑及符号算法的事例相似性研究

国家自然科学基金

0+阅读 · 2012年12月31日

半参数回归分析的随机函数法及其高维情形

国家自然科学基金

2+阅读 · 2012年12月31日

基于贝叶斯推理的模糊逻辑强化学习模型研究

国家自然科学基金

18+阅读 · 2012年12月31日

OWL DL公理的统计关系学习方法

国家自然科学基金

0+阅读 · 2011年12月31日

Scalable Sampling for Nonsymmetric Determinantal Point Processes

Arxiv

0+阅读 · 2022年4月19日

Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models

Arxiv

1+阅读 · 2022年4月18日

Risk and optimal policies in bandit experiments

Risk and optimal policies in bandit experiments

Arxiv

0+阅读 · 2022年4月18日

M-Estimation based on quasi-processes from discrete samples of Levy processes

Arxiv

0+阅读 · 2022年4月18日

How are Software Repositories Mined? A Systematic Literature Review of Workflows, Methodologies, Reproducibility, and Tools

Arxiv

0+阅读 · 2022年4月17日

VIP会员

文章信息

相关主题

Processing（编程语言）

相关VIP内容

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

54+阅读 · 2020年9月7日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】面向真实世界音视联合语音识别的可扩展框架

《通过仿真与开源数据提升战略决策：机遇与局限》最新报告

【AAAI2026】善始则事半功倍：基于前缀优化的大语言模型推理强化学习

评估大语言模型在科学发现中的作用

相关资讯

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Scalable Sampling for Nonsymmetric Determinantal Point Processes

Arxiv

0+阅读 · 2022年4月19日

Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models

Arxiv

1+阅读 · 2022年4月18日

Risk and optimal policies in bandit experiments

Risk and optimal policies in bandit experiments

Arxiv

0+阅读 · 2022年4月18日

M-Estimation based on quasi-processes from discrete samples of Levy processes

Arxiv

0+阅读 · 2022年4月18日

How are Software Repositories Mined? A Systematic Literature Review of Workflows, Methodologies, Reproducibility, and Tools

Arxiv

0+阅读 · 2022年4月17日

相关基金

凸可分半定规划的数值算法

国家自然科学基金

0+阅读 · 2013年12月31日

基于描述逻辑及符号算法的事例相似性研究

国家自然科学基金

0+阅读 · 2012年12月31日

半参数回归分析的随机函数法及其高维情形

国家自然科学基金

2+阅读 · 2012年12月31日

基于贝叶斯推理的模糊逻辑强化学习模型研究

国家自然科学基金

18+阅读 · 2012年12月31日

OWL DL公理的统计关系学习方法

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员