In the maximum state entropy exploration framework, an agent interacts with a reward-free environment to learn a policy that maximizes the entropy of the expected state visitations it induces. Hazan et al. (2019) noted that the class of Markovian stochastic policies is sufficient for the maximum state entropy objective, and exploiting non-Markovianity is generally considered pointless in this setting. In this paper, we argue that non-Markovianity is instead paramount for maximum state entropy exploration in a finite-sample regime. In particular, we recast the objective to target the expected entropy of the state visitations induced in a single trial. We then show that the class of non-Markovian deterministic policies is sufficient for this objective, whereas Markovian policies suffer non-zero regret in general. However, we prove that the problem of finding an optimal non-Markovian policy is NP-hard. Despite this negative result, we discuss avenues to address the problem in a tractable way, and how non-Markovian exploration could benefit the sample efficiency of online reinforcement learning in future work.
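To make the shift in objective concrete, it can be sketched as follows, using illustrative notation introduced here only for exposition: $d^\pi$ denotes the expected state distribution induced by policy $\pi$ over a $T$-step episode, and $\hat{d}_T$ denotes the empirical distribution of states visited in a single $T$-step trajectory:

\[
\max_{\pi} \; H\!\left(d^\pi\right)
\quad \longrightarrow \quad
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\, H\!\left(\hat{d}_T\right) \right],
\]

i.e., the entropy of the expected visitations is replaced by the expected entropy of the visitations realized in a single trial.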