In this paper, we revisit endless online level generation through the recently proposed experience-driven procedural content generation via reinforcement learning (EDRL) framework. Inspired by the observation that EDRL tends to generate recurrent patterns, we formulate the notion of state space closure, which implies that any state that may appear in an infinite-horizon online generation process can already be found within a finite horizon. Through theoretical analysis, we find that although state space closure raises a concern about diversity, it allows an EDRL generator trained with a finite horizon to generalise to the infinite-horizon scenario without deterioration of content quality. Moreover, we empirically verify the quality and diversity of levels generated by EDRL on the widely used Super Mario Bros. benchmark. Experimental results reveal that the diversity of levels generated by EDRL is limited due to state space closure, whereas their quality does not deteriorate over a horizon longer than the one specified during training. Based on our outcomes and analysis, we conclude that future work on endless online level generation via reinforcement learning should address the issue of diversity while preserving state space closure and content quality.
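As an illustrative formalisation (our own notation for this sketch, not necessarily the paper's): assuming the online generation process is modelled as a Markov chain over a state space $\mathcal{S}$, and writing $\mathrm{Reach}_t \subseteq \mathcal{S}$ for the set of states that occur with positive probability at step $t$, state space closure can be phrased as the reachable set saturating after finitely many steps,
\[
\exists\, T < \infty \ \text{such that}\ \bigcup_{t=0}^{\infty} \mathrm{Reach}_t \;=\; \bigcup_{t=0}^{T} \mathrm{Reach}_t .
\]
In words, every state that can occur in the infinite-horizon process already occurs within the finite horizon $T$; this is why quality guarantees established over a finite training horizon can carry over to endless generation, while diversity is bounded by the finite reachable set.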