Injecting human knowledge is an effective way to accelerate reinforcement learning (RL). However, such methods remain underexplored. This paper presents our discovery that an abstract forward model (Thought-game, TG) combined with transfer learning is an effective approach. We take StarCraft II as the study environment. With the help of a designed TG, the agent can learn to achieve a 99\% win-rate on a 64$\times$64 map against the Level-7 built-in AI, using only 1.08 hours on a single commercial machine. We also show that the TG method is not as restrictive as previously thought: it works with roughly designed TGs and remains useful when the environment changes. Compared with previous model-based RL, we show that TG is more effective. We also present a TG hypothesis that characterizes the influence of the TG's fidelity level. For real games that have unequal state and action spaces, we propose a novel XfrNet whose usefulness is validated by achieving a 90\% win-rate against the cheating Level-10 AI. We argue that the TG method may shed light on further studies of efficient RL with human knowledge.