Visualizing optimization landscapes has led to many fundamental insights in numerical optimization and to novel improvements in optimization techniques. However, visualizations of the objective that reinforcement learning optimizes (the "reward surface") have only ever been generated for a small number of narrow contexts. This work presents reward surfaces and related visualizations of 27 of the most widely used reinforcement learning environments in Gym for the first time. We also explore reward surfaces in the policy gradient direction and show for the first time that many popular reinforcement learning environments have frequent "cliffs" (sudden large drops in expected return). We demonstrate that A2C often "dives off" these cliffs into low-reward regions of the parameter space while PPO avoids them, confirming a popular intuition for PPO's improved performance over previous methods. We additionally introduce a highly extensible library that allows researchers to easily generate these visualizations in the future. Our findings provide new intuition to explain the successes and failures of modern RL methods, and our visualizations concretely characterize several failure modes of reinforcement learning agents in novel ways.
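To make the reward-surface idea concrete, the following is a minimal illustrative sketch, not the authors' library or its API: it perturbs a policy's parameters along two random directions and estimates the mean episodic return at each grid point. It assumes the classic Gym step API (4-tuple returns, `env.reset()` returning only the observation) and uses a hypothetical linear policy on CartPole-v1 purely as a stand-in for a trained agent.

```python
import numpy as np
import gym


def episodic_return(env, theta, episodes=5):
    """Estimate the mean episodic return of a simple linear policy with parameters theta."""
    total = 0.0
    for _ in range(episodes):
        obs, done, ep_ret = env.reset(), False, 0.0
        while not done:
            # Hypothetical linear policy: threshold a dot product to pick a discrete action.
            action = int(np.dot(theta, obs) > 0)
            obs, reward, done, _ = env.step(action)  # classic 4-tuple Gym API assumed
            ep_ret += reward
        total += ep_ret
    return total / episodes


env = gym.make("CartPole-v1")
theta0 = np.zeros(env.observation_space.shape[0])  # placeholder for trained parameters
d1 = np.random.randn(*theta0.shape)                # random perturbation direction 1
d2 = np.random.randn(*theta0.shape)                # random perturbation direction 2

# Reward surface: expected return over a 2-D grid of parameter perturbations.
alphas = np.linspace(-1.0, 1.0, 11)
surface = np.array([[episodic_return(env, theta0 + a * d1 + b * d2)
                     for b in alphas] for a in alphas])
```

The resulting `surface` array can be plotted as a heatmap or 3-D surface; replacing one of the random directions with an estimated policy gradient direction gives the kind of "policy gradient direction" slice where the cliffs described above appear.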