Reinforcement learning (RL) has been successful at learning to play games in which the entire environment is visible. However, RL approaches struggle in complex games such as StarCraft II, and in real-world environments, where the entire environment is not visible. In these settings, agents must choose where to look and how to use their limited visual information effectively in order to succeed. We verify that, with a relatively simple model, an agent can learn where to look in scenarios with limited visual bandwidth. To study where the RL agent learns to look, we develop a method for masking part of the environment in Atari games, forcing the agent to learn both where to look and how to play. In addition, we develop a neural network architecture and method that allow the agent to choose both where to look and which action to take in Pong. Finally, we analyze the strategies the agent learns, to better understand how it plays the game under these constraints.
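To make the masking idea concrete, below is a minimal sketch of how part of an Atari observation could be hidden so that the agent only sees a small window it must steer. It assumes Gymnasium's ALE Atari environments; the wrapper name `GazeMaskWrapper`, the window size, and the `set_gaze` interface are illustrative assumptions, not the paper's actual method.

```python
import numpy as np
import gymnasium as gym
import ale_py  # Atari environments (assumed installed)

gym.register_envs(ale_py)  # required for gymnasium >= 1.0


class GazeMaskWrapper(gym.ObservationWrapper):
    """Zero out every pixel outside a square window around a 'gaze' point,
    limiting the agent's visual bandwidth. Illustrative sketch only."""

    def __init__(self, env, window=32):
        super().__init__(env)
        self.window = window  # side length of the visible window (assumed value)
        h, w = env.observation_space.shape[:2]
        self.gaze = (h // 2, w // 2)  # start looking at the screen center

    def set_gaze(self, row, col):
        """Hypothetical hook: the agent's 'look' action moves the window here."""
        self.gaze = (row, col)

    def observation(self, obs):
        masked = np.zeros_like(obs)
        r, c = self.gaze
        half = self.window // 2
        h, w = obs.shape[:2]
        r0, r1 = max(0, r - half), min(h, r + half)
        c0, c1 = max(0, c - half), min(w, c + half)
        masked[r0:r1, c0:c1] = obs[r0:r1, c0:c1]  # keep only the gazed region
        return masked


# Usage: wrap Pong so the agent sees only a 32x32 window it must learn to steer.
env = GazeMaskWrapper(gym.make("ALE/Pong-v5"), window=32)
```

In a setup like this, the agent's action space would be extended so that each step selects both a game action and a gaze location, which is one plausible way to realize the joint look-and-act architecture the abstract describes.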