Hungry Geese is an n-player variation of the popular game Snake. This paper examines state-of-the-art deep reinforcement learning value methods. The goal of the paper is to aggregate research on value-based methods and, as an exercise, apply it to other environments. A vanilla Deep Q-Network, a Double Q-Network, and a Dueling Q-Network were each implemented and tested in the Hungry Geese environment. The best-performing model was the vanilla Deep Q-Network, owing to its simple state representation and smaller network structure. Converging towards an optimal policy proved difficult due to the random initialization of the geese and the random generation of food. We therefore show that Deep Q-Networks may not be an appropriate model for such a stochastic environment, and we conclude by presenting possible improvements along with models better suited to the environment.
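To make the three value-method variants named above concrete, the sketch below shows how a Dueling Q-Network head and a Double DQN bootstrap target are typically implemented. This is a minimal illustration, not the paper's exact architecture: the layer sizes, the flat state encoding, and the helper names are assumptions for exposition.

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Illustrative dueling head; hidden size of 128 is an assumption."""

    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
        self.value = nn.Linear(128, 1)               # state value V(s)
        self.advantage = nn.Linear(128, n_actions)   # advantages A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.features(state)
        v = self.value(h)
        a = self.advantage(h)
        # Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
        return v + a - a.mean(dim=1, keepdim=True)

def double_dqn_target(online, target, reward, next_state, done, gamma=0.99):
    """Double DQN: the online net selects the next action, the target net evaluates it."""
    with torch.no_grad():
        best_action = online(next_state).argmax(dim=1, keepdim=True)
        next_q = target(next_state).gather(1, best_action).squeeze(1)
    # Terminal transitions (done == 1) contribute only the immediate reward.
    return reward + gamma * (1.0 - done) * next_q
```

A vanilla DQN differs only in using a single output head for Q(s, a) and letting the target network both select and evaluate the next action.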