Reinforcement learning is one of the most popular approaches to automated game playing. It allows an agent to estimate the expected utility of its state in order to choose optimal actions in an unknown environment. We apply reinforcement learning algorithms to the game Flappy Bird. We implement SARSA and Q-Learning with several modifications: an $\epsilon$-greedy policy, discretization, and backward updates. We find that SARSA and Q-Learning outperform the baseline, regularly achieving scores of 1400+, with a highest in-game score of 2069.
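To make the difference between the two algorithms concrete, the following is a minimal tabular sketch of an $\epsilon$-greedy policy together with the standard SARSA and Q-Learning update rules. It is an illustration under stated assumptions, not the implementation evaluated here: the two-action encoding, the function names, and the default values of `alpha` and `gamma` are all hypothetical, and a state is assumed to be any hashable value such as a tuple of discretized coordinates. Discretization and backward updates are not shown.

```python
import random
from collections import defaultdict

# Assumed action encoding for illustration: 0 = do nothing, 1 = flap.
ACTIONS = (0, 1)


def make_q_table():
    """Q-table mapping a hashable state to one value per action, initialised to 0."""
    return defaultdict(lambda: [0.0 for _ in ACTIONS])


def epsilon_greedy(q, state, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    values = q[state]
    # Ties are broken in favour of the first action in ACTIONS.
    return max(ACTIONS, key=lambda a: values[a])


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9, terminal=False):
    """On-policy update: bootstrap from the action actually chosen in s_next."""
    target = r if terminal else r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])


def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9, terminal=False):
    """Off-policy update: bootstrap from the best action available in s_next."""
    target = r if terminal else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
```

The only difference between the two updates is the bootstrap term: SARSA uses the value of the action the agent will actually take next, including exploratory ones, while Q-Learning uses the maximum over next actions regardless of what the policy then does.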