Reinforcement learning is one of the most popular approaches to automated game playing. It allows an agent to estimate the expected utility of its states and thereby choose optimal actions in an unknown environment. We apply reinforcement learning algorithms to the game Flappy Bird. We implement SARSA and Q-Learning with several modifications: an $\epsilon$-greedy policy, state discretization, and backward updates. We find that both SARSA and Q-Learning outperform the baseline, regularly achieving scores of 1400+, with a highest in-game score of 2069.
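To make the difference between the two algorithms concrete, the following is a minimal tabular sketch of an $\epsilon$-greedy policy, a simple state discretization, and the standard SARSA and Q-Learning update rules. It is an illustration under stated assumptions, not the implementation evaluated here: the action encoding, the `discretize` helper and its grid size, and the values of `alpha` and `gamma` are all hypothetical, and backward updates and terminal-state handling are not shown.

```python
import random
from collections import defaultdict

# Hypothetical action encoding: 0 = do nothing, 1 = flap.
ACTIONS = (0, 1)


def make_q_table():
    """Q-table mapping a discrete state to one value per action (all zeros initially)."""
    return defaultdict(lambda: [0.0 for _ in ACTIONS])


def discretize(dx, dy, grid=10):
    """Bucket continuous offsets into grid cells (assumed state features and grid size)."""
    return (int(dx // grid), int(dy // grid))


def epsilon_greedy(q, state, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    values = q[state]
    return max(ACTIONS, key=lambda a: values[a])


def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action value in the next state."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually chosen in the next state."""
    target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])
```

The only difference between the two updates is the bootstrap term: Q-Learning uses the maximum action value in the next state, while SARSA uses the value of the action that the $\epsilon$-greedy policy actually selects there.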