On Reinforcement Learning for the Game of 2048

2048 is a single-player stochastic puzzle game. This intriguing and addictive game has become popular worldwide and has attracted researchers to develop game-playing programs. Because it combines simple rules with complex gameplay, 2048 has become an interesting and challenging platform for evaluating the effectiveness of machine learning methods. This dissertation presents a comprehensive study of reinforcement learning and computer game algorithms for 2048. First, it proposes optimistic temporal difference learning, which significantly improves the quality of learning by using optimistic initialization to encourage exploration in 2048. Building on this approach, it develops a state-of-the-art 2048 program that achieves the highest performance among all learning-based programs: an average score of 625377 points and a 72% rate of reaching the 32768-tile. Second, it investigates several techniques related to 2048, including ensemble learning of n-tuple networks, Monte Carlo tree search, and deep reinforcement learning. These techniques are promising for further improving the current state-of-the-art program. Finally, it discusses pedagogical applications of 2048 by proposing course designs and summarizing teaching experience. The proposed courses use 2048-like games as material for beginners learning reinforcement learning and computer game algorithms. They have been successfully taught to graduate students and were well received according to student feedback.
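The abstract names optimistic temporal difference learning but does not spell it out. Below is a minimal Python sketch of the general idea, assuming a common setup: TD(0) learning on afterstates with an n-tuple network, where all weights start at a high value so unseen afterstates look attractive. It is not the dissertation's implementation. The tuple layout, the learning rate, the `v_init` value, and every name (`slide_row_left`, `move`, `add_random_tile`, `NTupleNetwork`, `play_episode`) are hypothetical choices for illustration.

```python
import random
from typing import List, Optional, Tuple

Board = List[int]  # 16 cells, row-major; each cell stores log2 of the tile (0 = empty)

def slide_row_left(row: List[int]) -> Tuple[List[int], int]:
    """Slide one row left, merging each equal pair once; return (new_row, reward)."""
    tiles = [t for t in row if t != 0]
    merged, reward, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] + 1)
            reward += 1 << (tiles[i] + 1)  # score gained equals the new tile's value
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (4 - len(merged)), reward

def move(board: Board, direction: int) -> Tuple[Board, int]:
    """direction: 0=left, 1=right, 2=up, 3=down. Returns (afterstate, reward)."""
    new, reward = [0] * 16, 0
    for k in range(4):
        idx = [4 * k + j for j in range(4)] if direction < 2 else [k + 4 * j for j in range(4)]
        if direction in (1, 3):
            idx.reverse()  # slide toward the right or bottom edge instead
        line, r = slide_row_left([board[i] for i in idx])
        for i, v in zip(idx, line):
            new[i] = v
        reward += r
    return new, reward

def add_random_tile(board: Board, rng: random.Random) -> Board:
    b = board[:]
    empty = [i for i, v in enumerate(b) if v == 0]
    b[rng.choice(empty)] = 1 if rng.random() < 0.9 else 2  # 2-tile w.p. 0.9, else 4-tile
    return b

class NTupleNetwork:
    # Hypothetical small 4-cell tuples; practical 2048 programs use larger tuples
    # and symmetric sampling, which this sketch omits.
    TUPLES = [(0, 1, 2, 3), (4, 5, 6, 7), (0, 1, 4, 5), (5, 6, 9, 10)]
    MAX_EXP = 16  # tile exponents 0..15 per cell (exponents above 15 are clamped)

    def __init__(self, v_init: float = 0.0):
        # Optimistic initialization: each weight starts at v_init / len(TUPLES),
        # so every unseen afterstate is initially valued at exactly v_init.
        w0 = v_init / len(self.TUPLES)
        self.luts = [[w0] * self.MAX_EXP ** len(t) for t in self.TUPLES]

    def _index(self, board: Board, t: Tuple[int, ...]) -> int:
        idx = 0
        for cell in t:
            idx = idx * self.MAX_EXP + min(board[cell], self.MAX_EXP - 1)
        return idx

    def value(self, board: Board) -> float:
        return sum(lut[self._index(board, t)] for lut, t in zip(self.luts, self.TUPLES))

    def update(self, board: Board, delta: float) -> None:
        # Split the correction evenly so that value(board) changes by exactly delta.
        share = delta / len(self.TUPLES)
        for lut, t in zip(self.luts, self.TUPLES):
            lut[self._index(board, t)] += share

def play_episode(net: NTupleNetwork, alpha: float = 0.1,
                 rng: Optional[random.Random] = None) -> int:
    """Play one greedy game and apply TD(0) afterstate updates; return the score."""
    rng = rng or random.Random()
    board = add_random_tile(add_random_tile([0] * 16, rng), rng)
    prev_after, total = None, 0
    while True:
        best = None
        for d in range(4):
            after, r = move(board, d)
            if after != board and (best is None or r + net.value(after) > best[0]):
                best = (r + net.value(after), after, r)
        if best is None:  # no legal move: terminal afterstate has target value 0
            if prev_after is not None:
                net.update(prev_after, alpha * (0.0 - net.value(prev_after)))
            return total
        _, after, r = best
        if prev_after is not None:  # V(s'_t) += alpha * (r_{t+1} + V(s'_{t+1}) - V(s'_t))
            net.update(prev_after, alpha * (r + net.value(after) - net.value(prev_after)))
        total += r
        prev_after = after
        board = add_random_tile(after, rng)
```

With a large `v_init`, the greedy policy rates every unvisited afterstate highly, which pulls it toward untried moves. TD updates then lower the visited estimates toward the observed returns. This is the exploration effect that optimistic initialization is meant to produce. Usage: `net = NTupleNetwork(v_init=1000.0)`, then call `play_episode(net)` repeatedly.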

翻译：2048 是一个单一玩家的随机拼图游戏。这个吸引人和上瘾的游戏在全世界受到欢迎,吸引了研究人员来开发游戏游戏程序。由于其简单和复杂, 2048年已经成为评估机器学习方法有效性的有趣和富有挑战性的平台。这个论文对2048年的强化学习和计算机游戏算法进行了全面研究。首先, 这个论文提出了乐观的时间差异学习, 通过使用乐观的初始化来鼓励2048年的探索, 大大提高学习质量。此外, 根据这个方法, 开发了一个2048年的最新研究生游戏程序, 在所有以学习为基础的程序中取得最高业绩, 即平均得分为6253777分, 达到32768分的比率为72%。其次, 这个论文调查了与2048年有关的几种技术, 包括N- Temble 网络的学习, Monte Carlo 树搜索, 和深加固学习。这些技术很有希望进一步改进当前应用状态的研究生课程程序的业绩。最后, 这个计算机游戏将开始学习学习20年的学习课程设计, 以学习20年学习的教学方法为学习课程。通过20年学习课程和深层次。学习的教学设计, 通过建议, 学习的教学设计开始。学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习20 学习学习学习学习学习20 学习学习学习学习20 学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习学习 20 20 20 20 20 20 20 20 20 20 20 20 学习学习学习 20 20 20 20 20 20 20 20 20 20 20 20 20 20