From the very dawn of the field, search with value functions has been a fundamental concept in computer games research. Turing's chess algorithm from 1950 was able to think two moves ahead, and Shannon's work on chess from 1950 includes an extensive section on evaluation functions to be used within a search. Samuel's checkers program from 1959 already combines search with value functions that are learned through self-play and bootstrapping. TD-Gammon improves upon those ideas, using neural networks to learn complex value functions -- only to be again used within search. The combination of decision-time search and value functions has been present in the remarkable milestones where computers bested their human counterparts in long-standing challenging games -- Deep Blue for chess and AlphaGo for Go. Until recently, this powerful framework of search aided by (learned) value functions has been limited to perfect information games. As many interesting problems do not provide the agent with perfect information about the environment, this was an unfortunate limitation. This thesis introduces the reader to sound search for imperfect information games.