AlphaGo, AlphaGo Zero, and all of their derivatives can play with superhuman strength because they are able to predict the win-lose outcome with great accuracy. However, Go as a game is decided by a final score difference, and in final positions AlphaGo plays suboptimal moves: this is not surprising, since AlphaGo is completely unaware of the final score difference, all winning final positions being equivalent from the winrate perspective. This can be an issue, for instance when trying to learn the "best" move or to play with an initial handicap. Moreover, there is the theoretical quest for the "perfect game", that is, the minimax solution. Thus, a natural question arises: is it possible to train a successful Reinforcement Learning agent to predict score differences instead of winrates? No empirical or theoretical evidence can be found in the literature to support the folklore statement that "this does not work". In this paper we present Leela Zero Score, a software designed to support or disprove the "does not work" statement. Leela Zero Score is built on the open-source solution known as Leela Zero, and is trained on a 9x9 board to predict score differences instead of winrates. We find that the training produces a rational player, and we analyze its style against a strong amateur human player, finding that it is prone to some mistakes when the outcome is close. We compare its strength against SAI, an AlphaGo Zero-like software working on the 9x9 board, and find that the training of Leela Zero Score has reached a premature convergence to a player weaker than SAI.