The Elo rating system has been used worldwide for individual and team sports, as exemplified by the European Go Federation (EGF), the International Chess Federation (FIDE), the International Federation of Association Football (FIFA), and many others. To evaluate the performance of artificial intelligence agents, it is natural to rate them on the same Elo scale as humans, as with the rating of 5185 attributed to AlphaGo Zero. There are several fundamental differences between humans and AI that suggest modifications to the system, which in turn require revisiting Elo's fundamental rationale. AI is typically trained on many more games than humans play, and we have little a priori information about newly created AI agents. Further, AI is being extended into games that are asymmetric between the players, and that may even have large, complex boards with a different setup in every game, such as commercial paper strategy games. We present a revised rating system, and guidelines for tournaments, to reflect these differences.