The Elo rating system is widely adopted to evaluate the skills of (chess) game and sports players. Recently it has been also integrated into machine learning algorithms in evaluating the performance of computerised AI agents. However, an accurate estimation of the Elo rating (for the top players) often requires many rounds of competitions, which can be expensive to carry out. In this paper, to improve the sample efficiency of the Elo evaluation (for top players), we propose an efficient online match scheduling algorithm. Specifically, we identify and match the top players through a dueling bandits framework and tailor the bandit algorithm to the gradient-based update of Elo. We show that it reduces the per-step memory and time complexity to constant, compared to the traditional likelihood maximization approaches requiring $O(t)$ time. Our algorithm has a regret guarantee of $\tilde{O}(\sqrt{T})$, sublinear in the number of competition rounds and has been extended to the multidimensional Elo ratings for handling intransitive games. We empirically demonstrate that our method achieves superior convergence speed and time efficiency on a variety of gaming tasks.
翻译:Elo评级系统被广泛采用,用于评估(ches)游戏和体育运动员的技能。最近,它也被纳入机器学习算法中,用于评估计算机化的AI代理器的性能。然而,准确估计Elo评级(对于顶级球员)往往需要多轮竞赛,进行这些竞赛的费用可能很高。在本文中,为了提高Elo评估(对于顶级球员)的抽样效率,我们建议一种高效的在线匹配排期算法。具体地说,我们通过一个决斗框架来识别和匹配顶级球员,并将土匪算法与基于梯度的Elo更新相匹配。我们表明,与传统的需要美元(t)时间的可能性最大化方法相比,它减少了每步的记忆和时间复杂性。我们的算法有一个令人遗憾的保证是$tilde{O}(sqrt{T}}$,这是竞争回合次数的子线性保证,并且已经扩大到用于多维级的 Elo等级等级,用于处理不透明的游戏。我们的经验证明,我们的方法在各种游戏上实现了更高的趋同速度和时间效率。