Progress in machine learning and adversarial planning has benefited significantly from benchmark domains, from checkers and the classic UCI data sets to Go and Diplomacy. In sequential decision-making, agent evaluation has largely been restricted to a few interactions against experts, with the aim of reaching some desired level of performance (e.g. beating a human professional player). We propose a benchmark for multiagent learning based on repeated play of the simple game Rock, Paper, Scissors, along with a population of forty-three tournament entries, some of which are intentionally sub-optimal. We describe metrics to measure the quality of agents based on both average returns and exploitability. We then show that several RL, online learning, and language model approaches can learn good counter-strategies and generalize well, but ultimately lose to the top-performing bots, creating an opportunity for research in multiagent learning.
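As a rough illustration of the two evaluation metrics mentioned above, the following minimal sketch (not the paper's implementation) computes an agent's average return against a small population of Rock, Paper, Scissors bots and a simple exploitability estimate against the agent's empirical action mixture. The bot `random_bot`, the function names, and the stateless treatment of the agent in `exploitability` are illustrative assumptions.

```python
# Minimal sketch of population-based evaluation in repeated Rock, Paper, Scissors.
# Assumptions: bots are functions mapping the opponent's action history to an action;
# exploitability here is approximated against a stateless empirical mixture.
import random
from collections import Counter

ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(a, b):
    """Return +1 if action a beats b, -1 if it loses, 0 on a tie."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1


def random_bot(opponent_history):
    """Illustrative population member: plays uniformly at random."""
    return random.choice(ACTIONS)


def average_return(agent, population, rounds=1000):
    """Mean per-round payoff of `agent` over repeated play against each bot."""
    total, plays = 0.0, 0
    for bot in population:
        agent_hist, bot_hist = [], []
        for _ in range(rounds):
            a = agent(bot_hist)   # agent observes the bot's past actions
            b = bot(agent_hist)   # bot observes the agent's past actions
            total += payoff(a, b)
            agent_hist.append(a)
            bot_hist.append(b)
            plays += 1
    return total / plays


def exploitability(agent, rounds=1000):
    """Payoff a best responder earns against the agent's empirical mixture."""
    counts = Counter(agent([]) for _ in range(rounds))
    mixture = {a: counts[a] / rounds for a in ACTIONS}
    # Best response: the single action with the highest expected payoff.
    return max(
        sum(payoff(a, b) * p for b, p in mixture.items()) for a in ACTIONS
    )


if __name__ == "__main__":
    print("avg return:", average_return(random_bot, [random_bot]))
    print("exploitability:", exploitability(random_bot))
```

Against a uniformly random agent both quantities are near zero; an agent that favors one action keeps roughly the same average return against random bots but shows a strictly positive exploitability, which is why the two metrics are reported separately.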