The largest experiments in machine learning now require resources far beyond the budget of all but a few institutions. Fortunately, it has recently been shown that the results of these huge experiments can often be extrapolated from a sequence of far smaller, cheaper experiments. In this work, we show that this extrapolation can be based not only on the size of the model but also on the size of the problem. Through a sequence of experiments with AlphaZero and Hex, we show that the performance achievable with a fixed amount of compute degrades predictably as the game grows larger and harder. Alongside this main result, we further show that increasing the test-time compute available to an agent can substitute for reduced train-time compute, and vice versa.