We find that improvements in speedrunning world records follow a power-law pattern. Using this observation, we answer an outstanding question from previous work: how do we improve on the baseline of predicting no improvement when forecasting speedrunning world records out to some time horizon, such as one month? Using a random effects model, we improve on this baseline at a $p < 10^{-5}$ significance level, taking the relative mean squared error of out-of-sample world-record forecasts as the comparison metric. The same set-up \textit{even} improves on the ex-post best exponential moving average forecasts at a $p = 0.15$ significance level, despite having access to substantially fewer data points. We demonstrate the effectiveness of this approach by applying it to Machine Learning benchmarks, where it likewise produces forecasts that beat the no-improvement baseline. Finally, we interpret the resulting model to suggest that 1) ML benchmarks are far from saturation and 2) sudden large improvements in Machine Learning are unlikely but cannot be ruled out.
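To make the power-law claim concrete, the sketch below fits a power-law trend to a series of record values by ordinary least squares in log-log space. This is an illustrative toy on synthetic data, not the paper's random effects model; the function name and the simulated series are assumptions introduced for demonstration only.

```python
# Hedged sketch (NOT the paper's model): fit a power law
# record(t) ~= c * t**(-alpha) by linear regression on log-log data.
import numpy as np

def fit_power_law(t, records):
    """Return (alpha, c) such that records ~= c * t**(-alpha)."""
    slope, intercept = np.polyfit(np.log(t), np.log(records), 1)
    return -slope, np.exp(intercept)

# Synthetic record series with a known exponent alpha = 0.5:
# a 120-unit initial record that decays as attempts accumulate.
t = np.arange(1, 101, dtype=float)
records = 120.0 * t ** -0.5

alpha, c = fit_power_law(t, records)
```

On exact power-law data the regression recovers the generating exponent; real speedrunning series are noisy and stepwise, which is what motivates the random effects treatment described in the abstract.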