For a given distribution, learning algorithm, and performance metric, the rate of convergence (or data-scaling law) is the asymptotic behavior of the algorithm's test performance as a function of the number of training samples. Many learning methods in both theory and practice have power-law rates, i.e. performance scales as $n^{-\alpha}$ for some $\alpha > 0$. Moreover, both theoreticians and practitioners are concerned with improving the rates of their learning algorithms under settings of interest. We observe the existence of a "universal learner", which achieves the best possible distribution-dependent asymptotic rate among all learning algorithms within a specified runtime (e.g. $O(n^2)$), while incurring only polylogarithmic slowdown over this runtime. This algorithm is uniform: it does not depend on the distribution, yet it achieves the best possible rate for every distribution. The construction itself is a simple extension of Levin's universal search (Levin, 1973). Much like universal search, the universal learner is not at all practical, and is primarily of theoretical and philosophical interest.
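To make the Levin-search flavor of the construction concrete, here is a minimal illustrative sketch, not the paper's actual construction: instead of enumerating all programs, it iterates over a hypothetical stream of candidate learning algorithms, gives each an exponentially decaying time slot per phase (as in Levin's schedule), and keeps whichever fitted predictor does best on a held-out split. The `learner(train, time_limit=...)` interface, the budget schedule, and all names below are assumptions for illustration only.

```python
import time

def universal_learner(candidate_learners, train, holdout, loss, max_phase=10):
    """Levin-style search sketch: in phase t, candidate i gets roughly
    2^t / 2^(i+1) seconds; the best holdout-loss predictor found so far wins.
    Purely illustrative; a real universal learner enumerates all programs."""
    best_pred, best_loss = None, float("inf")
    for phase in range(max_phase):
        budget = 2.0 ** phase                      # total time for this phase
        for i, learner in enumerate(candidate_learners):
            slot = budget / 2.0 ** (i + 1)         # exponentially decaying slots
            if slot < 1e-3:                        # phase budget exhausted
                break
            start = time.monotonic()
            try:
                predictor = learner(train, time_limit=slot)  # hypothetical interface
            except Exception:
                continue                           # a candidate may fail; skip it
            if time.monotonic() - start > slot:
                continue                           # exceeded its slot; discard
            avg_loss = sum(loss(predictor(x), y) for x, y in holdout) / len(holdout)
            if avg_loss < best_loss:
                best_pred, best_loss = predictor, avg_loss
    return best_pred
```

The exponentially decaying slots are what keep the total overhead at only a polylogarithmic (in this toy version, constant-factor-per-phase) slowdown relative to the time budget of the best candidate, mirroring the overhead claim in the abstract.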