Classical machine learning frameworks assume access to a possibly large dataset in order to train a predictive model. In many practical applications, however, data does not arrive all at once but in batches over time. This creates a natural trade-off between the accuracy of a model and the time to obtain such a model. A greedy predictor could produce non-trivial predictions by immediately training on batches as soon as they become available, but it may make sub-optimal use of future data. On the other hand, a tardy predictor could wait a long time to aggregate several batches into a larger dataset, but ultimately deliver much better performance. In this work, we consider such a streaming learning setting, which we dub {\em anytime learning at macroscale} (ALMA). It is an instance of anytime learning applied not at the level of a single chunk of data, but at the level of the entire sequence of large batches. We first formalize this learning setting, then introduce metrics to assess how well learners perform on the given task for a given memory and compute budget, and finally test several baseline approaches on standard benchmarks repurposed for anytime learning at macroscale. The general finding is that bigger models always generalize better. In particular, it is important to grow model capacity over time if the initial model is relatively small. Moreover, updating the model at an intermediate rate strikes the best trade-off between accuracy and time to obtain a useful predictor.
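The greedy-versus-tardy trade-off above can be sketched with a toy streaming protocol. This is a minimal illustration, not the paper's actual experimental setup: the learners are hypothetical mean estimators, and the batch stream is synthetic Gaussian data.

```python
import random

def stream_batches(n_batches=5, batch_size=20, seed=0):
    """Simulate data arriving in large batches over time (the ALMA setting)."""
    rng = random.Random(seed)
    for _ in range(n_batches):
        yield [rng.gauss(3.0, 1.0) for _ in range(batch_size)]

class GreedyLearner:
    """Updates its estimate as soon as each batch arrives,
    so a (possibly crude) prediction is available at every step."""
    def __init__(self):
        self.total, self.count = 0.0, 0

    def update(self, batch):
        self.total += sum(batch)
        self.count += len(batch)

    def predict(self):
        return self.total / self.count if self.count else None

class TardyLearner:
    """Waits to aggregate all batches into one dataset before fitting,
    so no prediction is available until the stream ends."""
    def __init__(self):
        self.buffer = []

    def update(self, batch):
        self.buffer.extend(batch)

    def fit(self):
        return sum(self.buffer) / len(self.buffer)

greedy, tardy = GreedyLearner(), TardyLearner()
greedy_estimates = []
for batch in stream_batches():
    greedy.update(batch)
    tardy.update(batch)
    greedy_estimates.append(greedy.predict())  # usable after every batch

final_tardy = tardy.fit()  # usable only after the whole stream
```

For this simple mean estimator both learners end up with the same final model; the difference the ALMA metrics capture is that the greedy learner provided usable (if noisier) predictions throughout the stream, while the tardy learner provided none until the end.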