Widely observed neural scaling laws, in which error falls off as a power of training set size, model size, or both, have driven substantial performance improvements in deep learning. However, achieving these improvements through scaling alone incurs considerable costs in compute and energy. Here we focus on the scaling of error with dataset size and show how, in theory, we can break beyond power law scaling and potentially even reduce it to exponential scaling if we have access to a high-quality data pruning metric that ranks the order in which training examples should be discarded to achieve any pruned dataset size. We then test this improved scaling prediction with pruned dataset size empirically, and indeed observe better-than-power-law scaling in practice on ResNets trained on CIFAR-10, SVHN, and ImageNet. Next, given the importance of finding high-quality pruning metrics, we perform the first large-scale benchmarking study of ten different data pruning metrics on ImageNet. We find that most existing high-performing metrics scale poorly to ImageNet, while the best are computationally intensive and require labels for every image. We therefore develop a new simple, cheap, and scalable self-supervised pruning metric that demonstrates comparable performance to the best supervised metrics. Overall, our work suggests that the discovery of good data-pruning metrics may provide a viable path forward to substantially improved neural scaling laws, thereby reducing the resource costs of modern deep learning.
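To make the idea of a data pruning metric concrete, the following is a minimal sketch of score-based pruning: each example receives a per-example score, examples are ranked by that score, and only a chosen fraction is kept. The particular score used here (distance to the nearest k-means centroid of some embedding of each example), the choice of embedding, the number of clusters, and the kept fraction are all illustrative assumptions, not the exact recipe described in this work.

```python
# Hypothetical sketch of score-based data pruning: rank examples by a
# per-example "difficulty" score and keep only a chosen fraction.
import numpy as np
from sklearn.cluster import KMeans


def prune_by_score(embeddings: np.ndarray, keep_frac: float, n_clusters: int = 10) -> np.ndarray:
    """Return indices of the examples retained after pruning.

    embeddings: (n_examples, dim) array, e.g. from a self-supervised encoder.
    keep_frac:  fraction of the dataset to keep (the pruned dataset size).
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(embeddings)
    # Distance of each example to its nearest centroid; far-from-centroid
    # examples are treated here as "harder" / more informative (an assumption).
    dists = km.transform(embeddings).min(axis=1)
    order = np.argsort(-dists)  # hardest first
    n_keep = int(round(keep_frac * len(embeddings)))
    return order[:n_keep]


if __name__ == "__main__":
    # Toy usage: prune a random 1000-example "dataset" down to 70%.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 32))
    kept = prune_by_score(X, keep_frac=0.7)
    print(f"kept {len(kept)} of {len(X)} examples")
```

Any ranking rule can be slotted into this template; the point of the abstract's argument is that the quality of that ranking, not the pruning mechanics, determines how far one can beat power law scaling.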