The empirical science of neural scaling laws is a rapidly growing area of significant importance to the future of machine learning, particularly in light of recent breakthroughs achieved by large-scale pre-trained models such as GPT-3, CLIP, and DALL-E. Accurately predicting neural network performance as resources such as data, compute, and model size increase provides a more comprehensive evaluation of different approaches across multiple scales than traditional point-wise comparisons of fixed-size models on fixed-size benchmarks and, most importantly, allows the focus to be placed on the best-scaling, and thus most promising, approaches. In this work, we consider the challenging problem of few-shot learning in image classification, especially when the target data distribution in the few-shot phase differs from the source (training) data distribution in the sense that it includes new image classes not encountered during training. Our main goal is to investigate how the amount of pre-training data affects the few-shot generalization performance of standard image classifiers. Our key observations are that (1) such performance improvements are well approximated by power laws (linear log-log plots) as the training set size increases, (2) this applies whether the target data come from the same domain as the training data or from a different one (i.e., new classes), and (3) few-shot performance on new classes converges at a faster rate than standard classification performance on previously seen classes. Our findings shed new light on the relationship between scale and generalization.
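The power-law observation in (1) can be made concrete with a log-log linear fit. The following is a minimal sketch of that procedure; the error values and training-set sizes below are made-up placeholders for illustration, not measurements from this work.

```python
import numpy as np

# Hypothetical example: fit a power law err(n) ~ a * n^(-b) to few-shot
# error rates measured at increasing pre-training set sizes.
# These numbers are illustrative placeholders, not reported results.
train_sizes = np.array([1e4, 3e4, 1e5, 3e5, 1e6])       # pre-training examples
fewshot_err = np.array([0.52, 0.41, 0.30, 0.23, 0.17])  # 1 - accuracy

# A power law is linear in log-log space: log(err) = log(a) - b * log(n),
# so an ordinary least-squares line fit recovers the exponent b and prefactor a.
slope, intercept = np.polyfit(np.log(train_sizes), np.log(fewshot_err), deg=1)
a, b = np.exp(intercept), -slope
print(f"fitted power law: err(n) = {a:.2f} * n^(-{b:.3f})")

# Extrapolate to a larger pre-training budget (10x the largest measured size);
# such extrapolation is what makes scaling laws useful for comparing approaches.
n_new = 1e7
print(f"predicted error at n={n_new:.0e}: {a * n_new ** (-b):.3f}")
```

In practice, fits of this kind are often extended with an irreducible-error offset, err(n) = a * n^(-b) + c, estimated by nonlinear least squares; the plain log-log line above is the simplest variant consistent with "linear log-log plots."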