We present a smoothly broken power law functional form that accurately models and extrapolates the scaling behaviors of deep neural networks, i.e., how the evaluation metric of interest varies as the amount of compute used for training, the number of model parameters, the training dataset size, or upstream performance varies. It does so across various architectures and across each task within a large and diverse set of upstream and downstream tasks, in zero-shot, prompted, and fine-tuned settings. This set includes large-scale vision, language, audio, video, diffusion generative modeling, multimodal learning, contrastive learning, AI alignment, robotics, arithmetic, unsupervised/self-supervised learning, and reinforcement learning (single-agent and multi-agent). Compared to other functional forms for neural scaling behavior, this functional form yields considerably more accurate extrapolations of scaling behavior on this set. Moreover, it accurately models and extrapolates scaling behavior that other functional forms are incapable of expressing, such as the non-monotonic transitions present in the scaling behavior of phenomena like double descent, and the delayed, sharp inflection points present in the scaling behavior of tasks like arithmetic. Lastly, we use this functional form to glean insights about the limit of the predictability of scaling behavior. Code is available at https://github.com/ethancaballero/broken_neural_scaling_laws
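The abstract above can be illustrated with a short sketch of what a smoothly broken power law looks like in code. This is a minimal, hedged implementation assuming the commonly cited parameterization with n "breaks" (the exact form and parameter names are given in the paper; here `a` is an asymptotic offset, `b` and `c0` set the initial power-law segment, and each break `(c_i, d_i, f_i)` changes the slope by `c_i` around location `d_i` with transition sharpness `f_i`):

```python
import numpy as np

def bnsl(x, a, b, c0, breaks):
    """Smoothly broken power law sketch:
        y = a + b * x^(-c0) * prod_i (1 + (x / d_i)^(1/f_i))^(-c_i * f_i)

    x      : array of scale values (e.g. compute, params, dataset size)
    a      : limiting value of the metric as x -> infinity (if slopes sum > 0)
    b, c0  : amplitude and exponent of the initial power-law segment
    breaks : list of (c_i, d_i, f_i) tuples, one per break
             (parameter names here are illustrative, not the paper's notation)
    """
    x = np.asarray(x, dtype=float)
    prod = np.ones_like(x)
    for c_i, d_i, f_i in breaks:
        # Each factor smoothly transitions the effective exponent by c_i
        # around x = d_i; smaller f_i gives a sharper transition.
        prod *= (1.0 + (x / d_i) ** (1.0 / f_i)) ** (-c_i * f_i)
    return a + b * np.power(x, -c0) * prod

# With no breaks this reduces to an ordinary power law a + b * x^(-c0);
# for x far above a break location d_i, the effective exponent becomes
# approximately -(c0 + c_i), which is what lets the form fit curves whose
# log-log slope changes with scale.
```

One would typically fit these parameters to observed (scale, metric) pairs with a nonlinear least-squares routine (e.g. `scipy.optimize.curve_fit`) in log space, then evaluate the fitted curve at larger x to extrapolate.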