Machine Learning Model Sizes and the Parameter Gap

We study trends in model size of notable machine learning systems over time using a curated dataset. From 1950 to 2018, model size in language models increased steadily by seven orders of magnitude. The trend then accelerated, with model size increasing by another five orders of magnitude in just 4 years from 2018 to 2022. Vision models grew at a more constant pace, totaling 7 orders of magnitude of growth between 1950 and 2022. We also identify that, since 2020, there have been many language models below 20B parameters, many models above 70B parameters, but a scarcity of models in the 20-70B parameter range. We refer to that scarcity as the parameter gap. We provide some stylized facts about the parameter gap and propose a few hypotheses to explain it. The explanations we favor are: (a) increasing model size beyond 20B parameters requires adopting different parallelism techniques, which makes mid-sized models less cost-effective, (b) GPT-3 was one order of magnitude larger than previous language models, and researchers afterwards primarily experimented with bigger models to outperform it. While these dynamics likely exist, and we believe they play some role in generating the gap, we don't have high confidence that there are no other, more important dynamics at play.
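As a rough illustration of the acceleration described above, the orders-of-magnitude figures can be converted into average doubling times. This is a back-of-the-envelope sketch that assumes smooth exponential growth between the endpoint years quoted in the abstract; it is not the fitted trend from the underlying dataset, and the helper name `doubling_time_years` is hypothetical.

```python
import math


def doubling_time_years(orders_of_magnitude: float, years: float) -> float:
    """Average doubling time implied by growth of 10**orders_of_magnitude over `years`.

    Assumes constant exponential growth between the two endpoints
    (an illustrative simplification, not the paper's regression).
    """
    rate_oom_per_year = orders_of_magnitude / years
    return math.log10(2) / rate_oom_per_year


# Figures taken from the abstract; endpoint years are used as-is.
language_pre_2018 = doubling_time_years(7, 2018 - 1950)   # 7 OOM over 68 years
language_post_2018 = doubling_time_years(5, 2022 - 2018)  # 5 OOM over 4 years
vision_overall = doubling_time_years(7, 2022 - 1950)      # 7 OOM over 72 years

print(f"Language models, 1950-2018: doubling every {language_pre_2018:.2f} years")
print(f"Language models, 2018-2022: doubling every {language_post_2018 * 12:.1f} months")
print(f"Vision models, 1950-2022:   doubling every {vision_overall:.2f} years")
```

Under these assumptions, language model size doubled roughly every three years before 2018 and roughly every three months afterwards, which is the acceleration the abstract refers to.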

