Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.
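Since the abstract highlights that the released models come with code for experimentation, here is a minimal sketch of loading the smallest OPT checkpoint for text generation. It assumes the Hugging Face `transformers` hosting of the released weights under the model id "facebook/opt-125m"; the paper's own release is built on the metaseq codebase, so treat this as an illustrative alternative rather than the authors' tooling.

```python
# Minimal sketch: load an OPT checkpoint and generate a continuation.
# Assumes the Hugging Face `transformers` hosting ("facebook/opt-125m"),
# not the paper's metaseq release.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-125m"  # smallest model in the OPT suite
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Open Pre-trained Transformers are"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same pattern applies to the larger checkpoints in the suite by swapping the model id, subject to available memory.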