We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4.
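The claim about predicting performance from much smaller training runs can be illustrated with a minimal sketch. It assumes, purely for illustration, that final loss follows a plain power law in training compute, L(C) = a * C^b, fitted by least squares in log-log space. The functional form, the helper names `fit_power_law` and `predict_loss`, and all numbers are hypothetical and synthetic; they are not the authors' method or results.

```python
import math


def fit_power_law(compute, loss):
    """Fit loss = a * compute**b by ordinary least squares on log-log values."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope in log-log space (the exponent)
    a = math.exp(mean_y - b * mean_x)   # intercept mapped back to linear space
    return a, b


def predict_loss(a, b, compute):
    """Evaluate the fitted power law at a given compute budget."""
    return a * compute ** b


# Synthetic, noise-free data (assumption): compute is expressed as a fraction
# of the target run, so the largest small run uses 1/1,000th of the target.
TRUE_A, TRUE_B = 10.0, -0.05
small_runs = [1e-6, 1e-5, 1e-4, 1e-3]
observed = [TRUE_A * c ** TRUE_B for c in small_runs]

a, b = fit_power_law(small_runs, observed)
predicted = predict_loss(a, b, 1.0)     # extrapolate to the full-compute run
print(f"fitted a={a:.4f}, b={b:.4f}, predicted loss at full compute={predicted:.4f}")
```

With real measurements the fit would be noisy, and a form with an additional constant offset (an irreducible loss term) may describe the data better; the sketch omits that term to stay short.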