Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions.