Language models have achieved remarkable performance on a wide range of tasks that require natural language understanding. Nevertheless, state-of-the-art models have generally struggled with tasks that require quantitative reasoning, such as solving mathematics, science, and engineering problems at the college level. To help close this gap, we introduce Minerva, a large language model pretrained on general natural language data and further trained on technical content. The model achieves state-of-the-art performance on technical benchmarks without the use of external tools. We also evaluate our model on over two hundred undergraduate-level problems in physics, biology, chemistry, economics, and other sciences that require quantitative reasoning, and find that the model can correctly answer nearly a third of them.