We demonstrate that a neural network pre-trained on text and fine-tuned on code solves mathematics problems by program synthesis. We turn questions into programming tasks, automatically generate programs, and then execute them, perfectly solving university-level problems from MIT's large mathematics courses (Single Variable Calculus 18.01, Multivariable Calculus 18.02, Differential Equations 18.03, Introduction to Probability and Statistics 18.05, Linear Algebra 18.06, and Mathematics for Computer Science 6.042) and Columbia University's COMS3251 Computational Linear Algebra course, as well as questions from the MATH dataset (Prealgebra, Algebra, Counting and Probability, Number Theory, and Precalculus), the latest benchmark of advanced mathematics problems designed specifically to assess mathematical reasoning. We explore prompt generation methods that enable Transformers to generate question-solving programs for these subjects, including solutions with plots. We generate correct answers for a random sample of questions in each topic. We quantify the gap between the original and transformed questions and conduct a survey to evaluate the quality and difficulty of generated questions. This is the first work to automatically solve, grade, and generate university-level mathematics course questions at scale, which represents a milestone for higher education.
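The three-step pipeline described above (turn a question into a programming task, generate a program, execute it) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the prompt template in `build_prompt`, the convention of storing the result in a variable called `answer`, and `stub_generator`, which stands in for the code-generation model and simply returns a hand-written program. The sample question is also hypothetical and is not taken from the courses or the MATH dataset.

```python
from typing import Callable


def build_prompt(question: str) -> str:
    """Frame a question as a programming task (hypothetical template)."""
    return (
        '"""\n'
        "Write a program that computes the answer and stores it in `answer`.\n"
        f"Question: {question}\n"
        '"""\n'
    )


def stub_generator(prompt: str) -> str:
    """Stand-in for the code-generation model (assumption).

    A real system would send `prompt` to the model; this stub ignores it
    and returns a fixed, hand-written program for the sample question.
    """
    return (
        "n = 360\n"
        "answer = sum(1 for d in range(1, n + 1) if n % d == 0)\n"
    )


def solve(question: str, generate: Callable[[str], str]) -> object:
    """Generate a program for the question, execute it, and return `answer`."""
    program = generate(build_prompt(question))
    namespace: dict = {}
    # Executing generated code is unsafe in general; real model output
    # should be run in a sandbox rather than with a bare exec().
    exec(program, namespace)
    return namespace["answer"]


question = "How many positive divisors does 360 have?"
result = solve(question, stub_generator)
print(result)  # prints 24
```

Passing the generator in as a function keeps the prompt construction and execution steps independent of any particular model, so the stub can be swapped for a real model call without touching the rest of the sketch.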