We demonstrate that a neural network pre-trained on text and fine-tuned on code solves Mathematics problems by program synthesis. We turn questions into programming tasks, automatically generate programs, and then execute them, perfectly solving university-level problems from MIT's large Mathematics courses (Single Variable Calculus 18.01, Multivariable Calculus 18.02, Differential Equations 18.03, Introduction to Probability and Statistics 18.05, Linear Algebra 18.06, and Mathematics for Computer Science 6.042) as well as questions from the MATH dataset (on Prealgebra, Algebra, Counting and Probability, Number Theory, and Precalculus), the latest benchmark of advanced mathematics problems specifically designed to assess mathematical reasoning. We explore prompt generation methods that enable Transformers to generate question-solving programs for these subjects, including solutions with plots. We generate correct answers for a random sample of questions in each topic. We quantify the gap between the original and transformed questions and perform a survey to evaluate the quality and difficulty of generated questions. This is the first work to automatically solve, grade, and generate university-level Mathematics course questions at scale, which represents a milestone for higher education.
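The pipeline described above (rephrase a question as a programming task, generate a program, execute it) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the prompt wording, the helper names `to_programming_task`, `generate_program`, `execute`, and `solve`, and the sample question are hypothetical, and `generate_program` returns a hand-written program as a stand-in for the output of the code-generating network rather than calling any model.

```python
def to_programming_task(question: str) -> str:
    """Wrap a math question in a programming-task prompt (hypothetical format)."""
    return (
        '"""\n'
        "Write a program that answers the question below and stores\n"
        "the result in a variable named `answer`.\n"
        f"Question: {question}\n"
        '"""\n'
    )


def generate_program(prompt: str) -> str:
    """Stand-in for the model: returns a hand-written program.

    In the setting described in the text, a network fine-tuned on code
    would produce this program from the prompt. No model is called here.
    """
    return "import math\nanswer = math.comb(5, 2)\n"


def execute(program: str):
    """Run the generated program and read back its `answer` variable.

    Generated code is untrusted; a real system should run it in a sandbox.
    """
    namespace = {}
    exec(program, namespace)
    return namespace["answer"]


def solve(question: str):
    """Question -> programming task -> program -> executed answer."""
    prompt = to_programming_task(question)
    program = generate_program(prompt)
    return execute(program)


# Hypothetical Counting and Probability style question.
question = "In how many ways can 2 books be chosen from 5 distinct books?"
print(solve(question))  # prints 10
```

Separating prompt construction, generation, and execution keeps each stage replaceable: the stand-in generator could be swapped for a model call without touching the other two steps.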