Many intellectual endeavors require mathematical problem solving, but this skill remains beyond the capabilities of computers. To measure this ability in machine learning models, we introduce MATH, a new dataset of 12,500 challenging competition mathematics problems. Each problem in MATH has a full step-by-step solution which can be used to teach models to generate answer derivations and explanations. To facilitate future research and increase accuracy on MATH, we also contribute a large auxiliary pretraining dataset which helps teach models the fundamentals of mathematics. Even though we are able to increase accuracy on MATH, our results show that accuracy remains relatively low, even with enormous Transformer models. Moreover, we find that simply increasing budgets and model parameter counts will be impractical for achieving strong mathematical reasoning if scaling trends continue. While scaling Transformers is automatically solving most other text-based tasks, scaling is not currently solving MATH. To have more traction on mathematical problem solving we will likely need new algorithmic advancements from the broader research community.