Understanding and creating mathematics using natural mathematical language (the mixture of symbolic and natural language used by humans) is a challenging and important problem for driving progress in machine learning. As a step in this direction, we develop NaturalProofs, a large-scale dataset of mathematical statements and their proofs, written in natural mathematical language. Using NaturalProofs, we propose a mathematical reference retrieval task that tests a system's ability to determine the key results that appear in a proof. Large-scale sequence models outperform classical information retrieval techniques on this task and benefit from language pretraining, yet their performance leaves substantial room for improvement. NaturalProofs opens many possibilities for future research on challenging mathematical tasks.
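One way to picture the reference retrieval task is as ranking. Given a theorem, a system scores every candidate result, and we check whether the results actually used in the theorem's proof appear near the top. The sketch below makes several assumptions for illustration. It uses TF-IDF cosine similarity as a stand-in for a "classical information retrieval technique" and recall@k as the metric. The function names (`rank_references`, `recall_at_k`) and the toy statements are hypothetical. None of this describes the NaturalProofs baselines, evaluation protocol, or data.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Crude tokenizer: lowercase, turn non-alphanumerics into spaces, split on whitespace.
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    # Smoothed IDF; a term that occurs in every document gets weight 0.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((1 + n) / (1 + count)) for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(doc).items()} for doc in docs]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    # Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros.
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_references(theorem: str, references: dict[str, str]) -> list[str]:
    # Hypothetical helper: fit IDF over the theorem plus all candidates, then rank candidates.
    names = list(references)
    vectors = tfidf([tokenize(theorem)] + [tokenize(references[name]) for name in names])
    query, candidates = vectors[0], vectors[1:]
    scores = {name: cosine(query, vec) for name, vec in zip(names, candidates)}
    # Descending score; ties broken alphabetically so the ranking is deterministic.
    return sorted(names, key=lambda name: (-scores[name], name))


def recall_at_k(ranked: list[str], gold: set[str], k: int) -> float:
    # Fraction of the references actually used in the proof that appear in the top k.
    return len(set(ranked[:k]) & gold) / len(gold)


# Toy, hypothetical data (not drawn from NaturalProofs).
references = {
    "Euclid's Lemma": "if a prime divides a product then the prime divides one of the factors",
    "Pythagorean Theorem": "the square of the hypotenuse equals the sum of the squares of the legs",
    "Triangle Inequality": "the length of one side of a triangle is at most the sum of the other sides",
}
theorem = "if a prime divides the square of an integer then the prime divides the integer"

ranked = rank_references(theorem, references)
print(ranked)  # "Euclid's Lemma" ranks first on this toy data
print(recall_at_k(ranked, {"Euclid's Lemma"}, k=1))
```

Common words such as "the" and "of" occur in every toy document, so the smoothed IDF zeroes them out. The ranking is then driven by the distinctive shared terms ("prime", "divides"). The abstract's point is that learned sequence models do better than this kind of lexical matching, which cannot relate statements that share meaning but not vocabulary.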