Symbolic equations are at the core of scientific discovery. The task of discovering the underlying equation from a set of input-output pairs is called symbolic regression. Traditionally, symbolic regression methods use hand-designed strategies that do not improve with experience. In this paper, we introduce the first symbolic regression method that leverages large-scale pre-training. We procedurally generate an unbounded set of equations, and simultaneously pre-train a Transformer to predict the symbolic equation from a corresponding set of input-output pairs. At test time, we query the model on a new set of points and use its output to guide the search for the equation. We show empirically that this approach can re-discover a set of well-known physical equations, and that it improves over time with more data and compute.
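As a rough illustration of the data-generation step described above, the following minimal sketch samples a random equation, evaluates it on random points, and returns the input-output pairs together with the equation as a prefix token sequence. This is not the paper's actual generator: the operator sets, the sampling probabilities, the depth limit, and every function name (`random_equation`, `evaluate`, `to_tokens`, `make_example`) are assumptions made for illustration only.

```python
import math
import random

# Hypothetical operator sets chosen for illustration; the paper's grammar is not given here.
BINARY = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b, "mul": lambda a, b: a * b}
UNARY = {"sin": math.sin, "cos": math.cos}


def random_equation(rng, depth=3):
    """Return a random expression tree as nested tuples in prefix form."""
    if depth == 0 or rng.random() < 0.3:
        # Leaf: the variable x or a small integer constant.
        return ("x",) if rng.random() < 0.7 else ("const", rng.randint(1, 5))
    if rng.random() < 0.5:
        op = rng.choice(sorted(UNARY))
        return (op, random_equation(rng, depth - 1))
    op = rng.choice(sorted(BINARY))
    return (op, random_equation(rng, depth - 1), random_equation(rng, depth - 1))


def evaluate(tree, x):
    """Evaluate an expression tree at the point x."""
    head = tree[0]
    if head == "x":
        return x
    if head == "const":
        return float(tree[1])
    if head in UNARY:
        return UNARY[head](evaluate(tree[1], x))
    return BINARY[head](evaluate(tree[1], x), evaluate(tree[2], x))


def to_tokens(tree):
    """Flatten the tree into prefix tokens, the kind of target a sequence model could predict."""
    head = tree[0]
    if head == "x":
        return ["x"]
    if head == "const":
        return [str(tree[1])]
    tokens = [head]
    for child in tree[1:]:
        tokens.extend(to_tokens(child))
    return tokens


def make_example(rng, n_points=8):
    """Sample one training example: (input-output pairs, equation tokens)."""
    tree = random_equation(rng)
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n_points)]
    points = [(x, evaluate(tree, x)) for x in xs]
    return points, to_tokens(tree)
```

Because every call to `make_example` draws a fresh equation, such a generator can supply an effectively unbounded stream of training examples; a sequence model would then be trained to map `points` to the token list.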