Symbolic regression, the task of predicting the mathematical expression of a function from the observation of its values, is a difficult problem which usually involves a two-step procedure: predicting the "skeleton" of the expression up to the choice of numerical constants, then fitting the constants by optimizing a non-convex loss function. The dominant approach is genetic programming, which evolves candidates by iterating this subroutine a large number of times. Neural networks have recently been tasked to predict the correct skeleton in a single try, but remain much less powerful. In this paper, we challenge this two-step procedure, and task a Transformer to directly predict the full mathematical expression, constants included. One can subsequently refine the predicted constants by feeding them to the non-convex optimizer as an informed initialization. We present ablations showing that this end-to-end approach yields better results, sometimes even without the refinement step. We evaluate our model on problems from the SRBench benchmark and show that it approaches the performance of state-of-the-art genetic programming with several orders of magnitude faster inference.
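The following is a minimal sketch, not the paper's implementation, of the refinement step described above: the constants predicted end-to-end are used as an informed initialization for a non-convex optimizer. The skeleton a*sin(b*x) + d, the data, the initial constant values, and the choice of BFGS are illustrative assumptions.

```python
# Hypothetical sketch: refining predicted constants with the end-to-end
# prediction as an informed initialization (skeleton and values are assumed).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=200)
y = 1.5 * np.sin(2.0 * x) + 0.3          # ground-truth function (unknown to the model)

def candidate(c, x):
    # Predicted expression with concrete constants c = (a, b, d): a*sin(b*x) + d
    a, b, d = c
    return a * np.sin(b * x) + d

def loss(c):
    # Non-convex loss over the constants: mean squared error on the observations
    return np.mean((candidate(c, x) - y) ** 2)

c_pred = np.array([1.4, 1.9, 0.25])       # constants predicted end-to-end (assumed)
refined = minimize(loss, x0=c_pred, method="BFGS")   # refinement from an informed start
print(refined.x, refined.fun)
```

Starting the optimizer from c_pred rather than a random point is what distinguishes this refinement from the constant-fitting step of the usual two-step pipeline.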