Many real-world problems can be naturally described by mathematical formulas. The task of finding formulas from a set of observed inputs and outputs is called symbolic regression. Recently, neural networks have been applied to symbolic regression, among which the transformer-based ones appear to be the most promising. After the transformer is trained on a large number of formulas (taking on the order of days), the actual inference, i.e., finding a formula for new, unseen data, is very fast (on the order of seconds). This is considerably faster than state-of-the-art evolutionary methods. The main drawback of transformers is that they generate formulas without numerical constants, which then have to be optimized separately, often yielding suboptimal results. We propose a transformer-based approach called SymFormer, which predicts the formula by outputting the individual symbols and the corresponding constants simultaneously. This leads to better performance in fitting the available data. In addition, the constants provided by SymFormer serve as a good starting point for subsequent tuning via gradient descent, further improving performance. We show on a set of benchmarks that SymFormer outperforms two state-of-the-art methods while offering faster inference.
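To make the two ideas in the abstract concrete, the sketch below shows (1) a decoder output head that jointly predicts a symbol and a constant from the same hidden state, and (2) constants refined by gradient descent against the observed data. This is a minimal illustration, not the authors' implementation: the dual-head module, the toy formula c0 * sin(c1 * x), the initial constant values, and all names are hypothetical, and PyTorch is assumed.

```python
import torch
import torch.nn as nn

class JointOutputHead(nn.Module):
    """Maps a decoder hidden state to (symbol logits, constant value),
    so each generated token carries its constant simultaneously."""
    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.symbol_head = nn.Linear(d_model, vocab_size)  # classification over formula symbols
        self.constant_head = nn.Linear(d_model, 1)         # regression for the associated constant

    def forward(self, hidden: torch.Tensor):
        return self.symbol_head(hidden), self.constant_head(hidden).squeeze(-1)

# Refine predicted constants by gradient descent on the data fit,
# using a toy formula c0 * sin(c1 * x) with hypothetical starting values
# standing in for the constants SymFormer would predict.
x = torch.linspace(-3, 3, 100)
y_obs = 2.0 * torch.sin(1.5 * x)                  # "observed" input/output data
c = torch.tensor([1.8, 1.4], requires_grad=True)  # constants predicted by the model
opt = torch.optim.Adam([c], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = torch.mean((c[0] * torch.sin(c[1] * x) - y_obs) ** 2)
    loss.backward()
    opt.step()
```

Because the predicted constants already lie close to the optimum, this tuning step needs only a few iterations, which is what makes the overall inference fast.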