Symbolic regression is the task of identifying a mathematical expression that best fits a given dataset of input and output values. Because the space of mathematical expressions is so rich, symbolic regression is generally a challenging problem. While conventional approaches based on genetic evolution algorithms have been used for decades, deep learning-based methods are relatively new and remain an active research area. In this work, we present SymbolicGPT, a novel transformer-based language model for symbolic regression. The model exploits the advantages of probabilistic language models like GPT, including strong performance and flexibility. Through comprehensive experiments, we show that our model performs strongly compared to competing models with respect to accuracy, running time, and data efficiency.